
Contents lists available at ScienceDirect
European Journal of Operational Research
journal homepage: www.elsevier.com/locate/ejor

Stochastics and Statistics

A model for clustering data from heterogeneous dissimilarities

Éverton Santi (a), Daniel Aloise (b,∗), Simon J. Blanchard (c)

a School of Sciences and Technology, Universidade Federal do Rio Grande do Norte, Natal-RN 59072-970, Brazil
b Department of Computer Engineering and Automation, Universidade Federal do Rio Grande do Norte, Natal-RN 59072-970, Brazil
c McDonough School of Business, Georgetown University, Washington, DC 20057, USA

Article info

Article history: Received 16 February 2015; Accepted 18 March 2016; Available online 26 March 2016.
Keywords: Data mining; Clustering; Heterogeneity; Optimization; Heuristics

Abstract

Clustering algorithms partition a set of n objects into p groups (called clusters), such that objects assigned to the same groups are homogeneous according to some criteria. To derive these clusters, the data input required is often a single n × n dissimilarity matrix. Yet for many applications, more than one instance of the dissimilarity matrix is available, and so, to conform to model requirements, it is common practice to aggregate (e.g., sum up, average) the matrices. This aggregation practice results in clustering solutions that mask the true nature of the original data. In this paper we introduce a clustering model which, to handle the heterogeneity, uses all available dissimilarity matrices and identifies groups of individuals clustering objects in a similar way. The model is a nonconvex problem and difficult to solve exactly, and we thus introduce a Variable Neighborhood Search heuristic to provide solutions efficiently. Computational experiments and an empirical application to the perception of chocolate candy show that the heuristic algorithm is efficient and that the proposed model is suited for recovering heterogeneous data. Implications for clustering researchers are discussed.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Clustering algorithms determine groups of objects, called clusters, in such a way that objects in the same group are more similar to one another than to those in other groups (Hansen & Jaumard, 1997). Clustering is ubiquitous, with applications in the natural sciences, psychology, medicine, engineering, economics, marketing and other fields (e.g., Frey & Dueck, 2007; Jain, Murty, & Flynn, 1999; McLachlan & David, 2004).

Among the many types of clustering models, a popular one is partitioning a set $O = \{o_1, \ldots, o_n\}$ of n objects into a set $P = \{C_1, \ldots, C_p\}$ of clusters such that:

(i) $C_j \neq \emptyset$ for all $j \in \{1, \ldots, p\}$;
(ii) $C_i \cap C_j = \emptyset$ for all $i, j \in \{1, \ldots, p\}$ with $i \neq j$; and
(iii) $\bigcup_{j=1}^{p} C_j = O$.

The input data for clustering algorithms is often a single matrix X of dimensions n × s, obtained by measuring s features of the objects of O. This matrix is then used to compute an n × n matrix of pairwise dissimilarities $D = (d_{ij})$ between objects of O, such that the $d_{ij}$, for $i, j \in \{1, \ldots, n\}$, (usually) satisfy: (i) $d_{ij} = d_{ji} \ge 0$, and (ii) $d_{ii} = 0$. Such a single dissimilarity matrix D does not need to satisfy the triangle inequality, i.e., its entries need not be distances.

∗ Corresponding author. E-mail addresses: santi.everton@gmail.com (É. Santi), aloise@dca.ufrn.br, daniel.aloise@gerad.ca (D. Aloise), sjb247@georgetown.edu (S.J. Blanchard).

For many problems, only one dissimilarity matrix is available. For instance, the Iris dataset (Fisher, 1936), one of the most popular datasets used in cluster analysis, consists of 150 samples from three species of Iris flowers, where each flower is measured on four characteristics. Using the attributes measured, the flowers are typically clustered into the three (expected) species. The use of classical clustering algorithms (e.g., k-means, single-linkage, complete-linkage) on this dataset can provide excellent results. It is, however, possible that more than one dissimilarity matrix is available. In the context of the Iris dataset, one could envision asking a sample of multiple experts to measure the same flowers, in case there is significant measurement error. If there is heterogeneity in the data reported, we argue that aggregating the dissimilarity matrices might mask differences truly present in the data.

There are indeed many contexts for which multiple measurements (i.e., dissimilarity matrices) are available. For instance, in the social sciences it is common to ask a sample of individuals to each provide pairwise similarity judgements between brands (e.g., how similar is Coke to Pepsi?). Such tasks, known as pairwise similarity tasks, produce one dissimilarity matrix for each participant, and have been used to study preference formation (Carpenter & Nakamoto, 1994), advertisement similarity (Schweidel, Bradlow, & Williams, 2006), comparing brands (Bijmolt, Wedel, Pieters, & DeSarbo, 1998), store positioning (Arora, 1982), variety seeking (Feinberg, Kahn, & McAlister, 1992), and substitution decisions (Hamilton et al., 2014; Ratneshwar & Shocker, 1991). The necessity to consider the multiple dissimilarity matrices stems from the fact that measurements often reflect differences in perception. Such different dissimilarity matrices can be thought of as reflecting different points of view (Brusco & Cradit, 2005; DeSarbo & Carroll, 1985; DeSarbo, Atalay, LeBaron, & Blanchard, 2008; Lee, 2001; Steinley, Hendrickson, & Brusco, 2015; Vichi, Rocci, & Kiers, 2007), which have been incorporated in a large number of perceptual models that include multidimensional scaling, three-way clustering, and mixture models.

Among the various clustering models available, the p-median model has received significant attention across fields (e.g., Sáez-Aguado and Trandafir, 2012; Brusco, Steinley, Cradit, and Singh, 2012; Avella, Boccia, Salerno, and Vasilyev, 2012). The p-median model aims to partition objects into clusters such that the sum of the distances from each object to the central exemplar of its cluster (i.e., the median) is minimal. Given n objects to be clustered and a known number of clusters p, the mathematical problem can be formulated as an integer linear program (ReVelle & Swain, 1970). In our notation, there is one key set of decision variables that involves the assignment of objects to clusters. First, $e_{jj} = 1$ indicates that object j is chosen as the median of a cluster, and 0 otherwise, for $j \in \{1, \ldots, n\}$. Second, on the off-diagonal elements, $e_{ij} = 1$ if object i is assigned to the cluster whose median is object j, and 0 otherwise, for $i \in \{1, \ldots, n\}$ (object j is naturally assigned to itself if it is a median). Using this notation, the p-median model is expressed as follows:

$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} e_{ij} \qquad (1)$$

subject to

$$\sum_{j=1}^{n} e_{ij} = 1, \quad i \in \{1,\ldots,n\} \qquad (2)$$

$$\sum_{j=1}^{n} e_{jj} = p \qquad (3)$$

$$e_{ij} \le e_{jj}, \quad i,j \in \{1,\ldots,n\} \qquad (4)$$

$$e_{ij} \in \{0,1\}, \quad i,j \in \{1,\ldots,n\}. \qquad (5)$$

The constraints (2) require that each object must be assigned to one and only one median. Constraint (3) imposes that the number of medians must be exactly p. The constraints (4) ensure that object i can only be assigned to object j if object j is a median, and constraints (5) are domain constraints for the decision variables. Finally, the product of $d_{ij}$ and $e_{ij}$ in (1) captures the dissimilarity from each object i to its closest median j.
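To make the formulation concrete, the following minimal sketch (our illustration, not code from the paper) solves the p-median model (1)–(5) by brute force on a toy instance: it enumerates every candidate set of p medians and assigns each object to its closest median, which is exactly what the integer program encodes. The function name and data layout are assumptions for the example.

```python
from itertools import combinations

def p_median_bruteforce(D, p):
    """Exhaustively solve the p-median model (1)-(5) for a small instance.

    D is an n x n list of lists of dissimilarities; p is the number of medians.
    Returns (best_cost, best_medians, assignment), where assignment[i] is the
    median object to which object i is assigned.
    """
    n = len(D)
    best_cost, best_medians, best_assign = float("inf"), None, None
    for medians in combinations(range(n), p):          # candidate sets of p medians
        # each object goes to its closest median (constraints (2) and (4))
        assign = [min(medians, key=lambda j: D[i][j]) for i in range(n)]
        cost = sum(D[i][assign[i]] for i in range(n))  # objective (1)
        if cost < best_cost:
            best_cost, best_medians, best_assign = cost, medians, assign
    return best_cost, best_medians, best_assign

# toy example: 4 objects, 2 medians; expect a total cost of 2
D = [[0, 1, 5, 6],
     [1, 0, 5, 6],
     [5, 5, 0, 1],
     [6, 6, 1, 0]]
print(p_median_bruteforce(D, 2))
```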

One of the main characteristics of the p-median is its breadth of applicability. It can be applied to cluster metric data as well as more general similarity/dissimilarity data, even asymmetric or rectangular data structures (i.e., when not every object can be a median) (Köhn, Steinley, & Brusco, 2010). Mladenović, Brimberg, Hansen, and Moreno-Pérez (2007) present an extensive review of exact and heuristic solution methods for this problem. Despite its advantages, including excellent classification rates, robustness to outliers and attractive assumptions, an aggregate p-median formulation may still mask individual heterogeneity, as is later shown in the empirical application.

In this paper, we propose a mathematical programming formulation based on the p-median to cluster data collected from individuals¹ who provided heterogeneous dissimilarity matrices. The model is conceived on two levels. The first identifies clusters of individuals, herein called groups for readability, with similar clustering structures. The second identifies the partitions of objects for each of these groups.

The remainder of the paper is organized as follows. In the next section, we present the mathematical formulation and discuss how available exact algorithms can be used to solve our model. In Section 3, we describe the Variable Neighborhood Search (VNS) (Hansen & Mladenović, 2001; Mladenović & Hansen, 1997) heuristic for the model. In Section 4, we present a Monte Carlo Simulation whose results illustrate the necessity for heuristic algorithms. We also show that the proposed VNS heuristic has the ability to predict heterogeneous clustering data. Section 5 provides an empirical example from a local United States retailer about perceptions of chocolate candies. This last section illustrates how the proposed methodology can be used by managers and helps discover insights based on heterogeneous perceptions.

2. Problem formulation

Let m individuals evaluate n objects such that a data matrix $D^k = (d^k_{ij})$ is obtained for each $k \in \{1, \ldots, m\}$, representing the dissimilarities between pairs of objects i and j as perceived by individual k, and let $c_k$, for $k \in \{1, \ldots, m\}$, be the number of clusters expected by individual k. The clustering problem considered in this work involves identifying groups of individuals whose dissimilarity matrices suggest a similar clustering solution. Clusters, for each group of individuals, are organized by means of a medians-based model where each clustered object is associated to the most representative item (i.e., the median) of its cluster. The Heterogeneous Clustering Problem (HCP) can be formulated as follows:

$$\min \sum_{k=1}^{m} \sum_{g=1}^{G} z_{kg} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} d^{k}_{ij} e^{g}_{ij} \right) \qquad (6)$$

subject to

$$\sum_{j=1}^{n} e^{g}_{ij} = 1, \quad g \in \{1,\ldots,G\},\; i \in \{1,\ldots,n\} \qquad (7)$$

$$e^{g}_{ij} \le e^{g}_{jj}, \quad g \in \{1,\ldots,G\},\; i,j \in \{1,\ldots,n\} \qquad (8)$$

$$\sum_{g=1}^{G} z_{kg} = 1, \quad k \in \{1,\ldots,m\} \qquad (9)$$

$$\sum_{k=1}^{m} z_{kg} \ge 1, \quad g \in \{1,\ldots,G\} \qquad (10)$$

$$\sum_{j=1}^{n} e^{g}_{jj} = \left\lfloor \frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}} \right\rfloor, \quad g \in \{1,\ldots,G\} \qquad (11)$$

$$e^{g}_{ij} \in \{0,1\}, \quad g \in \{1,\ldots,G\},\; i,j \in \{1,\ldots,n\} \qquad (12)$$

$$z_{kg} \in \{0,1\}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\} \qquad (13)$$

The m individuals are partitioned into G groups. The decision variables $z_{kg}$ express the assignment of individual k to group g. Variables $e^{g}_{ij}$ are equal to 1 if object i is assigned to object j in group g, and $e^{g}_{ij} = 0$ otherwise. The objective is to minimize (6), i.e., the sum of dissimilarities between each object and its assigned median, conditional on (individual) group membership. Constraints (7) require that each object i be assigned to exactly one median, as part of each group g's clustering solution. Constraints (8) ensure that object i can only be assigned to object j for group g if object j is a median for that group. Constraints (9) require that each individual be assigned to exactly one group, whereas constraints (10) guarantee that no empty group exists. Finally, constraints (11) make the total number of medians for each group g equal to the floor of the average number of medians expected by the individuals in that group. As argued and shown by Blanchard, Aloise, and DeSarbo (2012a) and Blanchard, DeSarbo, Atalay, and Harmancioglu (2012b), this choice of limiting the number of clusters used to represent consumer perceptions follows numerous researchers in the behavioral literature who have shown that individuals tend to favor simple representations when forming object perceptions and preferences (Bettman, Luce, & Payne, 1998; Bettman & Park, 1980; Shugan, 1980; Simon, 1955). In our empirical application, the data reflect the fact that each individual had formed his or her own partitions. Doing so provided us with actual data to justify the use of a different number of medians for each group of individuals.

¹ We use the term individuals for clarity with respect to our empirical application, which refers to different consumers. We note that such dissimilarity matrices could come from other sources (e.g., firms, repeated measurements for the same person, etc.), in line with the research on "points of view" (Brusco & Cradit, 2005).
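As a hedged illustration of how the HCP variables interact (our addition, not the authors' code), the sketch below evaluates objective (6) for given assignments z and e and computes both sides of the median-count rule (11); the list-of-numpy-arrays layout for the individual dissimilarity matrices and all function names are assumptions.

```python
import numpy as np

def hcp_objective(D, z, e):
    """Evaluate objective (6): sum_k sum_g z_kg * sum_ij d^k_ij e^g_ij.

    D: list of m (n x n) numpy arrays, one dissimilarity matrix per individual.
    z: (m x G) 0/1 numpy array, individual-to-group assignments.
    e: (G x n x n) 0/1 numpy array, object-to-median assignments per group.
    """
    m, G = z.shape
    return sum(z[k, g] * np.sum(D[k] * e[g])
               for k in range(m) for g in range(G))

def medians_per_group(e):
    """Left-hand side of (11): number of medians sum_j e^g_jj in each group."""
    return [int(np.trace(e[g])) for g in range(len(e))]

def expected_medians(c, z):
    """Right-hand side of (11): floor of the mean c_k over individuals in group g."""
    m, G = z.shape
    return [int(np.floor(sum(c[k] * z[k, g] for k in range(m)) /
                         max(1, sum(z[k, g] for k in range(m)))))
            for g in range(G)]
```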

The model in (6)–(13) may be further simplified. For instance, the optimization process guarantees that $\sum_{j=1}^{n} e^{g}_{jj}$ takes an integer value as large as possible, since more medians in a group imply lower (or equal) objective function values. Consequently, constraints (11) can be replaced by the following inequalities:

$$\sum_{j=1}^{n} e^{g}_{jj} \le \frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}}, \quad g \in \{1,\ldots,G\}, \qquad (14)$$

without affecting the optimal solution. Moreover, these constraints can be modified if the user prefers the number of medians in each group to equal the integer closest to $\sum_{k=1}^{m} c_k z_{kg} / \sum_{k=1}^{m} z_{kg}$ rather than its floor; for that, it suffices to add 0.5 to the right-hand side of constraints (14).

The HCP is a Mixed-Integer Quadratically Constrained Quadratic Problem (MIQCQP), for which the literature on exact and heuristic methods is vast (e.g., Anstreicher, 2012; Audet, Hansen, Jaumard, & Savard, 2000; Billionnet, Elloumi, & Lambert, 2016; Bomze & Locatelli, 2004; Galli & Letchford, 2014; Saxena, Bonami, & Lee, 2010; Zheng, Sun, & Li, 2011). Particularly for the HCP, all variables are required to be binary, so the problem is a 0–1 QCQP. In the following subsections, we present the methods employed here to solve the HCP exactly.

2.1. Generic solvers

In our attempts to solve the HCP exactly, we used Couenne (Belotti, Lee, Liberti, Margot, & Wächter, 2009), Baron (Tawarmalani & Sahinidis, 2005), and GloMIQO (Misener & Floudas, 2013). The first two are different implementations of the spatial Branch-and-Bound (sBB) algorithm (Liberti, 2006) for nonconvex mixed-integer nonlinear problems (MINLP). Much like a Branch-and-Bound (BB) algorithm for MIPs, sBB explores the feasible space exhaustively but implicitly, finding a guaranteed ε-approximate solution for any given ε > 0 in finite (potentially exponential) time. Unlike MIPs, whose continuous relaxation is a linear program, and unlike convex MINLPs, whose continuous relaxation is a convex NLP, the continuous relaxation of a nonconvex MINLP is usually difficult to solve. To address this issue, sBB algorithms form and solve convex relaxations of the given MINLP. The convexity gap between the original MINLP and its convex relaxation therefore stems from two factors: the relaxation of the integrality constraints, as well as the relaxation of the nonconvex terms appearing in the MINLP. The third generic solver, GloMIQO, is a branch-and-cut algorithm based on generating tight convex relaxations by detecting special structures such as convexity and edge-concavity. The algorithm is specialized to solve MIQCQPs to ε-optimality.

2.2. Fortet's linearization with RLT constraints

Linearization can be achieved by means of Fortet's inequalities (Fortet, 1960), thereby replacing each product of binary variables $e^{g}_{ij} \times z_{kg}$ by $w^{kg}_{ij}$ (with $w^{kg}_{ij} \in [0,1]$), for $k \in \{1,\ldots,m\}$, $i,j \in \{1,\ldots,n\}$, $g \in \{1,\ldots,G\}$, along with three additional sets of constraints which together ensure that $\max\{0,\, e^{g}_{ij} + z_{kg} - 1\} \le w^{kg}_{ij}$. The three sets of constraints are:

$$w^{kg}_{ij} \le e^{g}_{ij}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i,j \in \{1,\ldots,n\}, \qquad (15)$$

$$w^{kg}_{ij} \le z_{kg}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i,j \in \{1,\ldots,n\}, \qquad (16)$$

$$w^{kg}_{ij} \ge e^{g}_{ij} + z_{kg} - 1, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i,j \in \{1,\ldots,n\}. \qquad (17)$$

To further accelerate the optimization process of the resulting mixed-integer problem (MIP), we strengthen the formulation by adding constraints (cuts) that do not affect the optimal integer solution. In the spirit of the Reformulation-Linearization Technique (RLT) (Sherali & Adams, 1990), we obtain a set of additional cuts by multiplying the n × G constraints in (7) by $z_{kg}$ and $(1 - z_{kg})$, for $k = 1, \ldots, m$, and then replacing the products $e^{g}_{ij} \times z_{kg}$ by $w^{kg}_{ij}$. This yields the following constraints:

$$\sum_{j=1}^{n} w^{kg}_{ij} = z_{kg}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i \in \{1,\ldots,n\}, \qquad (18)$$

$$\sum_{j=1}^{n} \left( e^{g}_{ij} - w^{kg}_{ij} \right) = 1 - z_{kg}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i \in \{1,\ldots,n\}. \qquad (19)$$

Notice that constraints (18) make constraints (16) redundant. Without loss of generality, consider particular indices i, k and g associated to one of the constraints (18). In order for the equality to hold, since w is non-negative, each one of the terms $w^{kg}_{ij}$ must be smaller than $z_{kg}$, which implies that constraints (16) are redundant. Moreover, constraints (17) are redundant due to constraints (19). For all $g \in \{1,\ldots,G\}$, $k \in \{1,\ldots,m\}$, $i,j \in \{1,\ldots,n\}$, we can rewrite (17) as $e^{g}_{ij} - w^{kg}_{ij} \le 1 - z_{kg}$. Now, let us take, without loss of generality, particular indices i, k and g associated to one of the constraints (19). It follows that the equality holds if and only if $e^{g}_{ij} - w^{kg}_{ij} \le 1 - z_{kg}$, since constraints (15) guarantee that $e^{g}_{ij} - w^{kg}_{ij} \ge 0$. Consequently, constraints (17) are also redundant to the model. The resulting MIP model is denoted HCP-R1, and was solved in our computational experiments by the generic MIP solver CPLEX version 12.6.
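The exactness of the linearization hinges on Fortet's inequalities pinning the new variable to the product it replaces. The short check below (ours, not from the paper) enumerates all binary combinations and confirms that (15)–(17), together with w ≥ 0, force w = e·z at every 0/1 point.

```python
from itertools import product

# For binary e and z, Fortet's constraints
#   w <= e   (15),   w <= z   (16),   w >= e + z - 1   (17),   w >= 0
# pin w down exactly: max(0, e + z - 1) == min(e, z) == e*z on every 0/1 point,
# so the linearized model agrees with the original bilinear one.
for e, z in product([0, 1], repeat=2):
    lower = max(0, e + z - 1)
    upper = min(e, z)
    assert lower == upper == e * z
print("w is forced to e*z at every binary point.")
```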

2.3. Convexifications

As a 0–1 QCQP, the HCP can also be written in the following form:

$$\min\; x^{T} Q_0 x + c_0 x \qquad (20)$$

subject to

$$x^{T} Q_h x + c_h x \le b_h \quad \text{for each quadratic constraint } h \qquad (21)$$

$$x \in \{0,1\}^{s} \qquad (22)$$

where the $Q_h$ are symmetric matrices of order s, the $c_h$ are s-vectors and the $b_h$ are scalars. To illustrate, consider the objective function $3 z_{11} e^{1}_{12} + 3 z_{11} e^{1}_{21} + 3 z_{12} e^{1}_{12} + 3 z_{12} e^{1}_{21} + 4 z_{21} e^{2}_{12} + 4 z_{21} e^{2}_{21} + 4 z_{22} e^{2}_{12} + 4 z_{22} e^{2}_{21}$ with m = 2, n = 2 and G = 2. We can express this function in quadratic form with $x = (e^{1}_{12}, e^{1}_{21}, e^{2}_{12}, e^{2}_{21}, z_{11}, z_{12}, z_{21}, z_{22})$ and

$$Q_0 = \begin{pmatrix}
0 & 0 & 0 & 0 & 1.5 & 1.5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1.5 & 1.5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 2 \\
1.5 & 1.5 & 0 & 0 & 0 & 0 & 0 & 0 \\
1.5 & 1.5 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 2 & 0 & 0 & 0 & 0
\end{pmatrix}$$

We performed the convexification of (6) and (14) with two different methods:

(i) the method proposed by Hammer and Rubin (1970), which makes the matrix Q positive semi-definite by subtracting its minimum eigenvalue from the diagonal entries and adjusting the linear term of the expression accordingly, and

(ii) the method that convexifies each product $C \cdot x \cdot y$ with C > 0 by replacing $x \cdot y$ with the difference of two convex functions, $\tfrac{1}{2}(x+y)^{2} - \tfrac{1}{2}(x+y)$.

Both reformulations preserve the cost of every feasible binary solution. In our example, the minimum eigenvalue of $Q_0$ is −4. Thus, with method (i) the objective function f(x) is replaced by the convex function

$$f(x) + 4\left( (e^{1}_{12})^{2} + (e^{1}_{21})^{2} + (e^{2}_{12})^{2} + (e^{2}_{21})^{2} + (z_{11})^{2} + (z_{12})^{2} + (z_{21})^{2} + (z_{22})^{2} \right) - 4\left( e^{1}_{12} + e^{1}_{21} + e^{2}_{12} + e^{2}_{21} + z_{11} + z_{12} + z_{21} + z_{22} \right).$$

In reformulation (ii), the products of the z and e variables are replaced by differences of convex functions. For instance, the product $3 z_{11} e^{1}_{12}$ in the objective function of our example above is replaced by $\tfrac{3}{2}(z_{11} + e^{1}_{12})^{2} - \tfrac{3}{2}(z_{11} + e^{1}_{12})$.

The resulting convex 0–1 QCQP formulation of the HCP with (i) was denoted HCP-R2, whereas that with (ii) was denoted HCP-R3. Both are solved by CPLEX, which automatically converts a 0–1 convex QCQP formulation into a 0–1 second-order cone program, for which relaxations are solved via the barrier algorithm.
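As a quick numerical sanity check (our addition, based on the small example above), the script below verifies that both convexifications leave the objective value unchanged on every binary vector; it assumes numpy is available and rebuilds $Q_0$ from the example.

```python
import numpy as np
from itertools import product

# Q0 from the example, with x = (e1_12, e1_21, e2_12, e2_21, z11, z12, z21, z22)
Q0 = np.zeros((8, 8))
Q0[0:2, 4:6] = 1.5; Q0[4:6, 0:2] = 1.5   # 3 * z1g * e1_.. terms, split symmetrically
Q0[2:4, 6:8] = 2.0; Q0[6:8, 2:4] = 2.0   # 4 * z2g * e2_.. terms
lam = np.linalg.eigvalsh(Q0).min()        # minimum eigenvalue (equals -4 here)

for bits in product([0, 1], repeat=8):
    x = np.array(bits, dtype=float)
    f = x @ Q0 @ x
    # (i) Hammer-Rubin: subtract lam from the diagonal, compensate in the linear term
    f_hr = x @ (Q0 - lam * np.eye(8)) @ x + lam * x.sum()
    # (ii) each product C*z*e replaced by C/2*(z+e)^2 - C/2*(z+e)
    f_dc = 0.0
    for zi, ei, C in [(4, 0, 3.0), (4, 1, 3.0), (5, 0, 3.0), (5, 1, 3.0),
                      (6, 2, 4.0), (6, 3, 4.0), (7, 2, 4.0), (7, 3, 4.0)]:
        s = x[zi] + x[ei]
        f_dc += C / 2.0 * (s * s - s)
    assert abs(f - f_hr) < 1e-9 and abs(f - f_dc) < 1e-9
print("Both convexifications preserve the objective on all binary points.")
```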

3. VNS heuristic for the HCP

VNS is a metaheuristic developed to solve combinatorial and global optimization problems by changing neighborhoods both in its local descent step, for intensification, and in its shaking step, for diversification (see Hansen, Mladenović, and Pérez, 2010, for a survey).

VNS relies on the following three observations:

Observation 1: A local minimum with respect to one neighborhood structure is not necessarily so for another;

Observation 2: A global minimum is a local minimum with respect to all possible neighborhood structures;

Observation 3: Local minima with respect to one or several neighborhoods are often relatively close to one another.

In the VNS framework, the neighborhoods are defined around types of moves, or perturbations, of the best current solution x, the center of the search. When looking for a better solution in a minimization problem, a solution x′ is drawn at random in an increasingly wider neighborhood of x, and a local descent is performed from x′, leading to another local optimum x″. If x″ is worse than x, then x″ is ignored and a new neighbor solution x′ is drawn in a more distant neighborhood of x. If instead x″ is better than x, the search is re-centered around x″ and restarts in the closest neighborhood of the newly found best current solution. Once all neighborhoods of x have been explored without success, one begins again with the closest one to x, until a stopping condition (e.g., maximum CPU time) is met.

As the size of the neighborhoods tends to increase with their distance from the current best solution x, close-by neighborhoods are explored more thoroughly than far-away ones. This strategy takes advantage of the three observations mentioned above and, given sufficient computational time, ensures that the algorithm does not remain stuck in a poor local optimum. We now turn to our implementation of VNS for the HCP.

3.1. Initialization

VNS requires an initial solution, which can either be provided by the user or constructed. Algorithm 1 presents the pseudocode of our approach for constructing an initial solution.

Algorithm 1 VNS: Constructive Heuristic (CH).

  Compute a matrix $F = (f_{ab})$ of dimension m × m such that $f_{ab}$ is the Frobenius norm of $D^{a} - D^{b}$;
  Solve a p-median problem with p = G medians using matrix F as input;
  for k = 1, ..., m do
    set $z_{kg} = 1$ if the dissimilarity matrix of individual k is assigned to the g-th median, and $z_{kg} = 0$ otherwise;
  end for
  for g = 1, ..., G do
    solve subproblem $M_g(z)$;
  end for

Algorithm 1 first solves the problem of assigning individuals to groups, and does so by using an m × m distance matrix between individuals based on the Frobenius norm of the difference between their dissimilarity matrices. Once the p-median model is applied to this distance matrix between individuals, they are assigned to groups according to the partition obtained, i.e., if a pair of individuals have their dissimilarity matrices assigned to the same cluster in the p-median model, then these individuals are assigned to the same group in the initial solution. Then, for each initial group, solving the subproblems $M_g(z)$, for g = 1, ..., G, provides a complete initial solution for the HCP:

$$M_g(z) = \min \sum_{i=1}^{n} \sum_{j=1}^{n} d^{g}_{ij} e^{g}_{ij} \qquad (23)$$

subject to

$$\sum_{j=1}^{n} e^{g}_{ij} = 1, \quad i \in \{1,\ldots,n\} \qquad (24)$$

$$e^{g}_{ij} \le e^{g}_{jj}, \quad i,j \in \{1,\ldots,n\} \qquad (25)$$

$$\sum_{j=1}^{n} e^{g}_{jj} = \left\lfloor \frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}} \right\rfloor \qquad (26)$$

$$e^{g}_{ij} \in \{0,1\}, \quad i,j \in \{1,\ldots,n\}, \qquad (27)$$

where $d^{g}_{ij} = \sum_{k=1}^{m} d^{k}_{ij} z_{kg}$. We note that problem (23)–(27) corresponds to the p-median problem (1)–(5). Furthermore, this constructive heuristic could easily be replaced by others: other distance norms (e.g., L1, L∞) could be substituted, just as the assignment of objects to clusters could be obtained via any other partitioning heuristic. For our constructive heuristic, we used the approach by Hansen and Mladenović (1997), as it ensures that each group contains at least one individual.
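A minimal sketch (ours, with a hypothetical function name) of the first step of Algorithm 1: computing the m × m matrix F of Frobenius norms between the individuals' dissimilarity matrices. The resulting F would then be passed to any p-median heuristic with p = G.

```python
import numpy as np

def frobenius_gap_matrix(D):
    """F[a, b] = ||D^a - D^b||_F for a list D of m (n x n) dissimilarity matrices."""
    m = len(D)
    F = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            F[a, b] = F[b, a] = np.linalg.norm(D[a] - D[b], ord="fro")
    return F

# F is then used as the input of a p-median heuristic (p = G) to obtain the
# initial assignment z of individuals to groups; each subproblem M_g(z) is
# solved afterwards to complete the initial HCP solution.
```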


3.2. Shaking

The shaking component of our VNS is implemented by means of random moves in the swap neighborhood, which encompasses all possible ways of removing an individual from a group and adding it to a different one. Thus, if the shaking parameter is t = 2, two random swap moves are performed for two individuals; if t = 3, three swap moves are performed for three individuals, and so on.
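The sketch below (ours, with hypothetical names) illustrates the swap-based shaking move on a simplified representation where group memberships are stored as a flat list of labels; how the full heuristic treats moves that would empty a group is not detailed in the paper, so skipping them is an assumption of this sketch.

```python
import random

def shake(groups, G, t, rng=random):
    """Apply t random swap moves: each move reassigns one randomly chosen
    individual to a different, randomly chosen group.

    groups[k] is the group label of individual k. Moves that would leave a
    group empty are simply skipped here (an assumption, see constraint (10)).
    """
    perturbed = list(groups)
    for k in rng.sample(range(len(perturbed)), t):
        if perturbed.count(perturbed[k]) == 1:
            continue  # skip: removing this individual would empty its group
        choices = [g for g in range(G) if g != perturbed[k]]
        perturbed[k] = rng.choice(choices)
    return perturbed
```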

3.3. Local search

Given an existing solution, we need to search its neighborhood to reach a local optimum. We developed our local search following the Variable Neighborhood Descent (VND) framework, which generalizes observations 1–3 to descent methods. Algorithm 2 presents the general algorithmic steps of VND.

Algorithm 2 Local Search: VND Framework.

  Input: a solution x and a set of descent methods descent_s, for s ∈ {1, ..., s_max}
  s ← s_min
  repeat
    x′ ← descent_s(x);
    if x′ ≠ x then x ← x′ and s ← s_min; otherwise s ← s + 1;
  until s > s_max

Applied to the HCP, VND involves the iterative optimization of the objective function via improvements based on three descent methods: (1) descent on the clustering of objects (conditional on group memberships and the number of medians), (2) descent on the group memberships (conditional on the object clusterings and the number of medians), and (3) descent by (possibly) augmenting the number of medians. VND (the local search) ends when all descent methods have been consecutively explored without any improvement in the objective function. Whenever an improvement occurs, the algorithm resets s to s_min. In the remainder of this section, we present each of these descent procedures.
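The following skeleton (our paraphrase of Algorithm 2, not the authors' implementation) shows the VND control flow with the descent methods passed in as callables; comparing costs instead of solutions, and the callable names in the usage comment, are simplifications we introduced.

```python
def vnd(solution, descents, cost):
    """Variable Neighborhood Descent (cf. Algorithm 2): cycle through the
    descent methods and restart from the first one after any improvement."""
    s = 0
    while s < len(descents):
        candidate = descents[s](solution)
        if cost(candidate) < cost(solution):
            solution, s = candidate, 0   # improvement found: restart the cycle
        else:
            s += 1                       # no improvement: try the next descent
    return solution

# usage sketch: vnd(x0, [descent_objects, descent_groups, descent_medians], hcp_cost),
# where the three callables correspond to the descents of Sections 3.3.1-3.3.3.
```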

3.3.1. First descent

The descent method descent_1 in Algorithm 2 solves subproblem (23)–(27) for each group affected by the shaking procedure. Namely, for each group, the descent method identifies the conditionally optimal clustering of objects, assuming that both the number of medians and the group memberships are known. Our choice has been to perform this descent with heuristics (e.g., Hansen & Mladenović, 1997; Resende & Werneck, 2004; Hansen, Brimberg, Urošević, & Mladenović, 2009) to accelerate the whole algorithm.

3.3.2. Second descent

The second descent, descent_2, temporarily assumes the clustering of objects to be known in all groups (i.e., the variables e are fixed). It then descends by conditionally reassigning individuals to the groups that provide the best values for z. To do so, the following binary program is solved:

$$W(e) = \min \sum_{k=1}^{m} \sum_{g=1}^{G} z_{kg}\, \tilde d_{kg} \qquad (28)$$

subject to

$$\frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}} \ge \omega_g, \quad g \in \{1,\ldots,G\} \qquad (29)$$

$$\sum_{g=1}^{G} z_{kg} = 1, \quad k \in \{1,\ldots,m\} \qquad (30)$$

$$z_{kg} \in \{0,1\}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\} \qquad (31)$$

where $\tilde d_{kg} = \sum_{i=1}^{n} \sum_{j=1}^{n} d^{k}_{ij} e^{g}_{ij}$ and $\omega_g = \sum_{j=1}^{n} e^{g}_{jj}$. Problem (28)–(31) is a binary program which, according to our limited computational experiments, is usually solved at the root node of the branch-and-cut algorithm implemented by CPLEX. We chose to halt the second descent after the root node solution. If W(e) is not solved to optimality there, the previous solution is kept.
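For illustration only (not the authors' implementation), the binary program (28)–(31) can be stated with the PuLP modeling library, assuming it is installed; the ratio constraint (29) is multiplied through by the group size to keep the model linear, and all names below are ours.

```python
import pulp

def second_descent(d_tilde, c, omega):
    """Solve W(e) in (28)-(31): reassign individuals to groups, given d_tilde and omega.

    d_tilde[k][g]: cost of putting individual k in group g (sum_ij d^k_ij e^g_ij).
    c[k]: number of clusters expected by individual k.
    omega[g]: current number of medians in group g (sum_j e^g_jj).
    Returns the 0/1 assignment matrix z, or None if the program is infeasible.
    """
    m, G = len(d_tilde), len(omega)
    prob = pulp.LpProblem("W_of_e", pulp.LpMinimize)
    z = pulp.LpVariable.dicts("z", (range(m), range(G)), cat="Binary")
    prob += pulp.lpSum(d_tilde[k][g] * z[k][g] for k in range(m) for g in range(G))
    for k in range(m):                                   # constraints (30)
        prob += pulp.lpSum(z[k][g] for g in range(G)) == 1
    for g in range(G):                                   # constraints (29), linearized
        prob += pulp.lpSum(c[k] * z[k][g] for k in range(m)) >= \
                omega[g] * pulp.lpSum(z[k][g] for k in range(m))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[prob.status] != "Optimal":
        return None   # as in the paper, the previous solution would then be kept
    return [[int(z[k][g].value()) for g in range(G)] for k in range(m)]
```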

3.3.3. Third descent

It is trivial that the objective function of a clustering algorithm improves when the clustering solution is allowed to have a greater number of medians. However, in our case, the HCP restrains the number of medians in each group g, for $g \in \{1,\ldots,G\}$, by means of constraints (14), and thus pushes that number to the largest integer smaller than or equal to $\sum_{k=1}^{m} c_k z_{kg} / \sum_{k=1}^{m} z_{kg}$. Specifically for the HCP, increasing the number of medians for a group-level clustering solution will not necessarily always improve the objective function. The third descent thus aims at checking whether the number of medians in a group g∗ can be augmented by reallocating objects to a new median, while still satisfying constraints (14) with the new number of medians.

Algorithm 3 details how this procedure works. Specifically, for each group $g^{*} \in \{1,\ldots,G\}$, we initialize a solution $(z^{best}, e^{best})$ with the values of the best current solution for the HCP, i.e., $(z^{best}, e^{best}) \leftarrow (z, e)$. Then, the problem $M_{g^{*}}(z)$ is solved with the number of medians prescribed by (26) increased by one, thereby producing a new partition of the objects in $g^{*}$ for which the number of medians is increased by one unit. We then solve the problem W(e) and verify whether this increase can be accommodated by reassigning the individuals among the groups. If W(e) is infeasible, or if the cost of the newly yielded solution is larger than that of the best solution found, the solution of W(e) is ignored. Otherwise, the solution of W(e) becomes the new incumbent solution.

Algorithm 3 descent_3.

  Input: a solution (z, e) for the HCP
  (z^best, e^best) ← (z, e)
  for g∗ = 1, ..., G do
    Solve $M_{g^{*}}(z)$ with the number of medians in (26) increased by one;
    Solve W(e);
    if W(e) is infeasible or the cost of (z, e) is greater than the cost of (z^best, e^best) then
      (z, e) ← (z^best, e^best);
    else
      (z^best, e^best) ← (z, e);
    end if
  end for
  return (z^best, e^best)

A critical element in VND heuristics is the order in which the descent methods are explored (e.g., in Algorithm 2). In our implementation, this decision was based on several observations. First, descent_3 is more computationally expensive than the other two and, as such, it was set to be the last. Second, because our shaking step perturbs only the group membership variables, applying descent_2 (which reoptimizes group memberships) immediately after the shaking step would undo the shaking move.² Thus, we used descent_1 as the first descent, followed by descent_2 and descent_3.

² To illustrate, suppose a local minimum (z, e) with respect to the first and second descents. The shaking step applied to (z, e) generates a new solution (z′, e), perturbing only the group membership variables. Thus, if the second descent is applied just after the shaking step, it will change the solution back, yielding the same solution (z, e).

Table 1
Monte-Carlo Simulation: experimental design and dataset characteristics.

Instance | Individuals (m) | Groups (G) | Objects (n) | Medians | Perturbation dissimilarities | Perturbation medians
1 | 150 | 10 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5)
2 | 300 | 2 | 18 | All 6 | N(0, 0.1) | 0
3 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0
4 | 150 | 2 | 18 | All 3 | N(0, 0.05) | N(0, 0.5)
5 | 450 | 10 | 18 | All 6 | N(0, 0.05) | N(0, 1)
6 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0
7 | 300 | 2 | 18 | All 6 | 0 | N(0, 0.5)
8 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1)
9 | 300 | 10 | 30 | All 3 | N(0, 0.05) | N(0, 0.5)
10 | 450 | 6 | 18 | All 3 | N(0, 0.1) | N(0, 1)
11 | 150 | 6 | 30 | All 6 | N(0, 0.1) | 0
12 | 300 | 10 | 18 | All 3 | 0 | 0
13 | 450 | 10 | 18 | All 6 | N(0, 0.1) | 0
14 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1)
15 | 300 | 2 | 30 | All 6 | N(0, 0.05) | N(0, 1)
16 | 450 | 2 | 30 | 50 percent 3, 50 percent 6 | 0 | N(0, 1)
17 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5)
18 | 300 | 6 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0
19 | 150 | 6 | 18 | All 6 | 0 | N(0, 0.5)
20 | 450 | 6 | 30 | All 3 | 0 | 0
21 | 150 | 2 | 30 | All 3 | N(0, 0.1) | N(0, 1)
22 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5)
23 | 450 | 6 | 18 | All 3 | N(0, 0.05) | N(0, 0.5)
24 | 300 | 10 | 18 | All 3 | N(0, 0.1) | N(0, 1)
25 | 150 | 6 | 18 | All 6 | N(0, 0.05) | N(0, 1)
26 | 150 | 2 | 18 | All 3 | 0 | 0
27 | 450 | 10 | 30 | All 6 | 0 | N(0, 0.5)

4. Monte-Carlo simulation: performance and robustness

In the previous section, we introduced a new model, a series of reformulations, and a heuristic to solve the HCP. In the present section, we examine the relative performance of these approaches. Specifically, we first generate a set of datasets for which the true solution is known. Second, we attempt to solve the HCP for each dataset via exact methods and our proposed heuristic, showing that the use of a heuristic is necessary because exact methods cannot be used for problems of moderate size. Third, we compare the performance of the proposed heuristic to that of a benchmark heuristic based on a related model and show that the performance of our heuristic is significantly better. Fourth, we investigate the circumstances under which the proposed heuristic for the HCP is likely to outperform competing alternatives.

4.1. Datasets generation

To do so without providing any algorithm an unfair advantage, we needed a set of problems with known data-generating mechanisms for $D^k$ and $c_k$. As such, we simulated data following the fractional factorial experimental design used by Blanchard et al. (2012a). The process involves generating 27 simulated datasets (i.e., experimental trials) that have known solutions and that can be used to study the impact of different dataset characteristics on the ability of the competing algorithms to perform. The factorial design appears in Table 1.

The characteristics of the datasets (experimental factors) included the total number of individuals (m = 150, 300, 450), the number of groups (G = 2, 6, 10), the number of objects (n = 18, 30), the variance in the number of medians across groups, the amount of error added to the dissimilarity matrix of each individual (using N(0, 0.05) or N(0, 0.1) before rounding), and the amount of error added to the number of medians sought by each individual (using N(0, 0.5) or N(0, 1) before rounding). Further, following the works of Blanchard et al. (2012a), Blanchard and DeSarbo (2013), Brusco and Cradit (2001) and others, we also assume that the true number of groups G is known, but not their composition.³
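The paper does not spell out how the base dissimilarities are built before perturbation, so the sketch below (ours, with assumed conventions such as 0/1 co-membership dissimilarities and a minimum of one median) only illustrates the kind of generator implied by Table 1: noise N(0, σ_d) added to an individual's dissimilarity matrix and rounded noise N(0, σ_c) added to the number of medians sought.

```python
import numpy as np

def simulate_individual(partition, c_true, sigma_d, sigma_c, rng):
    """Generate one individual's (D^k, c_k) from a group's true object partition.

    partition[i] is the true cluster label of object i in the individual's group.
    Base dissimilarities are 0/1 co-membership (an assumption of this sketch),
    perturbed with N(0, sigma_d) noise; c_k is the true number of medians plus
    rounded N(0, sigma_c) noise, mirroring the design factors of Table 1.
    """
    n = len(partition)
    D = np.array([[0.0 if partition[i] == partition[j] else 1.0
                   for j in range(n)] for i in range(n)])
    if sigma_d > 0:
        noise = rng.normal(0.0, sigma_d, size=(n, n))
        noise = (noise + noise.T) / 2.0          # keep the matrix symmetric
        D = np.clip(D + noise, 0.0, None)
        np.fill_diagonal(D, 0.0)                 # enforce d_ii = 0
    c_k = c_true
    if sigma_c > 0:
        c_k = max(1, c_true + int(round(rng.normal(0.0, sigma_c))))  # floor of 1 is our safeguard
    return D, c_k

# rng = np.random.default_rng(0)
# D, c = simulate_individual([0, 0, 1, 1, 2, 2], 3, 0.05, 0.5, rng)
```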

4.2. Comparisons to exact solvers: optimization results

In the present section, we wish to establish the necessity of introducing a heuristic for the HCP by comparing the results of the following approaches:

(a) Couenne version 0.5.3, Baron version 15.2.0 and GloMIQO version 2 on formulation (6)–(14),
(b) CPLEX version 12.6 on HCP-R1, HCP-R2 and HCP-R3, and
(c) the VNS heuristic presented in the last section.

To compare the performance of the generic MINLP solvers, CPLEX, and the VNS heuristic, we use each to solve the 27 simulated instances presented in Table 1. Computational experiments were performed on a Xeon(R) CPU X5650 at 2.67 gigahertz with 64 gigabytes of RAM. Couenne, Baron, GloMIQO and CPLEX were allowed to run for 24 hours with default parameters. The VNS heuristic was allowed to run for 600 seconds. The algorithm was implemented in C++ and compiled with gcc 4.4.

4.2.1. Performance of exact algorithms

As most of the instances could not be solved to optimality, Table 2 reports the best upper bounds (ub), lower bounds (lb) and the number of explored branch-and-bound nodes (#bbn) obtained by each solver. The upper bounds correspond to feasible solutions obtained by different heuristics used within each solver. They are important in branch-and-bound algorithms to eliminate branches of the enumeration tree that do not lead to the optimal solution. Better upper bound values are usually able to cut more of these branches, which improves the overall performance of branch-and-bound methods. An empty value in column (ub) indicates that no


Table 2

Bound values obtained by the solvers for the 27 instances generated for Monte-Carlo Simulation.

Instance Couenne Baron GloMIQO CPLEX

ub lb # bbn ub lb # bbn ub lb # bbn HCP-R1 HCP-R2 HCP-R3 ub lb # bbn ub lb # bbn ub lb # bbn 1 0 0 0 0 0 0 2707 .04 1 0 409 3979 .78 0 391 2 0 0 2388 .25 8 .85 1 0 0 2392 .48 2311 .75 613 3047 .60 0 3591 3146 .26 0 5310 3 0 0 4604 .21 0 1 0 0 5064 .68 4364 .75 252 5633 .01 0 1534 5763 .45 0 2766 4 0 0 2387 .97 0 4 0 0 1870 .91 1819 .56 2041 1992 .85 0 13948 1972 .79 0 13032 5 0 0 0 0 0 0 3346 .33 1 5583 .94 0 293 7063 .65 0 109 6 0 0 0 0 0 0 2437 .02 1382 .45 23 2028 .00 0 3200 2166 .37 0 1182 7 0 0 3698 .33 0 1 0 0 2652 .68 2410 .68 419 3410 .67 0 4269 3315 .00 0 3817 8 0 0 0 0 0 0 1470 .84 1 2081 .83 0 3982 2123 .50 0 1313 9 0 0 0 0 0 0 6646 .22 1 0 163 8008 .79 0 103 10 0 0 0 0 0 0 4773 .51 1 0 611 7020 .89 0 470 11 0 0 0 0 0 0 2516 .03 1 0 800 3823 .25 0 666 12 0 0 0 0 0 0 3749 .99 1 0 840 4327 .33 0 332 13 0 0 0 0 0 0 3067 .90 1 0 269 6314 .56 0 161 14 0 0 0 0 0 0 2905 .84 1 4 4 46 .83 0 553 4475 .67 0 720 15 8308 .26 0 2 0 0 0 0 6126 .58 5642 .20 1 6860 .69 0 1514 6811 .36 0 1609 16 12667 .50 0 1 0 0 0 0 9668 .71 1 11937 .90 0 282 0 458 17 0 0 0 0 0 0 4883 .34 2669 .76 6 3869 .98 0 1653 4386 .82 0 857 18 0 0 0 0 0 0 5993 .60 1 0 217 7906 .57 0 295 19 0 0 1714 .33 0 1 0 0 1190 .01 1 1724 .33 0 3695 1715 .33 0 3480 20 0 0 0 0 0 0 10935 .00 1 0 95 12279 .10 0 84 21 3593 .26 0 86 244985 .95 0 1 0 0 4218 .56 3405 .42 5 3795 .42 0 3669 3797 .77 0 5907 22 7065 .28 0 4 254434 .50 0 1 0 0 4729 .62 4231 .32 300 5626 .56 0 1808 5965 .84 0 1726 23 0 0 0 0 0 0 5206 .48 1 0 643 7145 .39 0 262 24 0 0 0 0 0 0 3167 .10 1 0 640 4454 .83 0 203 25 0 0 259742 .09 0 1 1136 .85 1 1354 .82 1142 .62 1338 1777 .15 0 3603 1770 .16 0 2011 26 2023 .33 0 833 2037 .32 24 .99 1 1874 .99 1874 .99 1 1874 .99 1874 .99 1 2002 .49 0 20016 1895 .16 0 110944 27 0 0 0 0 0 0 8657 .60 1 0 100 0 35

feasible solution was reported by the solver before the time limit was attained for that instance.

The results in Table 2 reveal that instance #26 is the only one that could be solved to optimality by our exact approaches. It has the easiest problem characteristics of all our datasets, with m = 150, n = 18, G = 2 and without any kind of perturbation (error) added. Yet, the solver GloMIQO spent 12,510 seconds to solve the problem, whereas CPLEX on HCP-R1 took 374 seconds. For instance #27, none of the solvers was able to find a feasible upper bound solution within 24 hours of CPU time.

Across all the datasets, the lower bounds obtained by the solvers are very often equal to the trivial one (zero). The only exception is those obtained by CPLEX for the RLT-linearized formulation (HCP-R1), which is still quite difficult to solve, as revealed by the number of branch-and-bound nodes processed by CPLEX within 24 hours. In 18 of the 27 instances (≈ 67 percent), CPLEX was able to solve only the root node.

It does indeed seem that the reformulations presented in Sections 2.2 and 2.3 helped CPLEX to tackle the problem, as it performed better than Couenne, Baron and GloMIQO applied to the original HCP formulation. The lower bounds obtained by the latter are never better than those obtained by CPLEX on HCP-R1. Regarding upper bounds, Baron and Couenne together found the best solutions only 4 times (in instances #2, #3, #19 and #21).

Among the convexified formulations, HCP-R2 and HCP-R3 do not allow CPLEX to obtain better lower bounds than those obtained by CPLEX on HCP-R1, despite exploring more branch-and-bound nodes. However, we note that the upper bounds obtained on these two formulations (HCP-R2 and HCP-R3) were better than those obtained for HCP-R1 in 18 out of 27 instances, i.e., ≈ 67 percent.

4.2.2. Performance of the VNS heuristic

Table 3 presents the results obtained by Algorithm 1 (CH) and the VNS heuristic. The second column reports the cost of the initial solution provided by the constructive heuristic. The third column presents the average upper bound solutions obtained in 10 distinct executions of the heuristic, whereas the fourth column shows their associated standard deviation. The fifth column refers to the relative difference between the VNS solutions and the best upper bound values presented in Table 2. Finally, the sixth column refers to the relative difference between the solutions obtained by VNS and the best lower bound values of Table 2.

Table 3
Summary of VNS results for the 27 instances generated for the Monte Carlo Simulation.

Instance | CH best ub | VNS average ub | VNS std dev ub | Improv. over best ub (percent) | GAP to best lb (percent)
1 | 3257.668 | 3211.298 | 6.34 | 19.310 | 15.392
2 | 2388.253 | 2388.253 | 0.00 | 0.000 | 3.140
3 | 4604.207 | 4604.207 | 0.00 | 0.000 | 5.199
4 | 1996.796 | 1869.924 | 0.00 | 0.053 | 2.660
5 | 3936.212 | 3792.574 | 42.55 | 32.081 | 11.356
6 | 1525.197 | 1525.197 | 0.00 | 24.793 | 9.346
7 | 2900.010 | 2651.011 | 0.00 | 0.013 | 9.066
8 | 1785.667 | 1598.416 | 27.76 | 23.221 | 7.982
9 | 7604.898 | 7391.292 | 17.86 | 7.710 | 10.034
10 | 6049.086 | 5701.861 | 29.07 | 18.787 | 15.158
11 | 2835.408 | 2835.408 | 0.00 | 25.838 | 9.588
12 | 3749.985 | 3749.985 | 0.00 | 13.342 | 0.000
13 | 3562.485 | 3562.485 | 0.00 | 43.583 | 13.570
14 | 3249.999 | 3075.149 | 4.99 | 30.846 | 5.506
15 | 6000.374 | 5760.642 | 0.00 | 5.973 | 2.056
16 | 10192.500 | 9869.400 | 0.00 | 16.608 | 2.033
17 | 3441.239 | 3136.460 | 22.44 | 18.954 | 14.880
18 | 6497.311 | 6497.311 | 0.00 | 17.824 | 7.618
19 | 1241.673 | 1206.006 | 0.00 | 29.652 | 1.327
20 | 10935.000 | 10935.000 | 0.00 | 10.946 | 0.000
21 | 3709.921 | 3593.040 | 0.00 | 0.006 | 5.222
22 | 4929.876 | 4610.666 | 0.00 | 2.515 | 8.228
23 | 5987.091 | 5685.854 | 32.20 | 20.426 | 7.505
24 | 3945.636 | 3764.599 | 12.40 | 15.494 | 15.724
25 | 1389.303 | 1253.215 | 13.17 | 7.499 | 8.825
26 | 1874.993 | 1874.993 | 0.00 | 0.000 | 0.000
27 | 9207.000 | 9004.640 | 39.75 | 29.041 | 3.854

Table 4
Average improvements yielded by the incremental application of the descents within the VND framework.

Instance | ivns1 | ivns1+2 | ivns1+2+3
1 | 0.841 | 0.497 | 0.090
4 | 6.344 | 0.011 | 0.000
5 | 1.272 | 2.373 | 0.036
7 | 8.583 | 0.004 | 0.000
8 | 10.098 | 0.339 | 0.093
9 | 1.083 | 0.791 | 0.961
10 | 3.337 | 1.401 | 1.100
14 | 3.122 | 0.584 | 1.757
15 | 3.611 | 0.399 | 0.000
16 | 0.696 | 0.012 | 2.480
17 | 5.389 | 0.699 | 2.987
19 | 1.460 | 0.000 | 0.000
21 | 2.912 | 0.246 | 0.000
22 | 2.758 | 0.466 | 3.372
23 | 2.340 | 2.189 | 0.579
24 | 2.375 | 2.469 | −0.207
25 | 5.500 | 2.882 | 1.713
27 | 0.856 | 1.363 | −0.009

Comparing to the results obtained by the exact solvers, we find that VNS always obtained better upper bound solutions. The only exceptions are instances #2, #3 and #26, for which all algorithms obtain the same objective function value. The superior performance of VNS attains its maximum for instance #13, with a difference of 43.58 percent in solution quality.

Our results suggest that the proposed VNS algorithm is stable, as demonstrated by the very small standard deviations, which were inflated by a few larger instances. For instance, the largest variability in objective function values over the 10 distinct VNS executions is found for instance #5, whose data contain perturbations not only in the dissimilarity values and in the number of medians sought by each individual, but also a large number of groups to solve for (i.e., G = 10).

Contrasting the constructive heuristic with the entire VNS algorithm, we note that the constructive heuristic provided a solution which could not be improved by VNS in 9 out of 27 instances (i.e., ≈ 33 percent). It is interesting to note that the initial solution provided by the constructive heuristic led, in 20 out of 27 instances, to better upper bound solutions than those obtained by CPLEX in 24 hours.

Our VNS algorithm is composed of three descent steps in its local search. To contrast their effectiveness, we conducted a series of experiments in which they are used in an incremental way. The results are summarized in Table 4. The first column of the table refers to the instances for which VNS improves the solutions provided by the constructive heuristic. The other columns refer to the relative differences in cost obtained by the application of the successive descents. Thus, $i_{vns1}$ reports the average improvement (in percent) obtained by VNS with respect to the solutions provided by the constructive heuristic when only the first descent is used as local search. Let $cost_{CH}$ be the cost of the solution obtained by the constructive heuristic; then $i_{vns1}$ is calculated as $\frac{cost_{CH} - cost_{vns1}}{cost_{CH}}$, where $cost_{vns1}$ is the average cost obtained by VNS using only the first descent. Similarly, $i_{vns1+2}$ reports the average improvement (in percent) obtained by VNS when the second descent is added to the VND framework, and finally $i_{vns1+2+3}$ when the third descent is included to complete our algorithm. All average results in the table are calculated over 10 runs of the VNS heuristic with a time limit of 600 seconds.

We notice from Table 4 that the gains incurred by the incremental use of the proposed descents are non-increasing on average: ≈ 2.83 percent with only the first descent, ≈ 0.58 percent with the first and second descents, and ≈ 0.09 percent with all three. However, we note that in some cases (instances #16 and #22) the largest gains are obtained only after the third descent is used. The negative values for instances #24 and #27 are due to the smaller number of VNS iterations within the established time limit when the third descent is applied. Nevertheless, its incorporation within VND improved the average solution values in 11 out of 18 instances.

4.3. Performance comparison with benchmark models

In the experimental datasets used in our comparisons, we not only know the objective function value at the global optimum but also the true assignments of individuals to groups and, for each group, the true clustering solutions. As such, we can also investigate the ability of the algorithms to recover these decision variables. In this subsection, we compare the classification provided by the heuristic for the HCP with that provided by the heuristic for the heterogeneous p-median problem (HPM) (Blanchard et al., 2012a).

The two clustering models, and the resulting VNS heuristics, share many similarities. For starters, both models aim to group individuals based on the clustering solutions that can be obtained from their perceptions of a set of objects. However, the models are distinct in that: (i) whereas the number of medians in a group is conditioned on the individuals' group memberships in the HCP, it is a variable in the HPM; (ii) the HPM is a multi-objective model converted to a single-objective model: it weights the sum of dissimilarities between each object and its assigned cluster, conditional on group membership, against the difference between the number of medians of each individual and the estimated number of medians. This second observation is critical because the HPM requires the user of the proposed algorithm to select the value of the weight parameter. Our comparison contrasts the results obtained by the VNS heuristic of Section 3 and the VNS heuristic of Blanchard et al. (2012a) for the HPM, using the same aforementioned computational platform. The weight used in the HPM was set to the average of all dissimilarity values, weighting both roughly equally, as was done in Blanchard et al. (2012a).

Given that not only the algorithms but also the models are distinct, to compare the resulting approaches we need a performance metric that does not systematically favor one over the other. As such, to examine the models' (and the algorithms') ability to recover the original data of Table 1, we use the Adjusted Rand Index (ARI; Hubert and Arabie, 1985). Doing so allows us to compare the true clustering used in generating the simulated data with the predicted clustering variables e, as well as the individuals' true partition with the predicted assignment variables z. We believe that this comparison is fair given that the same VNS heuristic framework was used, that both heuristics were demonstrated to have good performance regarding the optimization of their respective models, and that the results were collected using the same computational platform.

The results in the last four columns of Table 5 indicate the ARI index with respect to the recovery of the objects clustering (column e) and the groups (column z). To calculate these measures, we obtained the ARI separately for each individual before averaging the results across all individuals. For both heuristics, we allowed 600 seconds of computational time.
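For reference (our addition), the per-individual ARI averaging described above can be computed with scikit-learn's adjusted_rand_score, assuming the true and recovered object clusterings are available as label vectors for each individual; the function name and data layout are ours.

```python
from sklearn.metrics import adjusted_rand_score

def mean_individual_ari(true_labels, pred_labels):
    """Average ARI over individuals.

    true_labels[k] and pred_labels[k] are length-n label vectors giving the true
    and recovered clustering of the n objects for individual k (via the group
    that individual is assigned to).
    """
    scores = [adjusted_rand_score(t, p) for t, p in zip(true_labels, pred_labels)]
    return sum(scores) / len(scores)
```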

Table 5
Monte Carlo Simulation: recovery results.

Instance | Individuals | Groups | Objects | Medians | Perturbation dissimilarities | Perturbation medians | HCP ARI (z) | HCP ARI (e) | HPM ARI (z) | HPM ARI (e)
1 | 150 | 10 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5) | .886 | .903 | .958 | .713
2 | 300 | 2 | 18 | All 6 | N(0, 0.1) | 0 | 1.000 | 1.000 | 1.000 | 1.000
3 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0 | 1.000 | 1.000 | 1.000 | 1.000
4 | 150 | 2 | 18 | All 3 | N(0, 0.05) | N(0, 0.5) | .947 | .988 | .947 | .616
5 | 450 | 10 | 18 | All 6 | N(0, 0.05) | N(0, 1) | .928 | .911 | .985 | .892
6 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0 | 1.000 | 1.000 | 1.000 | 1.000
7 | 300 | 2 | 18 | All 6 | 0 | N(0, 0.5) | .987 | .881 | .987 | 1.000
8 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1) | .166 | .979 | .167 | .182
9 | 300 | 10 | 30 | All 3 | N(0, 0.05) | N(0, 0.5) | .700 | .787 | .852 | .727
10 | 450 | 6 | 18 | All 3 | N(0, 0.1) | N(0, 1) | .771 | .844 | .783 | .267
11 | 150 | 6 | 30 | All 6 | N(0, 0.1) | 0 | 1.000 | 1.000 | 1.000 | 1.000
12 | 300 | 10 | 18 | All 3 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000
13 | 450 | 10 | 18 | All 6 | N(0, 0.1) | 0 | 1.000 | 1.000 | 1.000 | 1.000
14 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1) | .960 | .985 | .878 | .657
15 | 300 | 2 | 30 | All 6 | N(0, 0.05) | N(0, 1) | .934 | .985 | .871 | .716
16 | 450 | 2 | 30 | 50 percent 3, 50 percent 6 | 0 | N(0, 1) | .879 | .969 | .293 | .846
17 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5) | .900 | .958 | .711 | .818
18 | 300 | 6 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0 | 1.000 | 1.000 | 1.000 | 1.000
19 | 150 | 6 | 18 | All 6 | 0 | N(0, 0.5) | .968 | .987 | .947 | 1.000
20 | 450 | 6 | 30 | All 3 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000
21 | 150 | 2 | 30 | All 3 | N(0, 0.1) | N(0, 1) | .821 | .955 | .797 | .200
22 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5) | .956 | .990 | .247 | 1.000
23 | 450 | 6 | 18 | All 3 | N(0, 0.05) | N(0, 0.5) | .783 | .882 | .860 | .668
24 | 300 | 10 | 18 | All 3 | N(0, 0.1) | N(0, 1) | .751 | .862 | .879 | .417
25 | 150 | 6 | 18 | All 6 | N(0, 0.05) | N(0, 1) | .890 | .913 | .868 | .897
26 | 150 | 2 | 18 | All 3 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000
27 | 450 | 10 | 30 | All 6 | 0 | N(0, 0.5) | .889 | .926 | .946 | 1.000

Our results suggest that, on average, both algorithms performed very well, with an average ARI of .952 for HCP and .801 for HPM. However, the results from a paired-sample t-test suggest that the within-dataset difference between the algorithms for HCP (M = .952, SD = .059) and HPM (M = .801, SD = .263) is significant (t(26) = 3.20, p < .01).⁴ In fact, excluding the 9/27 datasets where both algorithms perfectly recovered the original data, the one for HCP outperformed the one for HPM in 14 out of 18 trials. With respect to the recovery of the assignments of individuals to groups, both algorithms also performed very well. Namely, whereas the heuristic for HCP obtains an average ARI of .893 (M = .893, SD = .169), the heuristic for HPM obtains .851 (M = .851, SD = .236). The within-dataset difference between the two is not significant (t(26) = 1.18, p = .25).

4.4. Sensitivity to dataset characteristics

What affects the algorithms’ ability to recover the original data? To investigate this critical question, we used multiple linear re- gression to predict each algorithm’s ARI using dummy-coded fac- tors for each of the data characteristics. The results are displayed in Table 6 for the objects clustering variables and in Table 7 for the individuals grouping variables. In both tables, the rows indicate the data structures characteristics that were manipulated in the 27 generated instances. Each row contains the main effects (beta co- efficients) for the factors used as independent variables, along with the significance of the factor. Finding a significant regression coef- ficient suggests that the algorithm is sensitive to the data charac- teristic. As few significant coefficients, as possible, is desired for an algorithm to be robust.

With respect to the HPM, we find that the algorithm has some sensitivity to changes in data structures. Specifically, whereas the algorithm is unaffected by the number of individuals or the number of clusters, partitions with numerous clusters of equal sizes are better recovered than those with fewer clusters or clusters of uneven sizes. We also find that error (even in small amounts) added to both the dissimilarities and the number of medians significantly affects performance.

4 t refers to the t-statistic, and p refers to the associated p-value.

The algorithm for the HCP, in contrast, is mostly unaffected by data structures. Of note, it is particularly insensitive to errors added to the distances. It is also better able to recover clustering structures with a larger number of objects, and datasets where the number of groups is smaller than 10. That said, the impact of these factors is minimal, as the mean ARI is .95, a near-perfect recovery of the original pairwise data.

With respect to assignments of individuals to groups, we find that error added to the number of medians is a significant predictor, and that recovery is also sensitive to datasets where the groups are few (but large). The algorithm for the HCP is mostly unaffected when it comes to recovering group memberships. It has marginally more difficulty recovering large group memberships (when G = 10) and is only impacted by large error added to the number of medians.

5. Empirical illustration: understanding differences in perceptions of assortments of chocolate candy

To further demonstrate the usefulness and performance of the proposed procedure, we collected data for a real-world application and used the proposed VNS heuristic to illustrate heterogeneity in the clustering performed by different individuals. The sorting task (also known as card sorting) asks participants to allocate a set of objects into piles according to their own perception. It is common to instruct participants to (1) put objects into the same pile if they are similar in some way (there are no pre-determined labels), and (2) use as many piles as they desire (c.f., Blanchard & Banerji, 2016), and the result is a set of participants who performed their own "partitions" over the set of objects.

Table 6
Monte Carlo Simulation: factors influencing the clustering of objects by the individuals.

Factor | HCP coefficient | HCP t-value | HCP p-value | HPM coefficient | HPM t-value | HPM p-value
Intercept | 1.084 | 37.521 | .000*** | 1.081 | 37.826 | .000***
Individuals 300 (default: 150) | −.030 | −1.426 | .174 | −.023 | −1.134 | .275
Individuals 450 | −.023 | −1.085 | .295 | .000 | .005 | .996
Groups 6 (default: 2) | −.022 | −1.067 | .303 | −.028 | −1.378 | .188
Groups 10 | −.045 | −2.136 | .050** | −.008 | −.387 | .704
Number of objects 30 (default: 18) | −.007 | −.404 | .692 | .001 | .051 | .960
Median spread All 3 (default: 50 percent 3, 50 percent 6) | −.052 | −2.479 | .026** | −.091 | −4.383 | .001***
Median spread All 6 | −.020 | −.964 | .350 | .041 | 1.971 | .067*
Added error on dissimilarities Small (default: none) | −.029 | −1.397 | .183 | −.054 | −2.598 | .020**
Added error on dissimilarities Large | −.024 | −1.143 | .271 | −.105 | −5.086 | .000***
Added error on number of medians Small (default: none) | −.078 | −3.718 | .002*** | −.078 | −3.800 | .002***
Added error on number of medians Large | −.067 | −3.190 | .006*** | −.159 | −7.709 | .000***
R2 | 0.678 | | | 0.893 | |
Adjusted R2 | 0.448 | | | 0.882 | |
Mean ARI | 0.952 | | | 0.913 | |
Std ARI | 0.059 | | | 0.104 | |

∗ indicates p < .10, ∗∗ indicates p < .05, ∗∗∗ indicates p < .01.

Table 7
Monte Carlo Simulation: factors influencing group membership.

Factor | HCP coefficient | HCP t-value | HCP p-value | HPM coefficient | HPM t-value | HPM p-value
Intercept | .956 | 9.216 | .000*** | .985 | 9.278 | .000***
Individuals 300 (default: 150) | .062 | .822 | .424 | .079 | 1.031 | .319
Individuals 450 | .059 | .783 | .446 | .111 | 1.450 | .168
Groups 6 (default: 2) | −.028 | −.374 | .714 | .009 | 0.122 | .905
Groups 10 | −.134 | −1.784 | .095* | −.031 | −0.405 | .691
Number of objects 30 (default: 18) | .012 | .181 | .858 | .018 | 0.278 | .785
Median spread All 3 (default: 50 percent 3, 50 percent 6) | .003 | .038 | .970 | −.142 | −1.855 | .083*
Median spread All 6 | .094 | 1.259 | .227 | .116 | 1.507 | .152
Added error on dissimilarities Small (default: none) | .037 | .492 | .630 | −.018 | −0.239 | .814
Added error on dissimilarities Large | .026 | .348 | .733 | −.095 | −1.245 | .232
Added error on number of medians Small (default: none) | −.109 | −1.460 | .165 | −.134 | −1.747 | .101**
Added error on number of medians Large | −.211 | −2.816 | .013** | −.428 | −5.589 | .000***
R2 | 0.494 | | | 0.763 | |
Adjusted R2 | 0.124 | | | 0.590 | |
Mean ARI | 0.893 | | | 0.813 | |
Stdev ARI | 0.170 | | | 0.254 | |

∗ indicates p < .10, ∗∗ indicates p < .05, ∗∗∗ indicates p < .01.

The pairwise similarity data provided by the sorting task is, in the most simplified way, $y^{k}_{ij} = 1$ if individual k (k ∈ {1, ..., m}) places objects i and j (i, j ∈ {1, ..., n}) in the same pile (high similarity), and 0 otherwise (low similarity). The task is particularly suited for the generation of such pairwise similarity data because it mirrors closely the cognitive activities involved in the categorization process individuals follow as they form similarity judgments (Coxon, 1999), and, compared to pairwise similarity tasks, it yields data of similar quality with less fatigue and boredom from participants (Bijmolt & Wedel, 1995; Rao & Katz, 1971).

Data was collected from an online sorting study featuring m = 189 undergraduate students from a large northeastern United States university, who reported their perceptions of n = 20 chocolate candies displayed at the Corp's Vital Vittles, the first storefront of Students of Georgetown Inc., opened in 1973. Today, Vital Vittles is a full-service grocery store which sells frozen foods, meals on the go, and a variety of home supplies. Because the Georgetown Campus housing is fairly isolated, it is considered a one-stop shop: Vittles faces little competition from local grocery stores. It is also very healthy financially, with gross sales averaging over 2 million dollars a year and a 23 percent gross profit margin.

As part of its most prominent checkout counter shelves, Vital Vittles offers a large selection of chocolate candy. The shelf section dedicated to chocolate snacks includes the following options: Almond Joy, Baby Ruth, Butterfinger, Hershey (Almond), Hershey (Plain), Junior Mints, Kit Kat, M & M (Peanut), M & M (Plain), Mars Bar, Milky Way, Mounds Bar, Nestle's Crunch, Oh Henry!, Payday, Reece's Cups, Snickers, Three Musketeers, Twix, and York Mint.

Do consumers perceive these brands in similar ways? The piles made by these participants provide preliminary evidence that we can expect heterogeneity in the clustering solutions obtained: the mean number of piles (partitions) made by the participants was 5.73 (min = 2, max = 12), and the large variation in the number of piles made is illustrated by the histogram in Fig. 1.

In order to be used by the HCP algorithm, the sort provided by each individual k, for k = 1, ..., m, is converted to a dissimilarity matrix $D^k$ following the procedure proposed by Takane (1980), and the number of medians $c_k$ is set equal to the number of piles made by that individual in the sorting task.
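The sketch below (ours) shows the simplest version of such a conversion, treating objects placed in different piles as maximally dissimilar; it is a simplified stand-in rather than Takane's (1980) exact procedure, and the function name is hypothetical.

```python
import numpy as np

def sort_to_dissimilarity(piles):
    """Convert one participant's sorting into a dissimilarity matrix and c_k.

    piles[i] is the pile label given to object i; d_ij = 0 if objects i and j
    were placed in the same pile and 1 otherwise (a simplification of the
    conversion used in the paper); c_k is the number of piles made.
    """
    n = len(piles)
    D = np.array([[0.0 if piles[i] == piles[j] else 1.0
                   for j in range(n)] for i in range(n)])
    c_k = len(set(piles))
    return D, c_k

# example: a participant who made three piles over six candies
D, c = sort_to_dissimilarity(["choc", "choc", "mint", "mint", "nut", "nut"])
```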

5.1. Model selection & performance

For both VNS heuristics, we performed 10 executions of the procedures. All executions were terminated after 600 seconds of CPU time, and Table 8 shows the best objective function values as the number of groups (G) increases. To facilitate model selection, the table also shows the percentage improvement obtained when an additional group is added. Both algorithms seem to identify a
