
Contents lists available at ScienceDirect
European Journal of Operational Research
journal homepage: www.elsevier.com/locate/ejor

Stochastics and Statistics

A model for clustering data from heterogeneous dissimilarities

Éverton Santi (a), Daniel Aloise (b,∗), Simon J. Blanchard (c)

a School of Sciences and Technology, Universidade Federal do Rio Grande do Norte, Natal-RN 59072-970, Brazil
b Department of Computer Engineering and Automation, Universidade Federal do Rio Grande do Norte, Natal-RN 59072-970, Brazil
c McDonough School of Business, Georgetown University, Washington, DC 20057, USA

Article info

Article history: Received 16 February 2015; Accepted 18 March 2016; Available online 26 March 2016.
Keywords: Data mining; Clustering; Heterogeneity; Optimization; Heuristics

Abstract

Clustering algorithms partition a set of n objects into p groups (called clusters), such that objects assigned to the same groups are homogeneous according to some criteria. To derive these clusters, the data input required is often a single n × n dissimilarity matrix. Yet for many applications, more than one instance of the dissimilarity matrix is available, and so, to conform to model requirements, it is common practice to aggregate (e.g., sum up, average) the matrices. This aggregation practice results in clustering solutions that mask the true nature of the original data. In this paper we introduce a clustering model which, to handle the heterogeneity, uses all available dissimilarity matrices and identifies groups of individuals clustering objects in a similar way. The model is a nonconvex problem and difficult to solve exactly, and we thus introduce a Variable Neighborhood Search heuristic to provide solutions efficiently. Computational experiments and an empirical application to the perception of chocolate candy show that the heuristic algorithm is efficient and that the proposed model is suited for recovering heterogeneous data. Implications for clustering researchers are discussed.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Clustering algorithms determine groups of objects, called clusters, in such a way that objects in the same group are more similar to one another than to those in other groups (Hansen & Jaumard, 1997). Clustering is ubiquitous, with applications in the natural sciences, psychology, medicine, engineering, economics, marketing and other fields (e.g., Frey & Dueck, 2007; Jain, Murty, & Flynn, 1999; McLachlan & David, 2004).

Among the many types of clustering models, a popular one is partitioning a set $O = \{o_1, \ldots, o_n\}$ of n objects into a set $P = \{C_1, \ldots, C_p\}$ of clusters such that:

(i) $C_j \neq \emptyset$ for all $j \in \{1, \ldots, p\}$;
(ii) $C_i \cap C_j = \emptyset$ for all $i, j \in \{1, \ldots, p\}$ with $i \neq j$; and
(iii) $\bigcup_{j=1}^{p} C_j = O$.

The input data for clustering algorithms is often a single matrix X of dimensions n × s, obtained by measuring s features of the objects of O. This matrix is then used to compute an n × n matrix of pairwise dissimilarities $D = (d_{ij})$ between objects of O, such that the $d_{ij}$, for $i, j \in \{1, \ldots, n\}$, (usually) satisfy: (i) $d_{ij} = d_{ji} \ge 0$, and (ii) $d_{ii} = 0$. Such a single dissimilarity matrix D does not need to satisfy the triangle inequality, i.e., its entries need not be distances.

∗ Corresponding author. E-mail addresses: santi.everton@gmail.com (É. Santi), aloise@dca.ufrn.br, daniel.aloise@gerad.ca (D. Aloise), sjb247@georgetown.edu (S.J. Blanchard).

For many problems, only one dissimilarity matrix is available. For instance, the Iris dataset (Fisher, 1936), one of the most popular datasets used in cluster analysis, consists of 150 samples from three species of Iris flowers, where each flower is measured on four characteristics. Using the attributes measured, the flowers are typically clustered into the three (expected) species. The use of classical clustering algorithms (e.g., k-means, single-linkage, complete-linkage) on this dataset can provide excellent results. It is, however, possible that more than one dissimilarity matrix is available. In the context of the Iris dataset, one could envision asking a sample of multiple experts to measure the same flowers, in case there is significant measurement error. If there is heterogeneity in the data reported, we argue that aggregating the dissimilarity matrices might mask differences truly present in the data.

There are indeed many contexts for which multiple measurements (i.e., dissimilarity matrices) are available. For instance, in the social sciences it is common to ask a sample of individuals to each provide pairwise similarity judgements between brands (e.g., how similar is Coke to Pepsi?). Such tasks, known as pairwise similarity tasks, produce one dissimilarity matrix for each participant, and have been used to study preference formation (Carpenter & Nakamoto, 1994), advertisement similarity (Schweidel, Bradlow, & Williams, 2006), comparing brands (Bijmolt, Wedel, Pieters, & DeSarbo, 1998), store positioning (Arora, 1982), variety seeking (Feinberg, Kahn, & McAlister, 1992), and substitution decisions (Hamilton et al., 2014; Ratneshwar & Shocker, 1991). The necessity to consider the multiple dissimilarity matrices stems from the fact that measurements often reflect differences in perception. Such different dissimilarity matrices can be thought of as reflecting different points of view (Brusco & Cradit, 2005; DeSarbo & Carroll, 1985; DeSarbo, Atalay, LeBaron, & Blanchard, 2008; Lee, 2001; Steinley, Hendrickson, & Brusco, 2015; Vichi, Rocci, & Kiers, 2007), which have been incorporated in a large number of perceptual models that include multidimensional scaling, three-way clustering, and mixture models.

Among the various clustering models available, the p-median model has received significant attention across fields (e.g., Sáez-Aguado and Trandafir, 2012; Brusco, Steinley, Cradit, and Singh, 2012; Avella, Boccia, Salerno, and Vasilyev, 2012). The p-median model aims to partition objects into clusters such that the sum of the distances from each object to the central exemplar of its cluster (i.e., the median) is minimal. Given n objects to be clustered and a known number of clusters p, the mathematical problem can be formulated as an integer linear program (ReVelle & Swain, 1970). In our notation, there is one key set of decision variables that involves the assignment of objects to clusters. First, $e_{jj} = 1$ indicates that object j is chosen as the median of a cluster, and 0 otherwise, for $j \in \{1, \ldots, n\}$. Second, on the off-diagonal elements, $e_{ij} = 1$ if object i is assigned to the cluster whose median is object j, and 0 otherwise, for $i \in \{1, \ldots, n\}$ (object j is naturally assigned to itself if it is a median). Using this notation, the p-median model is expressed as follows:

$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} e_{ij} \qquad (1)$$

subject to

$$\sum_{j=1}^{n} e_{ij} = 1, \quad i \in \{1,\ldots,n\} \qquad (2)$$

$$\sum_{j=1}^{n} e_{jj} = p \qquad (3)$$

$$e_{ij} \le e_{jj}, \quad i,j \in \{1,\ldots,n\} \qquad (4)$$

$$e_{ij} \in \{0,1\}, \quad i,j \in \{1,\ldots,n\}. \qquad (5)$$

The constraints (2) require that each object must be assigned to one and only one median. Constraint (3) imposes that the number of medians must be exactly p. The constraints (4) ensure that object i can only be assigned to object j if object j is a median, and constraints (5) are domain constraints for the decision variables. Finally, the product of $d_{ij}$ and $e_{ij}$ in (1) captures the dissimilarity from each object i to its closest median j.
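To make the formulation concrete, the following minimal sketch (our illustration, not code from the paper) solves the p-median model (1)–(5) by brute force on a toy instance: it enumerates every candidate set of p medians and assigns each object to its closest median, which is exactly what the integer program encodes. The function name and data layout are assumptions for the example.

```python
from itertools import combinations

def p_median_bruteforce(D, p):
    """Exhaustively solve the p-median model (1)-(5) for a small instance.

    D is an n x n list of lists of dissimilarities; p is the number of medians.
    Returns (best_cost, best_medians, assignment), where assignment[i] is the
    median object to which object i is assigned.
    """
    n = len(D)
    best_cost, best_medians, best_assign = float("inf"), None, None
    for medians in combinations(range(n), p):          # candidate sets of p medians
        # each object goes to its closest median (constraints (2) and (4))
        assign = [min(medians, key=lambda j: D[i][j]) for i in range(n)]
        cost = sum(D[i][assign[i]] for i in range(n))  # objective (1)
        if cost < best_cost:
            best_cost, best_medians, best_assign = cost, medians, assign
    return best_cost, best_medians, best_assign

# toy example: 4 objects, 2 medians; expect a total cost of 2
D = [[0, 1, 5, 6],
     [1, 0, 5, 6],
     [5, 5, 0, 1],
     [6, 6, 1, 0]]
print(p_median_bruteforce(D, 2))
```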

One of the main characteristics of the p-median is its breadth of applicability. It can be applied to cluster metric data as well as more general similarity/dissimilarity data, even asymmetric or rectangular data structures (i.e., when not every object can be a median) (Köhn, Steinley, & Brusco, 2010). Mladenović, Brimberg, Hansen, and Moreno-Pérez (2007) present an extensive review of exact and heuristic solution methods for this problem. Despite its advantages, including excellent classification rates, robustness to outliers and attractive assumptions, an aggregate p-median formulation may still mask individual heterogeneity, as is later shown in the empirical application.

In this paper, we propose a mathematical programming formulation based on the p-median to cluster data collected from individuals¹ who provided heterogeneous dissimilarity matrices. The model is conceived on two levels. The first identifies clusters of individuals, herein called groups for readability, with similar clustering structures. The second identifies the partitions of objects for each of these groups.

The remainder of the paper is organized as follows. In the next section, we present the mathematical formulation and discuss how available exact algorithms can be used to solve our model. In Section 3, we describe the Variable Neighborhood Search (VNS) (Hansen & Mladenović, 2001; Mladenović & Hansen, 1997) heuristic for the model. In Section 4, we present a Monte Carlo Simulation whose results illustrate the necessity for heuristic algorithms. We also show that the proposed VNS heuristic has the ability to predict heterogeneous clustering data. Section 5 provides an empirical example from a local United States retailer about perceptions of chocolate candies. This last section illustrates how the proposed methodology can be used by managers and helps discover insights based on heterogeneous perceptions.

2. Problem formulation

Let m individuals evaluate n objects such that a data matrix $D^k = (d^k_{ij})$ is obtained for each $k \in \{1, \ldots, m\}$, representing the dissimilarities between pairs of objects i and j as perceived by individual k, and let $c_k$, for $k \in \{1, \ldots, m\}$, be the number of clusters expected by individual k. The clustering problem considered in this work involves identifying groups of individuals whose dissimilarity matrices suggest a similar clustering solution. Clusters, for each group of individuals, are organized by means of a medians-based model where each clustered object is associated to the most representative item (i.e., the median) of its cluster. The Heterogeneous Clustering Problem (HCP) can be formulated as follows:

$$\min \sum_{k=1}^{m} \sum_{g=1}^{G} z_{kg} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} d^{k}_{ij} e^{g}_{ij} \right) \qquad (6)$$

subject to

$$\sum_{j=1}^{n} e^{g}_{ij} = 1, \quad g \in \{1,\ldots,G\},\; i \in \{1,\ldots,n\} \qquad (7)$$

$$e^{g}_{ij} \le e^{g}_{jj}, \quad g \in \{1,\ldots,G\},\; i,j \in \{1,\ldots,n\} \qquad (8)$$

$$\sum_{g=1}^{G} z_{kg} = 1, \quad k \in \{1,\ldots,m\} \qquad (9)$$

$$\sum_{k=1}^{m} z_{kg} \ge 1, \quad g \in \{1,\ldots,G\} \qquad (10)$$

$$\sum_{j=1}^{n} e^{g}_{jj} = \left\lfloor \frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}} \right\rfloor, \quad g \in \{1,\ldots,G\} \qquad (11)$$

$$e^{g}_{ij} \in \{0,1\}, \quad g \in \{1,\ldots,G\},\; i,j \in \{1,\ldots,n\} \qquad (12)$$

$$z_{kg} \in \{0,1\}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\} \qquad (13)$$

The m individuals are partitioned into G groups. The decision variables $z_{kg}$ express the assignment of individual k to group g. Variables $e^{g}_{ij}$ are equal to 1 if object i is assigned to object j in group g, and $e^{g}_{ij} = 0$ otherwise. The objective is to minimize (6), i.e., the sum of dissimilarities between each object and its assigned median, conditional on (individual) group membership. Constraints (7) require that each object i be assigned to exactly one median, as part of each group g's clustering solution. Constraints (8) ensure that object i can only be assigned to object j for group g if object j is a median for that group. Constraints (9) require that each individual be assigned to exactly one group, whereas constraints (10) guarantee that no empty group exists. Finally, constraints (11) make the total number of medians for each group g equal to the floor of the average number of medians expected by the individuals in that group. As argued and shown by Blanchard, Aloise, and DeSarbo (2012a) and Blanchard, DeSarbo, Atalay, and Harmancioglu (2012b), this choice of limiting the number of clusters used to represent consumer perceptions follows numerous researchers in the behavioral literature who have shown that individuals tend to favor simple representations when forming object perceptions and preferences (Bettman, Luce, & Payne, 1998; Bettman & Park, 1980; Shugan, 1980; Simon, 1955). In our empirical application, the data reflect the fact that each individual had formed his or her own partitions. Doing so provided us with actual data to justify the use of a different number of medians for each group of individuals.

¹ We use the term individuals for clarity with respect to our empirical application, which refers to different consumers. We note that such dissimilarity matrices could come from other sources (e.g., firms, repeated measurements for the same person, etc.), in line with the research on "points of view" (Brusco & Cradit, 2005).
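As a hedged illustration of how the HCP variables interact (our addition, not the authors' code), the sketch below evaluates objective (6) for given assignments z and e and computes both sides of the median-count rule (11); the list-of-numpy-arrays layout for the individual dissimilarity matrices and all function names are assumptions.

```python
import numpy as np

def hcp_objective(D, z, e):
    """Evaluate objective (6): sum_k sum_g z_kg * sum_ij d^k_ij e^g_ij.

    D: list of m (n x n) numpy arrays, one dissimilarity matrix per individual.
    z: (m x G) 0/1 numpy array, individual-to-group assignments.
    e: (G x n x n) 0/1 numpy array, object-to-median assignments per group.
    """
    m, G = z.shape
    return sum(z[k, g] * np.sum(D[k] * e[g])
               for k in range(m) for g in range(G))

def medians_per_group(e):
    """Left-hand side of (11): number of medians sum_j e^g_jj in each group."""
    return [int(np.trace(e[g])) for g in range(len(e))]

def expected_medians(c, z):
    """Right-hand side of (11): floor of the mean c_k over individuals in group g."""
    m, G = z.shape
    return [int(np.floor(sum(c[k] * z[k, g] for k in range(m)) /
                         max(1, sum(z[k, g] for k in range(m)))))
            for g in range(G)]
```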

The model in (6)–(13) may be further simplified. For instance, the optimization process guarantees that $\sum_{j=1}^{n} e^{g}_{jj}$ takes an integer value as large as possible, since more medians in a group imply lower (or equal) objective function values. Consequently, constraints (11) can be replaced by the following inequalities:

$$\sum_{j=1}^{n} e^{g}_{jj} \le \frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}}, \quad g \in \{1,\ldots,G\}, \qquad (14)$$

without affecting the optimal solution. Moreover, these constraints can be modified if the user prefers the number of medians in each group to equal the integer closest to $\sum_{k=1}^{m} c_k z_{kg} / \sum_{k=1}^{m} z_{kg}$ rather than its floor; for that, it suffices to add 0.5 to the right-hand side of constraints (14).

The HCP is a Mixed-Integer Quadratically Constrained Quadratic Problem (MIQCQP), for which the literature on exact and heuristic methods is vast (e.g., Anstreicher, 2012; Audet, Hansen, Jaumard, & Savard, 2000; Billionnet, Elloumi, & Lambert, 2016; Bomze & Locatelli, 2004; Galli & Letchford, 2014; Saxena, Bonami, & Lee, 2010; Zheng, Sun, & Li, 2011). Particularly for the HCP, all variables are required to be binary, so the problem is a 0–1 QCQP. In the following subsections, we present the methods employed here to solve the HCP exactly.

2.1. Generic solvers

In our attempts to solve the HCP exactly, we used Couenne (Belotti, Lee, Liberti, Margot, & Wächter, 2009), Baron (Tawarmalani & Sahinidis, 2005), and GloMIQO (Misener & Floudas, 2013). The first two are different implementations of the spatial Branch-and-Bound (sBB) algorithm (Liberti, 2006) for nonconvex mixed-integer nonlinear problems (MINLP). Much like a Branch-and-Bound (BB) algorithm for MIPs, sBB explores the feasible space exhaustively but implicitly, finding a guaranteed ε-approximate solution for any given ε > 0 in finite (potentially exponential) time. Unlike MIPs, whose continuous relaxation is a linear program, and unlike convex MINLPs, whose continuous relaxation is a convex NLP, the continuous relaxation of a nonconvex MINLP is usually difficult to solve. To address this issue, sBB algorithms form and solve convex relaxations of the given MINLP. The convexity gap between the original MINLP and its convex relaxation therefore stems from two factors: the relaxation of the integrality constraints, as well as the relaxation of the nonconvex terms appearing in the MINLP. The third generic solver, GloMIQO, is a branch-and-cut algorithm based on generating tight convex relaxations by detecting special structures such as convexity and edge-concavity. The algorithm is specialized to solve MIQCQPs to ε-optimality.

2.2. Fortet's linearization with RLT constraints

Linearization can be achieved by means of Fortet's inequalities (Fortet, 1960), thereby replacing each product of binary variables $e^{g}_{ij} \times z_{kg}$ by $w^{kg}_{ij}$ (with $w^{kg}_{ij} \in [0,1]$), for $k \in \{1,\ldots,m\}$, $i,j \in \{1,\ldots,n\}$, $g \in \{1,\ldots,G\}$, along with three additional sets of constraints which together ensure that $\max\{0,\, e^{g}_{ij} + z_{kg} - 1\} \le w^{kg}_{ij}$. The three sets of constraints are:

$$w^{kg}_{ij} \le e^{g}_{ij}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i,j \in \{1,\ldots,n\}, \qquad (15)$$

$$w^{kg}_{ij} \le z_{kg}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i,j \in \{1,\ldots,n\}, \qquad (16)$$

$$w^{kg}_{ij} \ge e^{g}_{ij} + z_{kg} - 1, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i,j \in \{1,\ldots,n\}. \qquad (17)$$

To further accelerate the optimization process of the resulting mixed-integer problem (MIP), we strengthen the formulation by adding constraints (cuts) that do not affect the optimal integer solution. In the spirit of the Reformulation-Linearization Technique (RLT) (Sherali & Adams, 1990), we obtain a set of additional cuts by multiplying the n × G constraints in (7) by $z_{kg}$ and $(1 - z_{kg})$, for $k = 1, \ldots, m$, and then replacing the products $e^{g}_{ij} \times z_{kg}$ by $w^{kg}_{ij}$. This yields the following constraints:

$$\sum_{j=1}^{n} w^{kg}_{ij} = z_{kg}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i \in \{1,\ldots,n\}, \qquad (18)$$

$$\sum_{j=1}^{n} \left( e^{g}_{ij} - w^{kg}_{ij} \right) = 1 - z_{kg}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\},\; i \in \{1,\ldots,n\}. \qquad (19)$$

Notice that constraints (18) make constraints (16) redundant. Without loss of generality, consider particular indices i, k and g associated to one of the constraints (18). In order for the equality to hold, since w is non-negative, each one of the terms $w^{kg}_{ij}$ must be smaller than $z_{kg}$, which implies that constraints (16) are redundant. Moreover, constraints (17) are redundant due to constraints (19). For all $g \in \{1,\ldots,G\}$, $k \in \{1,\ldots,m\}$, $i,j \in \{1,\ldots,n\}$, we can rewrite (17) as $e^{g}_{ij} - w^{kg}_{ij} \le 1 - z_{kg}$. Now, let us take, without loss of generality, particular indices i, k and g associated to one of the constraints (19). It follows that the equality holds if and only if $e^{g}_{ij} - w^{kg}_{ij} \le 1 - z_{kg}$, since constraints (15) guarantee that $e^{g}_{ij} - w^{kg}_{ij} \ge 0$. Consequently, constraints (17) are also redundant to the model. The resulting MIP model is denoted HCP-R1, and was solved in our computational experiments by the generic MIP solver CPLEX version 12.6.
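The exactness of the linearization hinges on Fortet's inequalities pinning the new variable to the product it replaces. The short check below (ours, not from the paper) enumerates all binary combinations and confirms that (15)–(17), together with w ≥ 0, force w = e·z at every 0/1 point.

```python
from itertools import product

# For binary e and z, Fortet's constraints
#   w <= e   (15),   w <= z   (16),   w >= e + z - 1   (17),   w >= 0
# pin w down exactly: max(0, e + z - 1) == min(e, z) == e*z on every 0/1 point,
# so the linearized model agrees with the original bilinear one.
for e, z in product([0, 1], repeat=2):
    lower = max(0, e + z - 1)
    upper = min(e, z)
    assert lower == upper == e * z
print("w is forced to e*z at every binary point.")
```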

2.3. Convexifications

As a 0–1 QCQP, the HCP can also be written in the following form:

$$\min\; x^{T} Q_0 x + c_0 x \qquad (20)$$

subject to

$$x^{T} Q_h x + c_h x \le b_h \quad \text{for each quadratic constraint } h \qquad (21)$$

$$x \in \{0,1\}^{s} \qquad (22)$$

where the $Q_h$ are symmetric matrices of order s, the $c_h$ are s-vectors and the $b_h$ are scalars. To illustrate, consider the objective function $3 z_{11} e^{1}_{12} + 3 z_{11} e^{1}_{21} + 3 z_{12} e^{1}_{12} + 3 z_{12} e^{1}_{21} + 4 z_{21} e^{2}_{12} + 4 z_{21} e^{2}_{21} + 4 z_{22} e^{2}_{12} + 4 z_{22} e^{2}_{21}$ with m = 2, n = 2 and G = 2. We can express this function in quadratic form with $x = (e^{1}_{12}, e^{1}_{21}, e^{2}_{12}, e^{2}_{21}, z_{11}, z_{12}, z_{21}, z_{22})$ and

$$Q_0 = \begin{pmatrix}
0 & 0 & 0 & 0 & 1.5 & 1.5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1.5 & 1.5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 2 \\
1.5 & 1.5 & 0 & 0 & 0 & 0 & 0 & 0 \\
1.5 & 1.5 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 2 & 0 & 0 & 0 & 0
\end{pmatrix}$$

We performed the convexification of (6) and (14) with two different methods:

(i) the method proposed by Hammer and Rubin (1970), which makes the matrix Q positive semi-definite by subtracting its minimum eigenvalue from the diagonal entries and adjusting the linear term of the expression accordingly, and

(ii) the method that convexifies each product $C \cdot x \cdot y$ with C > 0 by replacing $x \cdot y$ with the difference of two convex functions, $\tfrac{1}{2}(x+y)^{2} - \tfrac{1}{2}(x+y)$.

Both reformulations preserve the cost of every feasible binary solution. In our example, the minimum eigenvalue of $Q_0$ is −4. Thus, with method (i) the objective function f(x) is replaced by the convex function

$$f(x) + 4\left( (e^{1}_{12})^{2} + (e^{1}_{21})^{2} + (e^{2}_{12})^{2} + (e^{2}_{21})^{2} + (z_{11})^{2} + (z_{12})^{2} + (z_{21})^{2} + (z_{22})^{2} \right) - 4\left( e^{1}_{12} + e^{1}_{21} + e^{2}_{12} + e^{2}_{21} + z_{11} + z_{12} + z_{21} + z_{22} \right).$$

In reformulation (ii), the products of the z and e variables are replaced by differences of convex functions. For instance, the product $3 z_{11} e^{1}_{12}$ in the objective function of our example above is replaced by $\tfrac{3}{2}(z_{11} + e^{1}_{12})^{2} - \tfrac{3}{2}(z_{11} + e^{1}_{12})$.

The resulting convex 0–1 QCQP formulation of the HCP with (i) was denoted HCP-R2, whereas that with (ii) was denoted HCP-R3. Both are solved by CPLEX, which automatically converts a 0–1 convex QCQP formulation into a 0–1 second-order cone program, for which relaxations are solved via the barrier algorithm.
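As a quick numerical sanity check (our addition, based on the small example above), the script below verifies that both convexifications leave the objective value unchanged on every binary vector; it assumes numpy is available and rebuilds $Q_0$ from the example.

```python
import numpy as np
from itertools import product

# Q0 from the example, with x = (e1_12, e1_21, e2_12, e2_21, z11, z12, z21, z22)
Q0 = np.zeros((8, 8))
Q0[0:2, 4:6] = 1.5; Q0[4:6, 0:2] = 1.5   # 3 * z1g * e1_.. terms, split symmetrically
Q0[2:4, 6:8] = 2.0; Q0[6:8, 2:4] = 2.0   # 4 * z2g * e2_.. terms
lam = np.linalg.eigvalsh(Q0).min()        # minimum eigenvalue (equals -4 here)

for bits in product([0, 1], repeat=8):
    x = np.array(bits, dtype=float)
    f = x @ Q0 @ x
    # (i) Hammer-Rubin: subtract lam from the diagonal, compensate in the linear term
    f_hr = x @ (Q0 - lam * np.eye(8)) @ x + lam * x.sum()
    # (ii) each product C*z*e replaced by C/2*(z+e)^2 - C/2*(z+e)
    f_dc = 0.0
    for zi, ei, C in [(4, 0, 3.0), (4, 1, 3.0), (5, 0, 3.0), (5, 1, 3.0),
                      (6, 2, 4.0), (6, 3, 4.0), (7, 2, 4.0), (7, 3, 4.0)]:
        s = x[zi] + x[ei]
        f_dc += C / 2.0 * (s * s - s)
    assert abs(f - f_hr) < 1e-9 and abs(f - f_dc) < 1e-9
print("Both convexifications preserve the objective on all binary points.")
```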

3. VNS heuristic for the HCP

VNS is a metaheuristic developed to solve combinatorial and global optimization problems by changing neighborhoods both in its local descent step, for intensification, and in its shaking step, for diversification (see Hansen, Mladenović, and Pérez, 2010, for a survey).

VNS relies on the following three observations:

Observation 1: A local minimum with respect to one neighborhood structure is not necessarily so for another;

Observation 2: A global minimum is a local minimum with respect to all possible neighborhood structures;

Observation 3: Local minima with respect to one or several neighborhoods are often relatively close to one another.

In the VNS framework, the neighborhoods are defined around types of moves, or perturbations, of the best current solution x, the center of the search. When looking for a better solution in a minimization problem, a solution x′ is drawn at random in an increasingly wider neighborhood of x, and a local descent is performed from x′, leading to another local optimum x″. If x″ is worse than x, then x″ is ignored and a new neighbor solution x′ is drawn in a more distant neighborhood of x. If instead x″ is better than x, the search is re-centered around x″ and restarts in the closest neighborhood of the newly found best current solution. Once all neighborhoods of x have been explored without success, one begins again with the closest one to x, until a stopping condition (e.g., maximum CPU time) is met.

As the size of the neighborhoods tends to increase with their distance from the current best solution x, close-by neighborhoods are explored more thoroughly than far-away ones. This strategy takes advantage of the three observations mentioned above and, given sufficient computational time, ensures that the algorithm does not remain stuck in a poor local optimum. We now turn to our implementation of VNS for the HCP.

3.1. Initialization

VNS requires an initial solution, which can either be provided by the user or constructed. Algorithm 1 presents the pseudocode of our approach for constructing an initial solution.

Algorithm 1 VNS: Constructive Heuristic (CH).

  Compute a matrix $F = (f_{ab})$ of dimension m × m such that $f_{ab}$ is the Frobenius norm of $D^{a} - D^{b}$;
  Solve a p-median problem with p = G medians using matrix F as input;
  for k = 1, ..., m do
    set $z_{kg} = 1$ if the dissimilarity matrix of individual k is assigned to the g-th median, and $z_{kg} = 0$ otherwise;
  end for
  for g = 1, ..., G do
    solve subproblem $M_g(z)$;
  end for

Algorithm 1 first solves the problem of assigning individuals to groups, and does so by using an m × m distance matrix between individuals based on the Frobenius norm of the difference between their dissimilarity matrices. Once the p-median model is applied to this distance matrix between individuals, they are assigned to groups according to the partition obtained, i.e., if a pair of individuals have their dissimilarity matrices assigned to the same cluster in the p-median model, then these individuals are assigned to the same group in the initial solution. Then, for each initial group, solving the subproblems $M_g(z)$, for g = 1, ..., G, provides a complete initial solution for the HCP:

$$M_g(z) = \min \sum_{i=1}^{n} \sum_{j=1}^{n} d^{g}_{ij} e^{g}_{ij} \qquad (23)$$

subject to

$$\sum_{j=1}^{n} e^{g}_{ij} = 1, \quad i \in \{1,\ldots,n\} \qquad (24)$$

$$e^{g}_{ij} \le e^{g}_{jj}, \quad i,j \in \{1,\ldots,n\} \qquad (25)$$

$$\sum_{j=1}^{n} e^{g}_{jj} = \left\lfloor \frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}} \right\rfloor \qquad (26)$$

$$e^{g}_{ij} \in \{0,1\}, \quad i,j \in \{1,\ldots,n\}, \qquad (27)$$

where $d^{g}_{ij} = \sum_{k=1}^{m} d^{k}_{ij} z_{kg}$. We note that problem (23)–(27) corresponds to the p-median problem (1)–(5). Furthermore, this constructive heuristic could easily be replaced by others: other distance norms (e.g., L1, L∞) could be substituted, just as the assignment of objects to clusters could be obtained via any other partitioning heuristic. For our constructive heuristic, we used the approach by Hansen and Mladenović (1997), as it ensures that each group contains at least one individual.
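A minimal sketch (ours, with a hypothetical function name) of the first step of Algorithm 1: computing the m × m matrix F of Frobenius norms between the individuals' dissimilarity matrices. The resulting F would then be passed to any p-median heuristic with p = G.

```python
import numpy as np

def frobenius_gap_matrix(D):
    """F[a, b] = ||D^a - D^b||_F for a list D of m (n x n) dissimilarity matrices."""
    m = len(D)
    F = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            F[a, b] = F[b, a] = np.linalg.norm(D[a] - D[b], ord="fro")
    return F

# F is then used as the input of a p-median heuristic (p = G) to obtain the
# initial assignment z of individuals to groups; each subproblem M_g(z) is
# solved afterwards to complete the initial HCP solution.
```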


3.2. Shaking

The shaking component of our VNS is implemented by means of random moves in the swap neighborhood, which encompasses all possible ways of removing an individual from a group and adding it to a different one. Thus, if the shaking parameter is t = 2, two random swap moves are performed for two individuals; if t = 3, three swap moves are performed for three individuals, and so on.
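The sketch below (ours, with hypothetical names) illustrates the swap-based shaking move on a simplified representation where group memberships are stored as a flat list of labels; how the full heuristic treats moves that would empty a group is not detailed in the paper, so skipping them is an assumption of this sketch.

```python
import random

def shake(groups, G, t, rng=random):
    """Apply t random swap moves: each move reassigns one randomly chosen
    individual to a different, randomly chosen group.

    groups[k] is the group label of individual k. Moves that would leave a
    group empty are simply skipped here (an assumption, see constraint (10)).
    """
    perturbed = list(groups)
    for k in rng.sample(range(len(perturbed)), t):
        if perturbed.count(perturbed[k]) == 1:
            continue  # skip: removing this individual would empty its group
        choices = [g for g in range(G) if g != perturbed[k]]
        perturbed[k] = rng.choice(choices)
    return perturbed
```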

3.3. Local search

Given an existing solution, we need to search its neighborhood to reach a local optimum. We developed our local search following the Variable Neighborhood Descent (VND) framework, which generalizes observations 1–3 to descent methods. Algorithm 2 presents the general algorithmic steps of VND.

Algorithm 2 Local Search: VND Framework.

  Input: a solution x and a set of descent methods descent_s, for s ∈ {1, ..., s_max}
  s ← s_min
  repeat
    x′ ← descent_s(x);
    if x′ ≠ x then x ← x′ and s ← s_min; otherwise s ← s + 1;
  until s > s_max

Applied to the HCP, VND involves the iterative optimization of the objective function via improvements based on three descent methods: (1) descent on the clustering of objects (conditional on group memberships and the number of medians), (2) descent on the group memberships (conditional on the object clusterings and the number of medians), and (3) descent by (possibly) augmenting the number of medians. VND (the local search) ends when all descent methods have been consecutively explored without any improvement in the objective function. Whenever an improvement occurs, the algorithm resets s to s_min. In the remainder of this section, we present each of these descent procedures.
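The following skeleton (our paraphrase of Algorithm 2, not the authors' implementation) shows the VND control flow with the descent methods passed in as callables; comparing costs instead of solutions, and the callable names in the usage comment, are simplifications we introduced.

```python
def vnd(solution, descents, cost):
    """Variable Neighborhood Descent (cf. Algorithm 2): cycle through the
    descent methods and restart from the first one after any improvement."""
    s = 0
    while s < len(descents):
        candidate = descents[s](solution)
        if cost(candidate) < cost(solution):
            solution, s = candidate, 0   # improvement found: restart the cycle
        else:
            s += 1                       # no improvement: try the next descent
    return solution

# usage sketch: vnd(x0, [descent_objects, descent_groups, descent_medians], hcp_cost),
# where the three callables correspond to the descents of Sections 3.3.1-3.3.3.
```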

3.3.1. First descent

The descent method descent_1 in Algorithm 2 solves subproblem (23)–(27) for each group affected by the shaking procedure. Namely, for each group, the descent method identifies the conditionally optimal clustering of objects, assuming that both the number of medians and the group memberships are known. Our choice has been to perform this descent with heuristics (e.g., Hansen & Mladenović, 1997; Resende & Werneck, 2004; Hansen, Brimberg, Urošević, & Mladenović, 2009) to accelerate the whole algorithm.

3.3.2. Second descent

The second descent, descent_2, temporarily assumes the clustering of objects to be known in all groups (i.e., the variables e are fixed). It then descends by conditionally reassigning individuals to the groups that provide the best values for z. To do so, the following binary program is solved:

$$W(e) = \min \sum_{k=1}^{m} \sum_{g=1}^{G} z_{kg}\, \tilde d_{kg} \qquad (28)$$

subject to

$$\frac{\sum_{k=1}^{m} c_k z_{kg}}{\sum_{k=1}^{m} z_{kg}} \ge \omega_g, \quad g \in \{1,\ldots,G\} \qquad (29)$$

$$\sum_{g=1}^{G} z_{kg} = 1, \quad k \in \{1,\ldots,m\} \qquad (30)$$

$$z_{kg} \in \{0,1\}, \quad g \in \{1,\ldots,G\},\; k \in \{1,\ldots,m\} \qquad (31)$$

where $\tilde d_{kg} = \sum_{i=1}^{n} \sum_{j=1}^{n} d^{k}_{ij} e^{g}_{ij}$ and $\omega_g = \sum_{j=1}^{n} e^{g}_{jj}$. Problem (28)–(31) is a binary program which, according to our limited computational experiments, is usually solved at the root node of the branch-and-cut algorithm implemented by CPLEX. We chose to halt the second descent after the root node solution. If W(e) is not solved to optimality there, the previous solution is kept.
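For illustration only (not the authors' implementation), the binary program (28)–(31) can be stated with the PuLP modeling library, assuming it is installed; the ratio constraint (29) is multiplied through by the group size to keep the model linear, and all names below are ours.

```python
import pulp

def second_descent(d_tilde, c, omega):
    """Solve W(e) in (28)-(31): reassign individuals to groups, given d_tilde and omega.

    d_tilde[k][g]: cost of putting individual k in group g (sum_ij d^k_ij e^g_ij).
    c[k]: number of clusters expected by individual k.
    omega[g]: current number of medians in group g (sum_j e^g_jj).
    Returns the 0/1 assignment matrix z, or None if the program is infeasible.
    """
    m, G = len(d_tilde), len(omega)
    prob = pulp.LpProblem("W_of_e", pulp.LpMinimize)
    z = pulp.LpVariable.dicts("z", (range(m), range(G)), cat="Binary")
    prob += pulp.lpSum(d_tilde[k][g] * z[k][g] for k in range(m) for g in range(G))
    for k in range(m):                                   # constraints (30)
        prob += pulp.lpSum(z[k][g] for g in range(G)) == 1
    for g in range(G):                                   # constraints (29), linearized
        prob += pulp.lpSum(c[k] * z[k][g] for k in range(m)) >= \
                omega[g] * pulp.lpSum(z[k][g] for k in range(m))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[prob.status] != "Optimal":
        return None   # as in the paper, the previous solution would then be kept
    return [[int(z[k][g].value()) for g in range(G)] for k in range(m)]
```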

3.3.3. Third descent

It is trivial that the objective function of a clustering algorithm improves when the clustering solution is allowed to have a greater number of medians. However, in our case, the HCP restrains the number of medians in each group g, for $g \in \{1,\ldots,G\}$, by means of constraints (14), and thus pushes that number to the largest integer smaller than or equal to $\sum_{k=1}^{m} c_k z_{kg} / \sum_{k=1}^{m} z_{kg}$. Specifically for the HCP, increasing the number of medians for a group-level clustering solution will not necessarily always improve the objective function. The third descent thus aims at checking whether the number of medians in a group g∗ can be augmented by reallocating objects to a new median, while still satisfying constraints (14) with the new number of medians.

Algorithm 3 details how this procedure works. Specifically, for each group $g^{*} \in \{1,\ldots,G\}$, we initialize a solution $(z^{best}, e^{best})$ with the values of the best current solution for the HCP, i.e., $(z^{best}, e^{best}) \leftarrow (z, e)$. Then, the problem $M_{g^{*}}(z)$ is solved with the number of medians prescribed by (26) increased by one, thereby producing a new partition of the objects in $g^{*}$ for which the number of medians is increased by one unit. We then solve the problem W(e) and verify whether this increase can be accommodated by reassigning the individuals among the groups. If W(e) is infeasible, or if the cost of the newly yielded solution is larger than that of the best solution found, the solution of W(e) is ignored. Otherwise, the solution of W(e) becomes the new incumbent solution.

Algorithm 3 descent_3.

  Input: a solution (z, e) for the HCP
  (z^best, e^best) ← (z, e)
  for g∗ = 1, ..., G do
    Solve $M_{g^{*}}(z)$ with the number of medians in (26) increased by one;
    Solve W(e);
    if W(e) is infeasible or the cost of (z, e) is greater than the cost of (z^best, e^best) then
      (z, e) ← (z^best, e^best);
    else
      (z^best, e^best) ← (z, e);
    end if
  end for
  return (z^best, e^best)

A critical element in VND heuristics is the order in which the descent methods are explored (e.g., in Algorithm 2). In our implementation, this decision was based on several observations. First, descent_3 is more computationally expensive than the other two and, as such, it was set to be the last. Second, because our shaking step perturbs only the group membership variables, applying descent_2 (which reoptimizes group memberships) immediately after the shaking step would undo the shaking move.² Thus, we used descent_1 as the first descent, followed by descent_2 and descent_3.

² To illustrate, suppose a local minimum (z, e) with respect to the first and second descents. The shaking step applied to (z, e) generates a new solution (z′, e), perturbing only the group membership variables. Thus, if the second descent is applied just after the shaking step, it will change the solution back, yielding the same solution (z, e).

Table 1
Monte-Carlo Simulation: experimental design and dataset characteristics.

Instance | Individuals (m) | Groups (G) | Objects (n) | Medians | Perturbation dissimilarities | Perturbation medians
1 | 150 | 10 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5)
2 | 300 | 2 | 18 | All 6 | N(0, 0.1) | 0
3 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0
4 | 150 | 2 | 18 | All 3 | N(0, 0.05) | N(0, 0.5)
5 | 450 | 10 | 18 | All 6 | N(0, 0.05) | N(0, 1)
6 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0
7 | 300 | 2 | 18 | All 6 | 0 | N(0, 0.5)
8 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1)
9 | 300 | 10 | 30 | All 3 | N(0, 0.05) | N(0, 0.5)
10 | 450 | 6 | 18 | All 3 | N(0, 0.1) | N(0, 1)
11 | 150 | 6 | 30 | All 6 | N(0, 0.1) | 0
12 | 300 | 10 | 18 | All 3 | 0 | 0
13 | 450 | 10 | 18 | All 6 | N(0, 0.1) | 0
14 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1)
15 | 300 | 2 | 30 | All 6 | N(0, 0.05) | N(0, 1)
16 | 450 | 2 | 30 | 50 percent 3, 50 percent 6 | 0 | N(0, 1)
17 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5)
18 | 300 | 6 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0
19 | 150 | 6 | 18 | All 6 | 0 | N(0, 0.5)
20 | 450 | 6 | 30 | All 3 | 0 | 0
21 | 150 | 2 | 30 | All 3 | N(0, 0.1) | N(0, 1)
22 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5)
23 | 450 | 6 | 18 | All 3 | N(0, 0.05) | N(0, 0.5)
24 | 300 | 10 | 18 | All 3 | N(0, 0.1) | N(0, 1)
25 | 150 | 6 | 18 | All 6 | N(0, 0.05) | N(0, 1)
26 | 150 | 2 | 18 | All 3 | 0 | 0
27 | 450 | 10 | 30 | All 6 | 0 | N(0, 0.5)

4. Monte-Carlo simulation: performance and robustness

In the previous section, we introduced a new model, a series of reformulations, and a heuristic to solve the HCP. In the present section, we examine the relative performance of these approaches. Specifically, we first generate a set of datasets for which the true solution is known. Second, we attempt to solve the HCP for each dataset via exact methods and our proposed heuristic, showing that the use of a heuristic is necessary because exact methods cannot be used for problems of moderate size. Third, we compare the performance of the proposed heuristic to that of a benchmark heuristic based on a related model and show that the performance of our heuristic is significantly better. Fourth, we investigate the circumstances under which the proposed heuristic for the HCP is likely to outperform competing alternatives.

4.1. Datasets generation

To do so without providing any algorithm an unfair advantage, we needed a set of problems with known data-generating mechanisms for $D^k$ and $c_k$. As such, we simulated data following the fractional factorial experimental design used by Blanchard et al. (2012a). The process involves generating 27 simulated datasets (i.e., experimental trials) that have known solutions and that can be used to study the impact of different dataset characteristics on the ability of the competing algorithms to perform. The factorial design appears in Table 1.

The characteristics of the datasets (experimental factors) included the total number of individuals (m = 150, 300, 450), the number of groups (G = 2, 6, 10), the number of objects (n = 18, 30), the variance in the number of medians across groups, the amount of error added to the dissimilarity matrix of each individual (using N(0, 0.05) or N(0, 0.1) before rounding), and the amount of error added to the number of medians sought by each individual (using N(0, 0.5) or N(0, 1) before rounding). Further, following the works of Blanchard et al. (2012a), Blanchard and DeSarbo (2013), Brusco and Cradit (2001) and others, we also assume that the true number of groups G is known, but not their composition.³
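The paper does not spell out how the base dissimilarities are built before perturbation, so the sketch below (ours, with assumed conventions such as 0/1 co-membership dissimilarities and a minimum of one median) only illustrates the kind of generator implied by Table 1: noise N(0, σ_d) added to an individual's dissimilarity matrix and rounded noise N(0, σ_c) added to the number of medians sought.

```python
import numpy as np

def simulate_individual(partition, c_true, sigma_d, sigma_c, rng):
    """Generate one individual's (D^k, c_k) from a group's true object partition.

    partition[i] is the true cluster label of object i in the individual's group.
    Base dissimilarities are 0/1 co-membership (an assumption of this sketch),
    perturbed with N(0, sigma_d) noise; c_k is the true number of medians plus
    rounded N(0, sigma_c) noise, mirroring the design factors of Table 1.
    """
    n = len(partition)
    D = np.array([[0.0 if partition[i] == partition[j] else 1.0
                   for j in range(n)] for i in range(n)])
    if sigma_d > 0:
        noise = rng.normal(0.0, sigma_d, size=(n, n))
        noise = (noise + noise.T) / 2.0          # keep the matrix symmetric
        D = np.clip(D + noise, 0.0, None)
        np.fill_diagonal(D, 0.0)                 # enforce d_ii = 0
    c_k = c_true
    if sigma_c > 0:
        c_k = max(1, c_true + int(round(rng.normal(0.0, sigma_c))))  # floor of 1 is our safeguard
    return D, c_k

# rng = np.random.default_rng(0)
# D, c = simulate_individual([0, 0, 1, 1, 2, 2], 3, 0.05, 0.5, rng)
```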

4.2. Comparisons to exact solvers: optimization results

In the present section, we wish to establish the necessity of introducing a heuristic for the HCP by comparing the results of the following approaches:

(a) Couenne version 0.5.3, Baron version 15.2.0 and GloMIQO version 2 on formulation (6)–(14),
(b) CPLEX version 12.6 on HCP-R1, HCP-R2 and HCP-R3, and
(c) the VNS heuristic presented in the last section.

To compare the performance of the generic MINLP solvers, CPLEX, and the VNS heuristic, we use each to solve the 27 simulated instances presented in Table 1. Computational experiments were performed on a Xeon(R) CPU X5650 at 2.67 gigahertz with 64 gigabytes of RAM. Couenne, Baron, GloMIQO and CPLEX were allowed to run for 24 hours with default parameters. The VNS heuristic was allowed to run for 600 seconds. The algorithm was implemented in C++ and compiled with gcc 4.4.

4.2.1. Performance of exact algorithms

As most of the instances could not be solved to optimality, Table 2 reports the best upper bounds (ub), lower bounds (lb) and the number of explored branch-and-bound nodes (#bbn) obtained by each solver. The upper bounds correspond to feasible solutions obtained by different heuristics used within each solver. They are important in branch-and-bound algorithms to eliminate branches of the enumeration tree that do not lead to the optimal solution. Better upper bound values are usually able to cut more of these branches, which improves the overall performance of branch-and-bound methods. An empty value in column (ub) indicates that no


Table 2

Bound values obtained by the solvers for the 27 instances generated for Monte-Carlo Simulation.

Instance Couenne Baron GloMIQO CPLEX

ub lb # bbn ub lb # bbn ub lb # bbn HCP-R1 HCP-R2 HCP-R3 ub lb # bbn ub lb # bbn ub lb # bbn 1 0 0 0 0 0 0 2707 .04 1 0 409 3979 .78 0 391 2 0 0 2388 .25 8 .85 1 0 0 2392 .48 2311 .75 613 3047 .60 0 3591 3146 .26 0 5310 3 0 0 4604 .21 0 1 0 0 5064 .68 4364 .75 252 5633 .01 0 1534 5763 .45 0 2766 4 0 0 2387 .97 0 4 0 0 1870 .91 1819 .56 2041 1992 .85 0 13948 1972 .79 0 13032 5 0 0 0 0 0 0 3346 .33 1 5583 .94 0 293 7063 .65 0 109 6 0 0 0 0 0 0 2437 .02 1382 .45 23 2028 .00 0 3200 2166 .37 0 1182 7 0 0 3698 .33 0 1 0 0 2652 .68 2410 .68 419 3410 .67 0 4269 3315 .00 0 3817 8 0 0 0 0 0 0 1470 .84 1 2081 .83 0 3982 2123 .50 0 1313 9 0 0 0 0 0 0 6646 .22 1 0 163 8008 .79 0 103 10 0 0 0 0 0 0 4773 .51 1 0 611 7020 .89 0 470 11 0 0 0 0 0 0 2516 .03 1 0 800 3823 .25 0 666 12 0 0 0 0 0 0 3749 .99 1 0 840 4327 .33 0 332 13 0 0 0 0 0 0 3067 .90 1 0 269 6314 .56 0 161 14 0 0 0 0 0 0 2905 .84 1 4 4 46 .83 0 553 4475 .67 0 720 15 8308 .26 0 2 0 0 0 0 6126 .58 5642 .20 1 6860 .69 0 1514 6811 .36 0 1609 16 12667 .50 0 1 0 0 0 0 9668 .71 1 11937 .90 0 282 0 458 17 0 0 0 0 0 0 4883 .34 2669 .76 6 3869 .98 0 1653 4386 .82 0 857 18 0 0 0 0 0 0 5993 .60 1 0 217 7906 .57 0 295 19 0 0 1714 .33 0 1 0 0 1190 .01 1 1724 .33 0 3695 1715 .33 0 3480 20 0 0 0 0 0 0 10935 .00 1 0 95 12279 .10 0 84 21 3593 .26 0 86 244985 .95 0 1 0 0 4218 .56 3405 .42 5 3795 .42 0 3669 3797 .77 0 5907 22 7065 .28 0 4 254434 .50 0 1 0 0 4729 .62 4231 .32 300 5626 .56 0 1808 5965 .84 0 1726 23 0 0 0 0 0 0 5206 .48 1 0 643 7145 .39 0 262 24 0 0 0 0 0 0 3167 .10 1 0 640 4454 .83 0 203 25 0 0 259742 .09 0 1 1136 .85 1 1354 .82 1142 .62 1338 1777 .15 0 3603 1770 .16 0 2011 26 2023 .33 0 833 2037 .32 24 .99 1 1874 .99 1874 .99 1 1874 .99 1874 .99 1 2002 .49 0 20016 1895 .16 0 110944 27 0 0 0 0 0 0 8657 .60 1 0 100 0 35

feasible solution was reported by the solver before the time limit was attained for that instance.

The results in Table 2 reveal that instance #26 is the only one that could be solved to optimality by our exact approaches. It has the easiest problem characteristics of all our datasets, with m = 150, n = 18, G = 2 and without any kind of perturbation (error) added. Yet, the solver GloMIQO spent 12,510 seconds to solve the problem, whereas CPLEX on HCP-R1 took 374 seconds. For instance #27, none of the solvers was able to find a feasible upper bound solution within 24 hours of CPU time.

Across all the datasets, the lower bounds obtained by the solvers are very often equal to the trivial one (zero). The only exception is those obtained by CPLEX for the RLT-linearized formulation (HCP-R1), which is still quite difficult to solve, as revealed by the number of branch-and-bound nodes processed by CPLEX within 24 hours. In 18 of the 27 instances (≈ 67 percent), CPLEX was able to solve only the root node.

It does indeed seem that the reformulations presented in Sections 2.2 and 2.3 helped CPLEX to tackle the problem, as it performed better than Couenne, Baron and GloMIQO applied to the original HCP formulation. The lower bounds obtained by the latter are never better than those obtained by CPLEX on HCP-R1. Regarding upper bounds, Baron and Couenne together found the best solutions only 4 times (in instances #2, #3, #19 and #21).

Among the convexified formulations, HCP-R2 and HCP-R3 do not allow CPLEX to obtain better lower bounds than those obtained by CPLEX on HCP-R1, despite exploring more branch-and-bound nodes. However, we note that the upper bounds obtained on these two formulations (HCP-R2 and HCP-R3) were better than those obtained for HCP-R1 in 18 out of 27 instances, i.e., ≈ 67 percent.

4.2.2. Performance of the VNS heuristic

Table 3 presents the results obtained by Algorithm 1 (CH) and the VNS heuristic. The second column reports the cost of the initial solution provided by the constructive heuristic. The third column presents the average upper bound solutions obtained in 10 distinct executions of the heuristic, whereas the fourth column shows their associated standard deviation. The fifth column refers to the relative difference between the VNS solutions and the best upper bound values presented in Table 2. Finally, the sixth column refers to the relative difference between the solutions obtained by VNS and the best lower bound values of Table 2.

Table 3
Summary of VNS results for the 27 instances generated for the Monte Carlo Simulation.

Instance | CH best ub | VNS average ub | VNS std dev ub | Improv. over best ub (percent) | GAP to best lb (percent)
1 | 3257.668 | 3211.298 | 6.34 | 19.310 | 15.392
2 | 2388.253 | 2388.253 | 0.00 | 0.000 | 3.140
3 | 4604.207 | 4604.207 | 0.00 | 0.000 | 5.199
4 | 1996.796 | 1869.924 | 0.00 | 0.053 | 2.660
5 | 3936.212 | 3792.574 | 42.55 | 32.081 | 11.356
6 | 1525.197 | 1525.197 | 0.00 | 24.793 | 9.346
7 | 2900.010 | 2651.011 | 0.00 | 0.013 | 9.066
8 | 1785.667 | 1598.416 | 27.76 | 23.221 | 7.982
9 | 7604.898 | 7391.292 | 17.86 | 7.710 | 10.034
10 | 6049.086 | 5701.861 | 29.07 | 18.787 | 15.158
11 | 2835.408 | 2835.408 | 0.00 | 25.838 | 9.588
12 | 3749.985 | 3749.985 | 0.00 | 13.342 | 0.000
13 | 3562.485 | 3562.485 | 0.00 | 43.583 | 13.570
14 | 3249.999 | 3075.149 | 4.99 | 30.846 | 5.506
15 | 6000.374 | 5760.642 | 0.00 | 5.973 | 2.056
16 | 10192.500 | 9869.400 | 0.00 | 16.608 | 2.033
17 | 3441.239 | 3136.460 | 22.44 | 18.954 | 14.880
18 | 6497.311 | 6497.311 | 0.00 | 17.824 | 7.618
19 | 1241.673 | 1206.006 | 0.00 | 29.652 | 1.327
20 | 10935.000 | 10935.000 | 0.00 | 10.946 | 0.000
21 | 3709.921 | 3593.040 | 0.00 | 0.006 | 5.222
22 | 4929.876 | 4610.666 | 0.00 | 2.515 | 8.228
23 | 5987.091 | 5685.854 | 32.20 | 20.426 | 7.505
24 | 3945.636 | 3764.599 | 12.40 | 15.494 | 15.724
25 | 1389.303 | 1253.215 | 13.17 | 7.499 | 8.825
26 | 1874.993 | 1874.993 | 0.00 | 0.000 | 0.000
27 | 9207.000 | 9004.640 | 39.75 | 29.041 | 3.854

Table 4
Average improvements yielded by the incremental application of the descents within the VND framework.

Instance | ivns1 | ivns1+2 | ivns1+2+3
1 | 0.841 | 0.497 | 0.090
4 | 6.344 | 0.011 | 0.000
5 | 1.272 | 2.373 | 0.036
7 | 8.583 | 0.004 | 0.000
8 | 10.098 | 0.339 | 0.093
9 | 1.083 | 0.791 | 0.961
10 | 3.337 | 1.401 | 1.100
14 | 3.122 | 0.584 | 1.757
15 | 3.611 | 0.399 | 0.000
16 | 0.696 | 0.012 | 2.480
17 | 5.389 | 0.699 | 2.987
19 | 1.460 | 0.000 | 0.000
21 | 2.912 | 0.246 | 0.000
22 | 2.758 | 0.466 | 3.372
23 | 2.340 | 2.189 | 0.579
24 | 2.375 | 2.469 | −0.207
25 | 5.500 | 2.882 | 1.713
27 | 0.856 | 1.363 | −0.009

Comparing to the results obtained by the exact solvers, we find that VNS always obtained better upper bound solutions. The only exceptions are instances #2, #3 and #26, for which all algorithms obtain the same objective function value. The superior performance of VNS attains its maximum for instance #13, with a difference of 43.58 percent in solution quality.

Our results suggest that the proposed VNS algorithm is stable, as demonstrated by the very small standard deviations, which were inflated by a few larger instances. For instance, the largest variability in objective function values over the 10 distinct VNS executions is found for instance #5, whose data contain perturbations not only in the dissimilarity values and in the number of medians sought by each individual, but also a large number of groups to solve for (i.e., G = 10).

Contrasting the constructive heuristic with the entire VNS algorithm, we note that the constructive heuristic provided a solution which could not be improved by VNS in 9 out of 27 instances (i.e., ≈ 33 percent). It is interesting to note that the initial solution provided by the constructive heuristic led, in 20 out of 27 instances, to better upper bound solutions than those obtained by CPLEX in 24 hours.

Our VNS algorithm is composed of three descent steps in its local search. To contrast their effectiveness, we conducted a series of experiments in which they are used in an incremental way. The results are summarized in Table 4. The first column of the table refers to the instances for which VNS improves the solutions provided by the constructive heuristic. The other columns refer to the relative differences in cost obtained by the application of the successive descents. Thus, $i_{vns1}$ reports the average improvement (in percent) obtained by VNS with respect to the solutions provided by the constructive heuristic when only the first descent is used as local search. Let $cost_{CH}$ be the cost of the solution obtained by the constructive heuristic; then $i_{vns1}$ is calculated as $\frac{cost_{CH} - cost_{vns1}}{cost_{CH}}$, where $cost_{vns1}$ is the average cost obtained by VNS using only the first descent. Similarly, $i_{vns1+2}$ reports the average improvement (in percent) obtained by VNS when the second descent is added to the VND framework, and finally $i_{vns1+2+3}$ when the third descent is included to complete our algorithm. All average results in the table are calculated over 10 runs of the VNS heuristic with a time limit of 600 seconds.

We notice from Table 4 that the gains incurred by the incremental use of the proposed descents are non-increasing on average: ≈ 2.83 percent with only the first descent, ≈ 0.58 percent with the first and second descents, and ≈ 0.09 percent with all three. However, we note that in some cases (instances #16 and #22) the largest gains are obtained only after the third descent is used. The negative values for instances #24 and #27 are due to the smaller number of VNS iterations within the established time limit when the third descent is applied. Nevertheless, its incorporation within VND improved the average solution values in 11 out of 18 instances.

4.3. Performance comparison with benchmark models

In the experimental datasets used in our comparisons, we not only know the objective function value at the global optimum but also the true assignments of individuals to groups and, for each group, the true clustering solutions. As such, we can also investigate the ability of the algorithms to recover these decision variables. In this subsection, we compare the classification provided by the heuristic for the HCP with that provided by the heuristic for the heterogeneous p-median problem (HPM) (Blanchard et al., 2012a).

The two clustering models, and the resulting VNS heuristics, share many similarities. For starters, both models aim to group individuals based on the clustering solutions that can be obtained from their perceptions of a set of objects. However, the models are distinct in that: (i) whereas the number of medians in a group is conditioned on the individuals' group memberships in the HCP, it is a variable in the HPM; (ii) the HPM is a multi-objective model converted to a single-objective model: it weights the sum of dissimilarities between each object and its assigned cluster, conditional on group membership, against the difference between the number of medians of each individual and the estimated number of medians. This second observation is critical because the HPM requires the user of the proposed algorithm to select the value of the weight parameter. Our comparison contrasts the results obtained by the VNS heuristic of Section 3 and the VNS heuristic of Blanchard et al. (2012a) for the HPM, using the same aforementioned computational platform. The weight used in the HPM was set to the average of all dissimilarity values, weighting both roughly equally, as was done in Blanchard et al. (2012a).

Given that not only the algorithms but also the models are distinct, to compare the resulting approaches we need a performance metric that does not systematically favor one over the other. As such, to examine the models' (and the algorithms') ability to recover the original data of Table 1, we use the Adjusted Rand Index (ARI; Hubert and Arabie, 1985). Doing so allows us to compare the true clustering used in generating the simulated data with the predicted clustering variables e, as well as the individuals' true partition with the predicted assignment variables z. We believe that this comparison is fair given that the same VNS heuristic framework was used, that both heuristics were demonstrated to have good performance regarding the optimization of their respective models, and that the results were collected using the same computational platform.

The results in the last four columns of Table 5 indicate the ARI index with respect to the recovery of the objects clustering (column e) and the groups (column z). To calculate these measures, we obtained the ARI separately for each individual before averaging the results across all individuals. For both heuristics, we allowed 600 seconds of computational time.
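For reference (our addition), the per-individual ARI averaging described above can be computed with scikit-learn's adjusted_rand_score, assuming the true and recovered object clusterings are available as label vectors for each individual; the function name and data layout are ours.

```python
from sklearn.metrics import adjusted_rand_score

def mean_individual_ari(true_labels, pred_labels):
    """Average ARI over individuals.

    true_labels[k] and pred_labels[k] are length-n label vectors giving the true
    and recovered clustering of the n objects for individual k (via the group
    that individual is assigned to).
    """
    scores = [adjusted_rand_score(t, p) for t, p in zip(true_labels, pred_labels)]
    return sum(scores) / len(scores)
```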

Table 5
Monte Carlo Simulation: recovery results.

Instance | Individuals | Groups | Objects | Medians | Perturbation dissimilarities | Perturbation medians | HCP ARI (z) | HCP ARI (e) | HPM ARI (z) | HPM ARI (e)
1 | 150 | 10 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5) | .886 | .903 | .958 | .713
2 | 300 | 2 | 18 | All 6 | N(0, 0.1) | 0 | 1.000 | 1.000 | 1.000 | 1.000
3 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0 | 1.000 | 1.000 | 1.000 | 1.000
4 | 150 | 2 | 18 | All 3 | N(0, 0.05) | N(0, 0.5) | .947 | .988 | .947 | .616
5 | 450 | 10 | 18 | All 6 | N(0, 0.05) | N(0, 1) | .928 | .911 | .985 | .892
6 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0 | 1.000 | 1.000 | 1.000 | 1.000
7 | 300 | 2 | 18 | All 6 | 0 | N(0, 0.5) | .987 | .881 | .987 | 1.000
8 | 150 | 10 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1) | .166 | .979 | .167 | .182
9 | 300 | 10 | 30 | All 3 | N(0, 0.05) | N(0, 0.5) | .700 | .787 | .852 | .727
10 | 450 | 6 | 18 | All 3 | N(0, 0.1) | N(0, 1) | .771 | .844 | .783 | .267
11 | 150 | 6 | 30 | All 6 | N(0, 0.1) | 0 | 1.000 | 1.000 | 1.000 | 1.000
12 | 300 | 10 | 18 | All 3 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000
13 | 450 | 10 | 18 | All 6 | N(0, 0.1) | 0 | 1.000 | 1.000 | 1.000 | 1.000
14 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | 0 | N(0, 1) | .960 | .985 | .878 | .657
15 | 300 | 2 | 30 | All 6 | N(0, 0.05) | N(0, 1) | .934 | .985 | .871 | .716
16 | 450 | 2 | 30 | 50 percent 3, 50 percent 6 | 0 | N(0, 1) | .879 | .969 | .293 | .846
17 | 300 | 6 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5) | .900 | .958 | .711 | .818
18 | 300 | 6 | 30 | 50 percent 3, 50 percent 6 | N(0, 0.05) | 0 | 1.000 | 1.000 | 1.000 | 1.000
19 | 150 | 6 | 18 | All 6 | 0 | N(0, 0.5) | .968 | .987 | .947 | 1.000
20 | 450 | 6 | 30 | All 3 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000
21 | 150 | 2 | 30 | All 3 | N(0, 0.1) | N(0, 1) | .821 | .955 | .797 | .200
22 | 450 | 2 | 18 | 50 percent 3, 50 percent 6 | N(0, 0.1) | N(0, 0.5) | .956 | .990 | .247 | 1.000
23 | 450 | 6 | 18 | All 3 | N(0, 0.05) | N(0, 0.5) | .783 | .882 | .860 | .668
24 | 300 | 10 | 18 | All 3 | N(0, 0.1) | N(0, 1) | .751 | .862 | .879 | .417
25 | 150 | 6 | 18 | All 6 | N(0, 0.05) | N(0, 1) | .890 | .913 | .868 | .897
26 | 150 | 2 | 18 | All 3 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000
27 | 450 | 10 | 30 | All 6 | 0 | N(0, 0.5) | .889 | .926 | .946 | 1.000

Our results suggest that, on average, both algorithms performed very well, with an average ARI of .952 for HCP and .801 for HPM. However, the results from a paired-sample t-test suggest that the within-dataset difference between the algorithms for HCP (M = .952, SD = .059) and HPM (M = .801, SD = .263) is significant (t(26) = 3.20, p < .01).⁴ In fact, excluding the 9/27 datasets where both algorithms perfectly recovered the original data, the one for HCP outperformed the one for HPM in 14 out of 18 trials. With respect to the recovery of the assignments of individuals to groups, both algorithms also performed very well. Namely, whereas the heuristic for HCP obtains an average ARI of .893 (M = .893, SD = .169), the heuristic for HPM obtains .851 (M = .851, SD = .236). The within-dataset difference between the two is not significant (t(26) = 1.18, p = .25).

4.4. Sensitivity to dataset characteristics

What affects the algorithms’ ability to recover the original data? To investigate this critical question, we used multiple linear re- gression to predict each algorithm’s ARI using dummy-coded fac- tors for each of the data characteristics. The results are displayed in Table 6 for the objects clustering variables and in Table 7 for the individuals grouping variables. In both tables, the rows indicate the data structures characteristics that were manipulated in the 27 generated instances. Each row contains the main effects (beta co- efficients) for the factors used as independent variables, along with the significance of the factor. Finding a significant regression coef- ficient suggests that the algorithm is sensitive to the data charac- teristic. As few significant coefficients, as possible, is desired for an algorithm to be robust.

With respect to the HPM, we find that the algorithm has some sensitivity to changes in data structures. Specifically, whereas the algorithm is unaffected by the number of individuals or the number of clusters, partitions with numerous clusters of equal sizes are better recovered than those with fewer clusters or clusters of uneven sizes. We also find that error (even in small amounts) added to both the dissimilarities and the number of medians significantly affects performance.

4 t refers to the t-statistic, and p refers to the associated p-value.

The algorithm for the HCP, in contrast, is mostly unaffected by data structures. Of note, it is particularly insensitive to errors added to the distances. It is also better able to recover clustering structures with a larger number of objects, and datasets where the number of groups is smaller than 10. That said, the impact of these factors is minimal, as the mean ARI is .95, a near-perfect recovery of the original pairwise data.

With respect to assignments of individuals to groups, we find that error added to the number of medians is a significant predictor, and that recovery is also sensitive to datasets where the groups are few (but large). The algorithm for the HCP is mostly unaffected when it comes to recovering group memberships. It has marginally more difficulty recovering large group memberships (when G = 10) and is only impacted by large error added to the number of medians.

5. Empirical illustration: understanding differences in perceptions of assortments of chocolate candy

To further demonstrate the usefulness and performance of the proposed procedure, we collected data for a real-world application and used the proposed VNS heuristic to illustrate heterogeneity in the clustering performed by different individuals. The sorting task (also known as card sorting) asks participants to allocate a set of objects into piles according to their own perception. It is common to instruct participants to (1) put objects into the same pile if they are similar in some way (there are no pre-determined labels), and (2) use as many piles as they desire (c.f., Blanchard & Banerji, 2016), and the result is a set of participants who performed their own "partitions" over the set of objects.

Table 6
Monte Carlo Simulation: factors influencing the clustering of objects by the individuals.

Factor | HCP coefficient | HCP t-value | HCP p-value | HPM coefficient | HPM t-value | HPM p-value
Intercept | 1.084 | 37.521 | .000*** | 1.081 | 37.826 | .000***
Individuals 300 (default: 150) | −.030 | −1.426 | .174 | −.023 | −1.134 | .275
Individuals 450 | −.023 | −1.085 | .295 | .000 | .005 | .996
Groups 6 (default: 2) | −.022 | −1.067 | .303 | −.028 | −1.378 | .188
Groups 10 | −.045 | −2.136 | .050** | −.008 | −.387 | .704
Number of objects 30 (default: 18) | −.007 | −.404 | .692 | .001 | .051 | .960
Median spread All 3 (default: 50 percent 3, 50 percent 6) | −.052 | −2.479 | .026** | −.091 | −4.383 | .001***
Median spread All 6 | −.020 | −.964 | .350 | .041 | 1.971 | .067*
Added error on dissimilarities Small (default: none) | −.029 | −1.397 | .183 | −.054 | −2.598 | .020**
Added error on dissimilarities Large | −.024 | −1.143 | .271 | −.105 | −5.086 | .000***
Added error on number of medians Small (default: none) | −.078 | −3.718 | .002*** | −.078 | −3.800 | .002***
Added error on number of medians Large | −.067 | −3.190 | .006*** | −.159 | −7.709 | .000***
R2 | 0.678 | | | 0.893 | |
Adjusted R2 | 0.448 | | | 0.882 | |
Mean ARI | 0.952 | | | 0.913 | |
Std ARI | 0.059 | | | 0.104 | |

∗ indicates p < .10, ∗∗ indicates p < .05, ∗∗∗ indicates p < .01.

Table 7
Monte Carlo Simulation: factors influencing group membership.

Factor | HCP coefficient | HCP t-value | HCP p-value | HPM coefficient | HPM t-value | HPM p-value
Intercept | .956 | 9.216 | .000*** | .985 | 9.278 | .000***
Individuals 300 (default: 150) | .062 | .822 | .424 | .079 | 1.031 | .319
Individuals 450 | .059 | .783 | .446 | .111 | 1.450 | .168
Groups 6 (default: 2) | −.028 | −.374 | .714 | .009 | 0.122 | .905
Groups 10 | −.134 | −1.784 | .095* | −.031 | −0.405 | .691
Number of objects 30 (default: 18) | .012 | .181 | .858 | .018 | 0.278 | .785
Median spread All 3 (default: 50 percent 3, 50 percent 6) | .003 | .038 | .970 | −.142 | −1.855 | .083*
Median spread All 6 | .094 | 1.259 | .227 | .116 | 1.507 | .152
Added error on dissimilarities Small (default: none) | .037 | .492 | .630 | −.018 | −0.239 | .814
Added error on dissimilarities Large | .026 | .348 | .733 | −.095 | −1.245 | .232
Added error on number of medians Small (default: none) | −.109 | −1.460 | .165 | −.134 | −1.747 | .101**
Added error on number of medians Large | −.211 | −2.816 | .013** | −.428 | −5.589 | .000***
R2 | 0.494 | | | 0.763 | |
Adjusted R2 | 0.124 | | | 0.590 | |
Mean ARI | 0.893 | | | 0.813 | |
Stdev ARI | 0.170 | | | 0.254 | |

∗ indicates p < .10, ∗∗ indicates p < .05, ∗∗∗ indicates p < .01.

The pairwise similarity data provided by the sorting task is, in the most simplified way, $y^{k}_{ij} = 1$ if individual k (k ∈ {1, ..., m}) places objects i and j (i, j ∈ {1, ..., n}) in the same pile (high similarity), and 0 otherwise (low similarity). The task is particularly suited for the generation of such pairwise similarity data because it mirrors closely the cognitive activities involved in the categorization process individuals follow as they form similarity judgments (Coxon, 1999), and, compared to pairwise similarity tasks, it yields data of similar quality with less fatigue and boredom from participants (Bijmolt & Wedel, 1995; Rao & Katz, 1971).

Data was collected from an online sorting study featuring m = 189 undergraduate students from a large northeastern United States university, who reported their perceptions of n = 20 chocolate candies displayed at the Corp's Vital Vittles, the first storefront of Students of Georgetown Inc., opened in 1973. Today, Vital Vittles is a full-service grocery store which sells frozen foods, meals on the go, and a variety of home supplies. Because the Georgetown Campus housing is fairly isolated, it is considered a one-stop shop: Vittles faces little competition from local grocery stores. It is also very healthy financially, with gross sales averaging over 2 million dollars a year and a 23 percent gross profit margin.

As part of its most prominent checkout counter shelves, Vital Vittles offers a large selection of chocolate candy. The shelf section dedicated to chocolate snacks includes the following options: Almond Joy, Baby Ruth, Butterfinger, Hershey (Almond), Hershey (Plain), Junior Mints, Kit Kat, M & M (Peanut), M & M (Plain), Mars Bar, Milky Way, Mounds Bar, Nestle's Crunch, Oh Henry!, Payday, Reece's Cups, Snickers, Three Musketeers, Twix, and York Mint.

Do consumers perceive these brands in similar ways? The piles made by these participants provide preliminary evidence that we can expect heterogeneity in the clustering solutions obtained: the mean number of piles (partitions) made by the participants was 5.73 (min = 2, max = 12), and the large variation in the number of piles made is illustrated by the histogram in Fig. 1.

In order to be used by the HCP algorithm, the sort provided by each individual k, for k = 1, ..., m, is converted to a dissimilarity matrix $D^k$ following the procedure proposed by Takane (1980), and the number of medians $c_k$ is set equal to the number of piles made by that individual in the sorting task.
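The sketch below (ours) shows the simplest version of such a conversion, treating objects placed in different piles as maximally dissimilar; it is a simplified stand-in rather than Takane's (1980) exact procedure, and the function name is hypothetical.

```python
import numpy as np

def sort_to_dissimilarity(piles):
    """Convert one participant's sorting into a dissimilarity matrix and c_k.

    piles[i] is the pile label given to object i; d_ij = 0 if objects i and j
    were placed in the same pile and 1 otherwise (a simplification of the
    conversion used in the paper); c_k is the number of piles made.
    """
    n = len(piles)
    D = np.array([[0.0 if piles[i] == piles[j] else 1.0
                   for j in range(n)] for i in range(n)])
    c_k = len(set(piles))
    return D, c_k

# example: a participant who made three piles over six candies
D, c = sort_to_dissimilarity(["choc", "choc", "mint", "mint", "nut", "nut"])
```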

5.1. Model selection & performance

For both VNS heuristics, we performed 10 executions of the procedures. All executions were terminated after 600 seconds of CPU time, and Table 8 shows the best objective function values as the number of groups (G) increases. To facilitate model selection, the table also shows the percentage improvement obtained when an additional group is added. Both algorithms seem to identify a
