Volume 40, Issue 2

Determining the number of components in mixture regression models: an experimental design

Ana Brochado

Instituto Universitário de Lisboa (ISCTE-IUL), DINÂMIA CET – IUL

Vitorino Martins

Faculdade de Economia da Universidade do Porto (FEP-UP)

Abstract

Despite the popularity of mixture regression models, the decision of how many components to retain remains an open issue. This study therefore compared the performance of 26 information and classification criteria, each evaluated in terms of its success rate in recovering the true number of components. The full experimental design manipulated 9 factors with 22 levels in total. The best results were obtained for 5 criteria: Akaike information criterion 3 (AIC3), AIC4, the Hannan-Quinn (HQ) information criterion, the integrated completed likelihood Bayesian information criterion (ICL-BIC) and ICL with BIC approximation. Each criterion's performance varied according to the experimental conditions.

Citation: Ana Brochado and Vitorino Martins (2020) "Determining the number of components in mixture regression models: an experimental design", Economics Bulletin, Volume 40, Issue 2, pages 1465-1474

Contact: Ana Brochado - ana.brochado@iscte-iul.pt, Vitorino Martins - vmartins@fep.up.pt. Submitted: February 11, 2020. Published: June 02, 2020.


1. Introduction

Models for mixture regression of normal data facilitate model-based clustering within econometric analysis. These models have attracted considerable interest in recent years, resulting in a vast number of both methodological developments and applications in different areas. Mixture models have been employed in economics and finance (e.g. Anh et al. 2018, Wu 2019). For instance, Yan and Han (2019) modelled stock returns, and Wu (2019) employed mixtures of Gaussian processes to model electricity demand and crop insurance. Despite the growing interest in mixture models, the best way to decide how many mixture components are required to represent the data adequately still needs more research (e.g. Kasahara and Shimotsu 2015, Li et al. 2016, Nasserinejad et al. 2017).

Determining the number of components is a necessary step in addressing unobserved heterogeneity across objects (Gu et al. 2018, Pan et al. 2020). A misspecified model that under- or overestimates the number of mixture components can mislead experts during decision-making processes.

Previous studies have developed a large number of criteria to help identify the mixture components needed. Information criteria (e.g. Akaike 1973) balance the improvement in the model's fit to the data against a larger number of parameters, while classification criteria assess whether the solution produces well-separated clusters. A substantial majority of previous studies, with a few notable exceptions (e.g. Hawkins et al. 2001), have been designed to test newly proposed measures' performance rather than to compare existing criteria's performance. Therefore, the present study sought to compare the performance of 26 information and classification criteria in terms of identifying the correct number of mixture components, using an extensive experimental design. This research adds to the existing literature (e.g. Hawkins et al. 2001) by comparing a larger number of criteria under a larger number of experimental conditions.

The paper is organised as follows. The next section briefly describes the theoretical background of mixture regression of normal data and reviews the information and classification criteria considered in the current study. Section three delineates the experimental design used to generate the simulated data. Section four discusses the results and presents guidelines that support, through the application of the criteria in question, a more accurate decision about the number of components to include in mixture regression of normal data.

2. Theory/Calculation

2.1 Mixture Regression Model Based on the Multivariate Normal Distribution, with Notation

Finite mixture regressions facilitate model-based clustering for large heterogeneous populations. According to this approach, a sample of observations is assumed to arise from a number of unobserved groups in unknown proportions. The finite mixture technique is used to decompose the sample into its mixture components.

A classical mixture regression model (Wedel and DeSarbo 1995) simultaneously allows the probabilistic classification of observations into S mixture components and the estimation of separate regression models relating covariates to the expected values of the dependent variables within each latent class. The underlying idea is that a single set of regression coefficients across all observations may be inadequate and potentially misleading if the observations arise from a number of unknown groups with differing coefficients. Brief mathematical derivations of, and notation for, the multivariate normal mixture regression used in the present study are presented below.

In this methodology, y_n is a metric dependent vector distributed as a finite mixture of S conditional multivariate normal densities, as shown in Equation (1):

$y_n \sim \sum_{s=1}^{S} \pi_s f_s(y_n \mid x_n, \beta_s, \Sigma_s)$  (1)

in which the density f_s is defined by Equation (2):

$f_s(y_n \mid x_n, \beta_s, \Sigma_s) = (2\pi)^{-R/2}\,|\Sigma_s|^{-1/2} \exp\!\left[-\tfrac{1}{2}(y_n - x_n\beta_s)'\,\Sigma_s^{-1}\,(y_n - x_n\beta_s)\right]$  (2)

and π_s, s = 1, ..., S, are the mixing proportions, which satisfy the two restrictions in Equation (3):

$0 \le \pi_s \le 1 \quad \text{and} \quad \sum_{s=1}^{S} \pi_s = 1$  (3)

and in which:

s = 1, ..., S indicate the mixture components obtained
n = 1, ..., N indicate the cross-sectional units (e.g. consumers, companies, countries and regions)
r = 1, ..., R denote the repeated observations on unit n
j = 1, ..., J represent the independent variables
β_js = the value of the jth regression coefficient for the sth mixture component
β_s = (β_js), the (J×1) vector of regression coefficients for the sth mixture component
Σ_s = the (R×R) covariance matrix of mixture component s
y_nr = the value of the dependent variable y for repeated measure r on cross-sectional unit n
y_n = (y_nr), the (R×1) vector of observations for unit n
x_nrj = the value of the jth independent variable for repeated measure r on cross-sectional unit n
x_n = (x_nrj), the (R×J) matrix of independent variables for cross-sectional unit n

Given a sample of N independent objects, the log-likelihood is defined by Equation (4):

$\ln L = \sum_{n=1}^{N} \ln \sum_{s=1}^{S} \pi_s\, f_s(y_n \mid x_n, \beta_s, \Sigma_s)$  (4)

π_s, Σ_s and β_s are estimated by maximising the likelihood in Equation (4), subject to the restrictions in Equation (3).

The model is estimated with the expectation-maximisation (EM) algorithm (Dempster et al. 1977). This algorithm treats the non-observed data as an indicator variable z_ns that is assumed to be i.i.d. multinomial: z_ns = 1 if cross-sectional unit n comes from latent class s, and z_ns = 0 otherwise.

The joint likelihood for the observed and non-observed data, the latter defined by z_n = (z_ns) for all objects, is represented as Equation (5):

$\ln L_c = \sum_{n=1}^{N} \sum_{s=1}^{S} z_{ns} \ln f_s(y_n \mid x_n, \beta_s, \Sigma_s) + \sum_{n=1}^{N} \sum_{s=1}^{S} z_{ns} \ln \pi_s$  (5)

Given seed values for the parameters, z_ns represents the missing data, and the EM algorithm's two steps – the expectation step (E-step) and the maximisation step (M-step) – are alternated until a convergent sequence of log-likelihood values is achieved. In the E-step, the expectation of Equation (5) is evaluated over the conditional distribution of the non-observed data z_ns, given y_n, x_n and the estimates for π_s, Σ_s and β_s obtained in the M-step. Each cross-sectional unit n is assigned to each latent class s via the estimated posterior probability (i.e. applying Bayes' rule), thereby incorporating fuzzy clustering into the E-step. The conditional expectation of z_ns is identical to the posterior probability p_ns, which can be written as Equation (6):

$p_{ns} = \frac{\pi_s f_s(y_n \mid x_n, \beta_s, \Sigma_s)}{\sum_{t=1}^{S} \pi_t f_t(y_n \mid x_n, \beta_t, \Sigma_t)}$  (6)


in which two conditions need to be satisfied:

$\sum_{s=1}^{S} p_{ns} = 1 \quad \text{and} \quad 0 \le p_{ns} \le 1$  (7)

In the E-step, the non-observed discrete data z_ns are replaced by p_ns. In the M-step, Equation (5) is maximised with respect to π_s, Σ_s and β_s.
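To make the estimation procedure concrete, the following minimal Python sketch implements the E- and M-steps above for the special case of one observation per unit (R = 1) and a scalar error variance per component. The function name, arguments and stopping rule (a fixed number of iterations, with no safeguards against degenerate components) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def em_mixture_regression(X, y, S, n_iter=200, seed=0):
    """EM for a mixture of linear regressions (R = 1 observation per unit)."""
    rng = np.random.default_rng(seed)
    N, J = X.shape
    # Seed values: equal mixing proportions, random coefficients, pooled variance
    pi = np.full(S, 1.0 / S)
    beta = rng.normal(size=(S, J))
    sigma2 = np.full(S, y.var())
    for _ in range(n_iter):
        # E-step: posterior membership probabilities p_ns via Bayes' rule (Eq. 6)
        resid = y[:, None] - X @ beta.T                       # (N, S) residuals
        dens = np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        p = pi * dens
        p /= p.sum(axis=1, keepdims=True)                     # rows sum to 1 (Eq. 7)
        # M-step: update pi, then weighted least squares per component
        pi = p.mean(axis=0)
        for s in range(S):
            w = p[:, s]
            Xw = X * w[:, None]
            beta[s] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
            sigma2[s] = w @ (y - X @ beta[s])**2 / w.sum()
    # Final log-likelihood (Eq. 4), used by the selection criteria below
    resid = y[:, None] - X @ beta.T
    dens = np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    loglik = np.log((pi * dens).sum(axis=1)).sum()
    return pi, beta, sigma2, p, loglik
```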

2.2 Criteria

Information criteria are designed to guard against over-parameterisation because the likelihood increases as components are added to a mixture model. These criteria balance the improvement in the model's fit to the data against the larger number of parameters estimated for models with more mixture components. The criteria take the general form $IC(s) = -2\ln L + d\,k(s)$, in which k(s) is the number of parameters associated with a solution with s mixture components and d is some constant or the 'marginal cost' per parameter (Bozdogan 1987). Information criteria encompass those that estimate the relative Kullback-Leibler distance, including the Akaike information criterion (AIC) (Akaike 1973), modified AIC3 (Bozdogan 1994), modified AIC4 (Bozdogan 1994) and the small-sample AICc (Hurvich and Tsai 1989, 1995), shown in Equations (8) through (11), respectively:

$\mathrm{AIC} = -2\ln L + 2k$  (8)

$\mathrm{AIC3} = -2\ln L + 3k$  (9)

$\mathrm{AIC4} = -2\ln L + 4k$  (10)

$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{N-k-1}$  (11)

The Bayesian information criterion (BIC) (Schwarz 1978) and adjusted BIC (ABIC) (Sclove 1987) were developed within a Bayesian framework for model selection, using Equations (12) and (13):

$\mathrm{BIC} = -2\ln L + k \ln N$  (12)

$\mathrm{ABIC} = -2\ln L + k \ln\!\left[(N+2)/24\right]$  (13)

Information criteria also cover consistent criteria, including consistent AIC (CAIC) (Bozdogan 1987), CAIC with Fisher information (CAICF) (Bozdogan 1987), the information complexity criterion (ICOMP) (Bozdogan 1994), the Hannan-Quinn (HQ) information criterion (Hannan and Quinn 1979), minimum description length 2 (MDL2) (Liang et al. 1992) and minimum description length 5 (MDL5) (Liang et al. 1992). These criteria are defined by Equations (14) through (19), respectively:

$\mathrm{CAIC} = -2\ln L + k(\ln N + 1)$  (14)

$\mathrm{CAICF} = \mathrm{AIC} + k \ln N + \ln|F|$  (15)

$\mathrm{ICOMP} = -2\ln L + k \ln\!\left[\mathrm{tr}(F^{-1})/k\right] - \ln|F^{-1}|$  (16)

$\mathrm{HQ} = -2\ln L + 2k \ln \ln N$  (17)

$\mathrm{MDL2} = -2\ln L + 2k \ln N$  (18)

$\mathrm{MDL5} = -2\ln L + 5k \ln N$  (19)

in which F denotes the Fisher information matrix and F⁻¹ its inverse.
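As an illustration, the simpler penalised-likelihood criteria above can be computed directly from a fitted model's log-likelihood. This hedged sketch covers Equations (8) through (14) and (17) through (19), omitting CAICF and ICOMP, which additionally require the Fisher information matrix; the function and argument names are hypothetical.

```python
import numpy as np

def information_criteria(loglik, k, N):
    """Penalised-likelihood criteria from a solution's log-likelihood,
    number of parameters k and sample size N."""
    m2ll = -2.0 * loglik
    aic = m2ll + 2 * k
    return {
        "AIC":  aic,                                   # Eq. (8)
        "AIC3": m2ll + 3 * k,                          # Eq. (9)
        "AIC4": m2ll + 4 * k,                          # Eq. (10)
        "AICc": aic + 2 * k * (k + 1) / (N - k - 1),   # Eq. (11)
        "BIC":  m2ll + k * np.log(N),                  # Eq. (12)
        "ABIC": m2ll + k * np.log((N + 2) / 24),       # Eq. (13)
        "CAIC": m2ll + k * (np.log(N) + 1),            # Eq. (14)
        "HQ":   m2ll + 2 * k * np.log(np.log(N)),      # Eq. (17)
        "MDL2": m2ll + 2 * k * np.log(N),              # Eq. (18)
        "MDL5": m2ll + 5 * k * np.log(N),              # Eq. (19)
    }
```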

Classification criteria evaluate a mixture model's ability to provide well-separated clusters of components, based on the degree of separation in the estimated posterior probabilities. Probabilistic indices include the measure of entropy (Es) (DeSarbo et al. 1992), the logarithm of the partition probability (LP) (Biernacki 1997), entropy (E) (Liang et al. 1992), the normalised entropy criterion (NEC) (Celeux and Soromenho 1996), the classification criterion (C) (Biernacki and Govaert 1997), the classification likelihood criterion (CLC) (Biernacki and Govaert 1997) and the approximate weight of evidence (AWE) (Banfield and Raftery 1993). Two other relevant indices are the integrated completed likelihood (ICL)-BIC (Biernacki et al. 2000) and ICL with BIC approximation (ICOMPLBIC) (Dias 2004). Equations (20) through (28) represent each of these indices:

$E_s = 1 - \frac{-\sum_{n=1}^{N}\sum_{s=1}^{S} p_{ns}\ln p_{ns}}{N \ln S}$  (20)

$\mathrm{LP} = -\sum_{n=1}^{N}\sum_{s=1}^{S} z_{ns}\ln p_{ns}$  (21)

$E = -\sum_{n=1}^{N}\sum_{s=1}^{S} p_{ns}\ln p_{ns}$  (22)

$\mathrm{NEC}(s) = \frac{E(s)}{\ln L(s) - \ln L(1)}$  (23)

$C = -2\ln L + 2E$  (24)

$\mathrm{CLC} = -2\ln L + 2\,\mathrm{LP}$  (25)

$\mathrm{AWE} = -2\ln L_c + 2k\left(\tfrac{3}{2} + \ln N\right)$  (26)

$\mathrm{ICL\text{-}BIC} = -2\ln L + 2\,\mathrm{LP} + k\ln N$  (27)

$\mathrm{ICOMPLBIC} = -2\ln L + 2E + k\ln N$  (28)
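A sketch of how the entropy-based indices above might be computed from the posterior matrix p_ns (and the modal memberships z_ns) is given below, following Equations (20) through (22), (24), (25), (27) and (28) as written here, with a small clipping constant added to avoid log(0). NEC and AWE are omitted because they also require the log-likelihoods of competing solutions; the function and its arguments are illustrative.

```python
import numpy as np

def classification_criteria(p, z, loglik, k):
    """p: (N, S) posterior probabilities p_ns; z: (N, S) 0/1 modal memberships."""
    N, S = p.shape
    m2ll = -2.0 * loglik
    logp = np.log(np.clip(p, 1e-300, None))   # clip to avoid log(0)
    ent = -np.sum(p * logp)                   # E, Eq. (22)
    lp = -np.sum(z * logp)                    # LP, Eq. (21)
    return {
        "Es":        1.0 - ent / (N * np.log(S)),     # Eq. (20)
        "E":         ent,                             # Eq. (22)
        "LP":        lp,                              # Eq. (21)
        "C":         m2ll + 2 * ent,                  # Eq. (24)
        "CLC":       m2ll + 2 * lp,                   # Eq. (25)
        "ICLBIC":    m2ll + 2 * lp + k * np.log(N),   # Eq. (27)
        "ICOMPLBIC": m2ll + 2 * ent + k * np.log(N),  # Eq. (28)
    }
```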

A second group of indices has been developed in the literature on fuzzy logic (Bezdek et al. 1997), including the partition coefficient (PC) (Bezdek 1981), partition entropy (PE) (Bezdek 1981), normalised partition entropy (NPE) (Bezdek 1981), the non-fuzzy index (NFI) (Roubens 1978), minimum hard tendency (Minht) (Rivera et al. 1990) and mean hard tendency (Meanht) (Rivera et al. 1990). These indices are calculated using Equations (29) through (34), respectively:

$\mathrm{PC} = \frac{1}{N}\sum_{n=1}^{N}\sum_{s=1}^{S} p_{ns}^2$  (29)

$\mathrm{PE} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{s=1}^{S} p_{ns}\ln p_{ns}$  (30)

$\mathrm{NPE} = \frac{\mathrm{PE}}{1 - S/N}$  (31)

$\mathrm{NFI} = \frac{S\left(\sum_{n=1}^{N}\sum_{s=1}^{S} p_{ns}^2 / N\right) - 1}{S - 1}$  (32)

$\mathrm{Min}_{ht} = -\log_{10}\left(\max_{s} T_s\right)$  (33)

$\mathrm{Mean}_{ht} = \frac{1}{S}\sum_{s=1}^{S} -\log_{10} T_s$  (34)

in which T_s is the mean of the ratios between the second and the first p_ns maxima for the cross-sectional units included in mixture component s.
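The fuzzy-logic indices can be sketched the same way. The hard-tendency statistic T_s below is computed as the mean ratio of the second-largest to the largest p_ns within each component, which is one reading of the verbal definition above and should be checked against Rivera et al. (1990); empty components are not handled.

```python
import numpy as np

def fuzzy_indices(p):
    """Fuzzy cluster-validity indices from the (N, S) posterior matrix p."""
    N, S = p.shape
    pe = -np.sum(p * np.log(np.clip(p, 1e-300, None))) / N   # PE, Eq. (30)
    pc = np.sum(p**2) / N                                     # PC, Eq. (29)
    top2 = np.sort(p, axis=1)[:, -2:]          # second-largest and largest p_ns
    ratio = top2[:, 0] / top2[:, 1]            # second / first maximum per unit
    labels = p.argmax(axis=1)                  # modal component of each unit
    T = np.array([ratio[labels == s].mean() for s in range(S)])
    return {
        "PC":     pc,
        "PE":     pe,
        "NPE":    pe / (1.0 - S / N),                  # Eq. (31)
        "NFI":    (S * pc - 1.0) / (S - 1.0),          # Eq. (32)
        "Minht":  -np.log10(T.max()),                  # Eq. (33)
        "Meanht": np.mean(-np.log10(T)),               # Eq. (34)
    }
```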

3. Methods

When dealing with real-world data, the true number of mixture components is unknown, so the selected criteria's effectiveness as a guide during model selection cannot be evaluated without an experimental design. This research's experimental design manipulated 9 factors with a total of 22 levels. Factor 1 was the true number of mixture components (i.e. 2 or 3). Factor 2 was the mean separation between latent classes (i.e. low = 0.5; medium = 1.0; or high = 1.5). Factor 3 was the number of individuals (i.e. 100 or 300). Factor 4 was the number of observations per individual (i.e. 5 or 10). Factor 5 was the number of independent variables (i.e. 2, 6 or 10). Factor 6 was the level of measurement of the independent variables (i.e. binary, metric or half-binary and half-metric). Factor 7 was error variance (i.e. 20% or 60%). Factor 8 was minimum segment size (i.e. 5–10%, 10–20% or 20–30%). Factor 9 was error distribution (i.e. normal vs uniform).

As the design was factorial with three replications (i.e. datasets) per cell, the overall total generated was 7,776 datasets (i.e. 2⁵ × 3⁵). The likelihood was maximised using the EM algorithm, which was run repeatedly with three replications in order to avoid its convergence to local maxima. Then, the best solution was retained for each possible number of mixture components.

For each object n and all replications, the programme generated the true values U = Xβ. Next, an error was added to these true values (i.e. Y = U + ε). The error term's variance was obtained from Equation (35), in which PEV is the percentage of error variance, σ_u² is the variance of U and σ_ε² is the variance of the error term ε:

$\mathrm{PEV} = \frac{\sigma_\varepsilon^2}{\sigma_u^2 + \sigma_\varepsilon^2} \;\;\Rightarrow\;\; \sigma_\varepsilon^2 = \sigma_u^2\,\frac{\mathrm{PEV}}{1-\mathrm{PEV}}$  (35)

To improve the level of general inference, parameter values were randomly generated for each segment and each data set.

Regarding the coefficients for the explanatory variables in each group (i.e. β_s = (β_sj), s = 1, ..., S), the parameter vector for the first segment, β₁, was randomly generated within the interval −1.5 to 1.5. Next, a separation vector δ_i (i = L, M, H) was generated, with an average of 0.5, 1.0 or 1.5 and a standard deviation equal to 10% of the average (i.e. 0.05, 0.10 and 0.15), as well as a signs vector S± with positive (1) and negative (−1) entries for δ_i (i = L, M, H). The vectors with low (0.5), medium (1.0) and large (1.5) separations were denoted, respectively, by δ_L, δ_M and δ_H. The coefficient vectors for segments 2 and 3 were obtained using β₂ = β₁ + S±δ_i and β₃ = β₁ − S±δ_i, i = L, M, H, respectively.

Cross-sectional units were assigned to mixture components on the basis of randomly determined segment sizes for each data set. The smallest segments consisted of 5–10%, 10–20% or 20–30% of the sample, depending on Factor 8's level. Normally and uniformly distributed errors (i.e. Factor 9) were generated with an average of 0 and a standard deviation of 1.
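For illustration, the following sketch generates a single dataset for one cell of the design (two components, medium separation, metric covariates, 20% error variance, normal errors), with the noise variance set by Equation (35); all sizes, seeds and the equal segment-size rule are simplifications relative to the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, R, J, PEV = 100, 5, 2, 0.20          # units, repeats, covariates, % error variance

beta1 = rng.uniform(-1.5, 1.5, size=J)             # segment 1 coefficients
delta = rng.normal(1.0, 0.10, size=J)              # medium separation (mean 1.0)
signs = rng.choice([-1.0, 1.0], size=J)            # random signs vector
beta2 = beta1 + signs * delta                      # segment 2 coefficients

z = rng.integers(0, 2, size=N)                     # true memberships (two segments)
X = rng.normal(size=(N, R, J))                     # metric covariates
beta = np.stack([beta1, beta2])
U = np.einsum("nrj,nj->nr", X, beta[z])            # true values U = X beta

sigma2_eps = U.var() * PEV / (1.0 - PEV)           # error variance from Eq. (35)
Y = U + rng.normal(0.0, np.sqrt(sigma2_eps), size=U.shape)
```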

The experimental design applied allowed each model to be fitted with the correct and incorrect number of latent classes. Information and classification criteria were calculated for each solution based on the estimated log-likelihoods and memberships after the EM estimators converged. Then, using the decision rules for information and classification criteria (i.e. maximise or minimise), the identified number of mixture components for each model was retained.

The information and classification criteria were compared in terms of their success rate, namely, the percentage of experimental datasets in which the criteria identified the correct number of latent classes out of a set of competing solutions. Given two criteria with the same success rate, underfitting was preferred to overfitting because previous studies have reported that overfitting produces more parameter bias than underfitting does and that overfitting results in extremely small latent classes with large or unstable parameter values (Cutler and Windham 1994).
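Putting the pieces together, the selection step might look as follows: fit every candidate number of components, score each solution with a criterion (BIC here, minimised, as in Equation 12), and keep the winner. Repeating this over simulated datasets with a known true S yields a criterion's success rate; the parameter count k is for the R = 1 sketch above and is an assumption.

```python
import numpy as np

def select_S(X, y, S_max=4):
    """Pick the number of components by minimising BIC over S = 1..S_max."""
    N, J = X.shape
    scores = {}
    for S in range(1, S_max + 1):
        *_, loglik = em_mixture_regression(X, y, S)   # EM sketch defined earlier
        k = S * (J + 1) + (S - 1)          # betas + variances + free mixing weights
        scores[S] = -2.0 * loglik + k * np.log(N)     # BIC, Eq. (12)
    return min(scores, key=scores.get)
```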

4. Results and Discussion

4.1 Overall Results

The best overall success rates were obtained for 3 information criteria – AIC3 (71%), HQ (69%) and AIC4 (68%) – and 2 classification criteria – ICLBIC (70%) and ICOMPLBIC (68%). Some information criteria, such as AIC, AICc and ICOMP, showed a tendency to overestimate the number of mixture components, while others, such as MDL5, MDL2, CAICF, CAIC and BIC, tended more strongly towards underfitting than overfitting. In general, almost all the classification criteria (i.e. Es, E, LP, AWE, NEC, ICL, ICLBIC, ICOMPLBIC, PC, PE, NPE, NFI, Meanht and Minht) tended more towards underfitting than overfitting (see Figure 1).


Figure 1. Overall success rates – underfitting and overfitting by information and classification criterion

4.2 Results by Experimental Condition

An analysis of success rates by experimental condition revealed that no single criterion is best under all experimental conditions. For Factor 1, AWE (99%) registers the best success rate for a 2-mixture-component solution, while AIC, AICc and ICOMP (64%) produce the best results for a 3-segment solution. With regard to Factor 2, ICLBIC (61%) registers the highest success rate for low separation between classes, but AIC3 (76%) is best for medium – and both AIC3 and HQ (81%) for high – separation between classes. In terms of Factor 3, ICLBIC (67%) produces the highest rate for 100 individuals, but AIC3 registers the best performance (77%) for 300 individuals. The results for Factor 4 show that the criteria with the highest success rates for 5 observations per individual are ICLBIC and ICOMPLBIC (65%), and, for 10 observations, AIC3 (79%) is the best.

Regarding Factor 5, the criteria that register the highest success rates for 2 independent variables are AIC4 and HQ, but, for 6, AIC3 (71%) and, for 10, AIC3 and ICLBIC (66%) show the best performance. The results for Factor 6 reveal that, when the independent variables are all metric, AIC3 and ICLBIC (68%) register the highest success rates, while AIC3 is the best criterion for binary variables (75%), as well as when the variables are half-metric and half-binary (70%). When Factor 7 is taken into account, AIC4 and HQ (88%) are the best criteria for an error variance of 20% (R-squared [R²] = 80%), and AIC and AIC3 offer the highest success rates for an error variance of 60% (R² = 40%). With regard to Factor 8, the criterion best suited to a minimum segment size of 5–10% is ICLBIC (64%), but, at 10–20% and 20–30%, AIC3 (75% and 77%, respectively) has the highest success rates. Finally, the best criterion for Factor 9 is AIC3 (72.5% for normal and 70% for uniform errors) (see Figure 2).


Figure 2. Success rates by criterion and experimental condition (panels: true number of mixture components; mean separation between latent classes; number of individuals; number of observations per individual; number of independent variables; measurement level of independent variables; error variance)


5. Conclusion

The correct number of mixture components is unknown in real-world applications, so researchers rely on heuristic devices such as information- and classification-based criteria to help them select the best model. Therefore, a thorough understanding of these measures’ performance across different data characteristics is of utmost importance. Researchers who take the wrong measures into consideration will conduct incorrect data analyses, which can subsequently contribute to imprudent managerial decisions.

The present study addressed this problem by comparing 26 classification and information criteria's performance, using an experimental design that manipulated 9 factors. The best overall success rates across all data sets were obtained for 3 information criteria (i.e. AIC3, HQ and AIC4) and 2 classification criteria (i.e. ICLBIC and ICOMPLBIC). In the context of multilevel mixture models, Lukočienė and Vermunt's (2009) study also found that AIC3 performs best.

The overall results reveal that data characteristics affect information and classification criteria's performance in terms of identifying the number of classes to include in mixture regression of normal data (Hawkins et al. 2001). More specifically, the results demonstrate that most information and classification criteria exhibit higher success rates when certain conditions are met: two mixture components, fewer explanatory variables, metric or mixed metric-binary variables, a larger separation between latent classes, larger sample sizes, smaller error variance and normally distributed errors.

This study’s findings provide researchers and practitioners with a better understanding of 26 criteria’s effectiveness in terms of identifying the best number of classes to include in mixture regression of normal data. However, as no single criterion is able to identify the exact number of mixture components, additional research is needed to find the best criteria. In addition, significant room is left for improvement in current practices, and continued research is required on model selection criteria’s performance to ensure more practical procedural guidelines.

Model selection decisions should be based on the data at hand and the results provided by information and classification criteria, as well as practical and theoretical considerations, in order to provide meaningful research conclusions. Various other research topics thus need to be considered, such as assessing the performance of criteria that address the problem of jointly selecting the number of components and variables in mixture regression models. The present study could also be extended further by considering different scenarios for distribution misspecifications to assess their influence on criteria’s performance.

References

Akaike, H. (1973) “Information Theory as an Extension of the Maximum Likelihood Principle” in Second International Symposium on Information Theory, by B.N. Petrov and F. Csaki, Eds., Akademiai Kiado: Budapest, 267-281

Anh, L.H., L.S. Dong, V. Kreinovich and N.N. Thach (2018) Econometrics for Financial Applications, Springer: Berlin

Banfield, J. and A.E. Raftery (1993) “Model-based Gaussian and non-Gaussian clustering” Biometrics 49, 803-821

Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press: New York


Bezdek, J.C., W.Q. Li, Y. Attikiouzel and M. Windham (1997) “A geometric approach to cluster validity for normal mixtures” Soft Computing 1, 166-179

Biernacki, C. (1997) Choix de Modèles en Classification (doctoral dissertation), Université de Technologie de Compiègne: Compiègne, France. http://www-math.univ-fcomte.fr/pp_Annu/CBIERNACKI/

Biernacki, C., G. Celeux and G. Govaert (2000) “Assessing a mixture model for clustering with the integrated completed likelihood” IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 719-725

Biernacki, C. and G. Govaert (1997) “Using the classification likelihood to choose the number of clusters” Journal on Scientific Computing 29, 451-457

Bozdogan, H. (1987) “Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions” Psychometrika 52, 345-370

Bozdogan, H. (1994) “Mixture-model Cluster Analysis Using Model Selection Criteria and a New Information Measure of Complexity” in Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, Vol. 2, by H. Bozdogan, Ed., Kluwer Academic Publishers: Boston, 69-113

Celeux, G. and G. Soromenho (1996) “An entropy criterion for assessing the number of clusters in a mixture model” Journal of Classification 13, 195-212

Cutler, A. and M.P. Windham (1994) “Information-based Validity Functionals for Mixture Analysis” in Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modelling: An Informational Approach, by H. Bozdogan, Ed., Kluwer Academic Publishers: Boston, 149-170

Dempster, A.P., N.M. Laird and D.B. Rubin (1977) “Maximum likelihood from incomplete data via the EM algorithm” Journal of the Royal Statistical Society: Series B 39, 1-38

DeSarbo, W.S., M. Wedel, M. Vriens and V. Ramaswamy (1992) “Latent class metric conjoint analysis” Marketing Letters 3, 273-288

Dias, J.G. (2004) Finite Mixture Models. Review, Applications, and Computer-intensive Methods (doctoral dissertation), Research School Systems, Organization and Management, Groningen University: Groningen, the Netherlands

Gu, J., R. Koenker and S. Volgushev (2018) “Testing for homogeneity in mixture models” Econometric Theory 34, 850-895

Hannan, E.J. and B.G. Quinn (1979) “The determination of the order of an autoregression” Journal of the Royal Statistical Society: Series B 41, 190-195

Hawkins, D.S., D.M. Allen and A.J. Stromberg (2001) “Determining the number of components in mixtures of linear models” Computational Statistics and Data Analysis 38, 15-48


Hurvich, C.M. and C.L. Tsai (1989) “Regression and time series model selection in small samples” Biometrika 76, 297-307

Hurvich, C.M. and C.L. Tsai (1995) “Model selection for extended quasi-likelihood models in small samples” Biometrics 51, 1077-1084

Kasahara, H. and K. Shimotsu (2015) “Testing the number of components in normal mixture regression models” Journal of the American Statistical Association 110, 1632-1645

Li, M., S. Xiang and W. Yao (2016) “Robust estimation of the number of components for mixtures of linear regression models” Computational Statistics 31, 1539-1555

Liang, Z., R.J. Jaszczak and R.E. Coleman (1992) “Parameter estimation of finite mixture models using the EM algorithm and information criteria with applications to medical image processing” IEEE Transactions on Nuclear Science 39, 1126-1133

Lukočienė, O. and J.K. Vermunt (2009) “Determining the Number of Components in Mixture Models for Hierarchical Data” in Advances in Data Analysis, Data Handling and Business Intelligence, by A. Fink, B. Lausen, W. Seidel and A. Ultsch, Eds., Springer: Berlin, 241-249

Nasserinejad, K., J. van Rosmalen, W. de Kort and E. Lesaffre (2017) “Comparison of criteria for choosing the number of classes in Bayesian finite mixture models” PLoS ONE 12, e0168838. https://doi.org/10.1371/journal.pone.0168838

Pan, L., Y. Li, K. He, Y. Li and Y. Li (2020) “Generalized linear mixed models with Gaussian mixture random effects: inference and application” Journal of Multivariate Analysis 175, 104555

Rivera, F.F., E.L.E. Zapata and J.M. Carazo (1990) “Cluster validity based on the hard tendency of the fuzzy classification” Pattern Recognition Letters 11, 7-12

Roubens, M. (1978) “Pattern classification problems and fuzzy sets” Fuzzy Sets and Systems 1, 239-253

Schwarz, G. (1978) “Estimating the dimension of a model” The Annals of Statistics 6, 461-464

Sclove, S.L. (1987) “Application of model-selection criteria to some problems in multivariate analysis” Psychometrika 52, 333-343

Wedel, M. and W.S. DeSarbo (1995) “A mixture likelihood approach for generalized linear models” Journal of Classification 12, 21-55

Wu, W. (2019) Three Essays on Mixture Model and Gaussian Processes (doctoral dissertation), Texas A&M University: College Station, TX. http://hdl.handle.net/1969.1/184913

Yan, H. and L. Han (2019) “Empirical distributions of stock returns: mixed normal or kernel density?” Physica A: Statistical Mechanics and Its Applications 514, 473-486
