Robust fits for copula models

(1)

Relatórios Coppead é uma publicação do Instituto COPPEAD de Administração da Universidade Federal do Rio de Janeiro (UFRJ)

Comissão de Pesquisa Angela da Rocha Rebecca Arkader Ricardo Leal

Gerência de Publicações Regina Helena Meira de Castro Editoração Eletrônica

Regina Helena Meira de Castro Referenciação e Ficha Catalográfica Ana Rita Mendonça de Moura

Mendes, Beatriz Vaz de Melo.

Robust fits for copula models / Beatriz Vaz de Melo Mendes, Eduardo F. L. de Melo e Roger Nelsen. – Rio de Janeiro: UFRJ/COPPEAD, 2005.

41 p.; 27cm. – (Relatórios Coppead; 374)

ISBN 85-7508-058-X ISSN 1518-3335

1. Finanças – Modelos matemáticos. I. Melo, Eduardo F. L. II. Nelsen, Roger. III. Título. IV. Série.

CDD – 332

Pedidos para Biblioteca

Caixa Postal 68514 – Ilha do Fundão 21941-970 – Rio de Janeiro – RJ

Telefone: 21-2598-9837 Telefax: 21-2598-9835

e-mail: [email protected] Home-page: http://www.coppead.ufrj.br

(2)

ROBUST FITS FOR COPULA MODELS

Beatriz V. M. Mendes1_{, Eduardo F. L. de Melo}2_{and Roger Nelsen}3

Abstract

In this paper we propose and compare two different methodologies for fitting copulas robustly. The first proposal consists of a robustification of the maximum likelihood method, where points previously identified as outliers by a high breakdown point covariance matrix estimator are downweighted in a maximum likelihood optimization procedure. The second proposal obtains robust estimates by minimizing selected empirical copula based goodness of fit statistics. We show through simulations that the proposed robust estimators are able to capture the correct strength of dependence of the data, providing more accurate estimates of copula based dependence measures such as the tail dependence coefficient. The experiments considered several ε-contaminated copula models, for varying proportions ε of contaminating points. Another result in this paper is the finite sample distribution of some selected empirical copula based statistics and corresponding tables for testing and selecting the best copula fit.

1_{COPPEAD Graduate School of Business} 2_{COPPEAD Graduate School of Business} 3_{Lewis & Clark College, USA}

(3)

1) INTRODUCTION

The dependence structure of any multivariate distribution F may be best represented by its pertaining copula C. Given a d-dimensional data set, many suitable parametric copula models (Joe (1999), Nelsen (1999)) are available for this task. Model estimation may be carried on by the maximum likelihood method in two steps: first one performs the marginal estimation and then the copula fit, the so called IFM method (inference function for margins, introduced by Joe and Xu (1996)). The univariate fits typically pose no difficulties. It remains to fit copulas to the d-cube with Uniform(0,1) margins. Joe (1999) argues that we can expect the IFM method to be quite efficient since fully based on maximum likelihood estimation. Efficiency may be assessed either by comparing the estimators asymptotic covariance matrices, or by comparing their mean squared error from Monte Carlo simulations.

When all data points are generated by the same data generating process F, the maximum likelihood method typically yield good estimates (MLE), possessing the usual good statistical properties. However, contaminations may occur in many ways. For example, gross errors generated by some contaminating distribution F* may change the strength and type of the association, resulting in inaccurate estimation of joint probabilities and dependence measures. Even more dangerous would be an error on the data columns alignment, for example when matching slow and fast trading high frequency equity data, which would cause no damage to the marginal fits, but could result in a completely distorted dependence structure. We also note that the copula [0,1]d sample space makes more difficult the graphical inspection of atypical points, especially when d > 2. We need thus an automatic robust procedure that would work well when there are, and when there are not contaminations in the data.

Alternatives to the maximum likelihood estimation method for copulas exist in the literature. They are mainly nonparametric and include Genest, Ghoudi, and Rivest (1993), Genest and Rivest (1993), Capéraà, Fougères, and Genest, (1997), Fermanian and Scaillet (2003), Tsukahara (2005), Morettin et al. (2005), among others. However, to the best of our knowledge, no one has proposed yet robust estimates for copulas.

Accordingly, in this paper we propose and compare two different methodologies for fitting copulas robustly. The first proposal consists in a robustification of the maximum likelihood method, where points previously identified as outliers by a high breakdown point covariance matrix estimator, are downweighted in a maximum

(4)

likelihood optimization procedure. Many high breakdown point covariance matrix estimators may be used in this preliminary phase. One may select the Minimum Volume Ellipsoid (MVE) or the Minimum Covariance Determinant (MCD) estimators of Rousseeuw (1983,1985), any redescending M-estimator (Tyler (1983), Tyler (1991)), the S-estimator (Lopuhaä, 1989), or the CM-estimator (Kent and Tyler, 1996). Illustrations on the role of robust covariance matrix estimators may be found in Rousseeuw and van Zomeren (1990).

Any high breakdown robust estimator, able to find the pattern suggested by the majority of the data, typically downweights some small proportion of the data (see Tyler (1983), Rousseeuw and van Zomeren (1990), among others). In the first exploratory step, we chose to use the covariance affine equivariant estimator MCD, which is implemented in S-Plus. Based on the MCD estimates, a hard rejection weight function assigns zero-one weights to selected data points. In the second step, the copula model is fitted in a weighted maximum likelihood optimization procedure, yielding the Weighted Maximum Likelihood estimates, the WMLE. These estimates possess the usual good asymptotic properties under the true model (Rousseeuw, 1985). Under contaminated models we show in this paper, through simulations, that they possess small bias and variance, and outperform the MLE.

The second proposal obtains robust estimates by minimizing selected empirical copula based goodness of fit statistics. These are the so called Minimum Distance estimators (MDE), first proposed by Wolfowitz (1953, 1957). Since the empirical copula is only defined on a lattice £, we define our distance with discrete norms. We start with well known statistics (see Ané and Kharoubi, 2003) such as the Kolmogorov distance statistic K, the Cramer-von Mises statistic W2, the Anderson-Darling statistic AD, the Integrated Anderson-Darling statistic IAD, and apply different redescending weight functions, yielding 28 Minimum Distance estimators. The newly proposed statistics downweight the influence of points belonging to selected corners of the unit d-cube, introducing robustness.

All estimators are compared in a comprehensive simulation study. The experiments consider ε-contaminated parametric copula families, containing varying proportions ε of contaminating points. The selected families include elliptical copulas (Normal), copulas modeling extreme values, either maxima or exceedances (Gumbel, Galambos, Clayton, Husler Reiss), some widely used in practice copula families (Frank, Cook-Johnson, Joe, Joe-Clayton, Asymmetric Logistic model copula, and some other

(5)

copula families defined in Joe (1999)). For each parametric copula we find the best (smaller mean square error) robust estimator, and indicate the best MDE choice for that particular family. In this way we aim to provide guidance to the researcher or practitioner when applying our methods.

Another result in the present paper are the finite sample and asymptotic distributions of the minimum distance statistics identified as best robust estimators by the simulation experiments. Some selected quantiles are given in tables for testing when searching for the best copula fit. This is an important issue since practitioners usually fit several parametric copula families and would like to have a tool for help choosing the right copula (Durrleman, Nikeghbali, and Roncalli, 2000). We provide a means for answering this long standing question.

The remainder of this paper is organized as follows. In Section 2, we define copulas and review the classical estimation method. In Section 3, we define the new robust estimators. In Section 4, we carry out simulation experiments and compare the performances of classical and robust estimates. Several copula models are selected and contaminated with varying proportions of contaminations. For each family we provide three robust alternatives to the classical estimates possessing smaller mean squared error. Section 5, we show an application of our methods to a real data set. We conclude the paper in section 6.

2) COPULAS AND CLASSICAL ESTIMATION

To simplify the notation, from now on we set d = 2 even though all inference methods in the paper are intended and work for dimensions d > 2. Let (X1, X2) be a continuous random variable (rv) in R2_{with joint distribution function (cdf) F and} margins Fi, I = 1, 2. Consider the probability integral transformation of X1 and X2 into uniformly distributed rvs on [0,1] (denoted Uuniform(0,1)), that is, (U1, U2) = (F1(X1), F2(X2)). The copula C pertaining to F is the joint cdf of (U1, U2). As multivariate distributions with Uniform(0,1) margins, copulas provide very convenient models for studying dependence structure with tools that are scale-free.

As an alternative definition, for every (x1,x2) belonging to [-∞,∞]2_{consider the} point in [0,1]3_{with coordinates (F1(x1), F2(x2), F(x1,x2) ). This mapping from [0,1]}2_to [0,1] is a 2-dimensional copula. From Sklar's theorem (Sklar, 1959) we know that for

(6)

continuous rvs there exists a unique 2-dimensional copula C such that for all (x1,x2) belonging to [-∞,∞]2_,

To measure monotonic dependence, one may use the copula based Kendall's τ correlation coefficient. Kendall's τ does not depend upon the marginal distributions and is given by:

This invariance property is not shared by the linear correlation coefficient ρ, which is actually the Spearman correlation coefficient between X1 and X2. To measure (upper) tail dependence one may use the upper tail dependence coefficient defined as

if this limit exists, and where Fi-1_{is the generalized inverse of Fi, i.e., Fi}-1_{(ui) =} , for i = 1, 2. The lower tail dependence coefficient λL is defined in a similar way. Both the upper and the lower tail dependence coefficients may be expressed using the pertaining copula:

if these limits exist. The measures λU belonging to (0,1] (or λL belonging to (0,1]) quantify the amount of extremal dependence within the class of asymptotically dependent distributions. If λU = 0 (λL = 0) the two variables X1 and X2 are said to be asymptotically independent in the upper (lower) tail.

In the case the true copula belongs to a parametric family , estimates of the parameters may be obtained through the IFM method mentioned in the Introduction, in the context of independent and identically distributed observations. There are mainly two versions: the fully parametric and the semiparametric approaches, detailed in Genest et al. (1993), Shih and Louis (1995), Joe (1999), and Chebrian et al. (2002). The fully parametric approach relies on the assumption of parametric marginal distributions. The Uniform(0,1) data, obtained from the estimated marginals, are used to

(7)

maximize the copula density function with respect to θ. The final results are very sensitive to the right specification of all marginals. In the semiparametric method, in the first step the standardized data are obtained as the empirical cdfs. In this case, the estimation procedure suffers from loss of efficiency, see Genest and Rivest (1993), even though many authors use it to avoid misidentification of the marginal cdfs (Frahm, Junker, Schmid, 2004).

The behavior of the maximum likelihood estimators of copula parameters were investigated through simulations by Capéerà (1997) in the case of the Gumbel or logistic model, by Genest (1987) in the case of the Frank family, and by Mendes (2005) in the case of the Joe-Clayton copula. Genest (1987) investigated the performance of four estimators considering samples of size 10 to 50, and found that the method of moments estimator appears to have smaller mean squared error than the maximum likelihood estimator. Goodness of fit tests for copulas and alternative tools for checking the quality of fits are discussed in Fermanian (2003), Chen and Fan (2005), among others.

In what follows we assume the margins have been already properly estimated and concentrate on fitting copulas robustly.

3) ROBUST ESTIMATES

3.1) The robust weighted maximum likelihood estimates (WMLE)

Let { (u11,u12), (u21,u22), ..., (uT1,uT2) } represent T independent and identically

distributed (iid) observations of a bivariate copula. We first estimate the covariance matrix associated to the data using the high breakdown point affine equivariant MCD estimator. For a given integer h, the MCD location estimator is defined as the mean

of the h points of the T.2 data set for which the determinant of the sample covariance is minimal.

The MCD covariance estimator is the sample covariance of those h points. By taking , the MCD attains the best possible breakdown point at any data set in general position. To obtain consistency at the normal model the “raw” covariance estimate based on the h points is usually multiplied by a factor. We note that no particular distributions (marginals or joint) were assumed for the data. For the cases

(8)

d > 2, Davies (1987) showed that at an elliptical distribution, the MCD estimators are consistent for the true mean and covariance matrix.

At this first step we are not concerned with efficiency. We just want to identify data points which seem not to follow the (linear) dependence structure defined by the majority of the points. Points identified as atypical will be given zero weight. Identification of points is based on the robust distances being the cutoff point the 0.90-quantile of a chisquare distribution with 1 degree of freedom. To illustrate, Figure 1 shows the scatter plot of data simulated from a 5% contaminated Normal copula4_{with ρ} = 0.8, and the ellipsoids of constant probability equal to 0.90 associated with the covariances matrices estimated by the 0.50 breakdown point MCD and by the classical MLE.

\

Figure 1: Data simulated from a 5% contaminated Normal copula with ρ = 0.8

and the ellipsoids of constant probability equal to 0.90 from the robust MCD and the classical MLE.

In the second step we obtain the maximum likelihood estimates of copula parameters θ, using just those data points assigned weights equal to one.

(9)

3.2) Minimum distance estimators

The empirical copula function was introduced by Deheuvels (1979), and their limit properties studied in Deheuvels (1981a, 1981b). Let (x1,t, x2,t), t = 1, ..., T, denote a

sample of T iid bivariate observations from the distribution F with marginals F1 and F2,

copula C with density c, and let {x1,(t), x2,(t)} be the component wise order statistics of the

sample. Consider the lattice

The empirical copula is defined on £ by

where I[.] is the indicator function.

According to Deheuvels (1979, 1981a, 1981b) the following identity

holds, where FT is the empirical distribution function of a sample of F, and

FT,1(x1) = FT(x1,+∞), FT,2(x2) = FT(+∞,x2) are the marginal empirical distributions. It is

shown that Deheuvels's copula converges to C as T increases.

The empirical copula density may be obtained from (Nelsen, 1999) and is given by

Copula measures of goodness of fit may be obtained by minimizing some distance between the empirical copula and a fitted parametric copula C = . To obtain the MDE estimates we propose minimizing the following

(10)

selected empirical copula based goodness of fit statistics. The first discrete norm defined on £ used is the Kolmogorov statistic K defined by

Deheuvels (1979, 1981a, 1981b) also studied the asymptotic properties of the Cramér-Von Mises statistic W2, which is the second empirical copula-based statistic used

The third empirical copula-based statistic used, ADAK, is based on the Anderson-Darling statistic (Stephens, 1974) and given in Ané and Kharoubi (2003).

Ané and Kharoubi (2003) also considered a more global measure of discrepancy given by the Integrated Anderson-Darling statistic, IADAK, given by:

The statistics (9) and (10) emphasize deviations in the tails (the corners of the unit square) by applying a weight function to (7) and (8). The weight function is

The perspective and contours of (11) in the case of a Gumbel copula with θ=2 are shown in column 1 of Figure 2. However, this goal may be better achieved by multiplying (7) and (8) by the weight function w1:

(11)

which emphasizes just the points in the lower left (LL) and the upper right (UR) corners5_{. The factors} _{correspond to the cdf on the "L"} shaped areas located at the LL and the UR quadrants of the unit square. This weight function is illustrated in column 2 of Figure 2, again for the Gumbel copula.

Note that when using the functions introduced above, points in the LR quadrant, in the UL quadrant, as well as those points in the middle will have the same influence on the resulting statistic, and this may be further improved. Accordingly, we propose the redescending weight functions w2,.(t1/T, t2/T), which assign more weight to points

located at the LL, the UR, and both corners, respectively.

The weight function w2,LL(. , .) represents the cdf of the square close to (1,1) and

it is larger when the point is close to (0,0). The weight function w2,UR( . , .) represents the

cdf of the square close to corner (0,0) and it is larger when the point is close to (1,1). The weight function w2,LL-UR( . , . ), illustrated in column 3 of Figure 2, represents the

sum of the previous cdfs, and possesses the nice property of downweighting just points located at the LR and UL corners6_{. These weights are more natural since they all are in} (0,1), whereas those w1 are all greater than 1.

Figure 2 shows the three weight functions associated to the LL-UR case. We observe that the weight function given by (11), in column 1, is too flat in the middle and gives much more weight to LL when compared to UR. Our first proposed weight function w1, illustrated in column 2, is an improvement, since it enhances almost equally the two LL and UR corners, and does not emphasize the LR and the UL corners. The second weight function proposed w2, in column 3, is even more promising 5_{We are estimating positive dependence, as explained in Section 4.}

6_{Note this may be considered a smoothed version of the (hard) weight function used by the WMLE. See} also Figure 3.

(12)

because it gives equal weights to the LL and UR corners and to the middle points, just downweighting the points in LR and UL corners.

Figure 2: Perspective and contours of the weight functions designed to

introduce robustness in the case of the Gumbel copula with θ = 2. Column 1 shows (wAK), column 2, (w1), and column 3 illustrates (w2).

We now define robust variations of the Kolmogorov and of the Cramér-Von Mises statistics based on the proposed weight functions, w1 and w2. They are

(13)

According to the copula type (possessing or not tail dependence) one may consider emphasizing just the LL or the UR quadrant. Thus we consider MDE statistics designed to emphasize just the points in one of the corners, based on variations of (AD1), (AD2), (IAD1), and (IAD2). They are the Lower Left tail Anderson-Darling (LLAD1 and LLAD2), the Upper Right tail Anderson-Darling (URAD1 and URAD2), the Lower Left tail Integrated Anderson-Darling (LLIAD1 and LLIAD2), and the Upper Right tail Integrated Anderson-Darling (URIAD1 and URIAD2). They are all based on weights (w1 and w2). For example,

The other statistics are defined in Appendix 1. Still more weight may be given to the tails if we consider second degree statistics7_{. The second degree statistics, denoted as} 2LLAD1, 2LLAD2, 2URAD1, 2URAD2, 2LLIAD1, 2LLIAD2, 2URIAD1 and 2URIAD2, use squared weights and are given in Appendix 1.

4) SIMULATIONS

In this section we report the results from simulation experiments carried on to assess the performance of the proposed estimators. The copula families selected are those usually chosen in applications. For example, elliptical copulas are used to represent the dependence structure of many real life situations, such as modeling a set of financial log-returns (Embrechts et al., 2003). Our selection was also driven by theoretical considerations, for example, asymptotic results. The Gumbel copula is the limit copula pertaining to the asymptotic distribution of bivariate componentwise maxima (Charpentier (2004), Juri and Wüthrich (2002)). Bivariate excesses beyond high thresholds should be modeled by a Clayton (or Kilmedorf-Sampson) copula (Charpentier (2004), Juri and Wüthrich (2002)). Almost all copula families implemented in S-Plus (software used for computations) were considered. The copula families selected are given in Appendix 2.

Simulations scheme: Data were generated from 12 bivariate parametric copula

families. The sample sizes T were 100, 300 and 500. We considered ε-contaminated models, where a fraction ε.T of observations is replaced by atypical ones from a 7_{For the sake of completeness, we also experimented the concept of entropy used by Ané and Kharoubi} (2003). However, it did not lead to good solutions, and thus are not reported.

(14)

contaminating distribution F*. We set ε equal to 0%, 3%, and 5%, and F* as a normal distribution with very small variance and centered close to some corner of the unit square8_{. We had a total of 108 experiments, and the number of repetitions for each} model was 1000.

Our experiments considered just the cases where the rvs U1 and U2 possess positive association. For the sake of comparisons, for all copula models we set the true parameter value θ0 such that corresponding Kendall's correlation coefficient would be equal to 0.50. The data simulated from F* is expected to act similarly to a point mass contamination, not following the dependence structure implied by the true copula parameters. They are supposed to weaken the strength of dependence shared by the remaining data. Thus our contaminating points are located at the LR or at the UL corners. For data showing negative association the same copulas could be fitted to the transformed (U1, 1-U2) data.

Let represent, respectively, the classical and a robust estimate of the copula parameter θ0. The notation may represent either the WMLE or any of the 28 MDE estimates. To assess the performance of the proposed robust estimators we use the squared loss function L(θ0, ), and compare the Mean Squared Error (MSE). We also compute the percentage reduction in average loss (PRIAL) for compared with MLE, i.e, we compute an estimate of

Simulations results: The simulations results are given in the tables that follow.

The tables show the MSE and the (PRIAL %) for the 3 best robust estimates and for each one out of the 108 experiments. The overall winner is in bold face. We analyse in detail the results for the Clayton, Gumbel, Normal, and Frank copulas, given in Tables 1, 2, 3, and 4. The remaining tables are given in Appendix 3.

Clayton copula. Under 0% of contamination all procedures resulted in very

accurate point estimates. Accuracy and precision increase with T. The IAD shows

8_{A less subjective procedure for defining the outlying values could have been used by applying the} concept of robust distances of Rousseeuw and van Zomeren (1990).

(15)

excellent results very close of the winner, the MLE. Under contaminated models the MLE never won. The WMLE was clearly superior with point estimates very close to the true value.

(16)

Gumbel copula. Under no contamination, the MLE and the MDE statistics

provided accurate point estimates. The WMLE is not a good choice for the Gumbel copula, since it overestimates the parameter and this is true for contaminated and non contaminated data. The LLIAD2 is the winner for contaminated models, presenting superior performance with respect to bias and variance.

(17)

Normal copula. As expected, since it is an elliptical copula, the MLE and the

WMLE were the best estimates for models with and without contaminations. We note though the very good performance of the LLIAD2.

(18)

Frank copula. Under no contamination, MLE is the best estimator. For

contaminated samples, even though the MLE appears as the winner for 5% and T = 300, 500, W2 appears as the winner 3 times and may be considered almost as good.

(19)

(20)

Table 5 presents a summary of results for all copula models. The table gives the

winner and runner up under no contaminations, and the best (robust) option at contaminated models. Results are usually not dependent on the sample size. As expected, under no-contamination the best estimator is the MLE. Under contaminations, the best estimator for the majority of copula families was the LLIAD2. For the BB7 copula, even though the 2URIAD2 and the WMLE were, respectively, the winners at 3% and 5% contamination, the URIAD2 is almost as good for all contaminated models and was chosen as the overall winner. For all copulas possessing just upper tail dependence the LLIAD2 was the winner.

5) APPLICATION TO REAL DATA

The application of the methods shown in this article was done in a data set provided by Insurance Services Office, Inc. This data set consists of 1,500 general insurance claims. One of the variables is the loss of each claim, or the amount of each of the claims (LOSS) and the other one represents the allocated administrative expense to pay the claim (ALAE). In the following table, it can be observed the statistical summary of each variable:

(21)

Based in these association measures it is possible to infer a high positive dependence. From the next plot, we can observe the dependence relationship between the two variables LOSS and ALAE.

In order to fit the marginals, it was used the following procedure: (i) definition of a threshold representing the 95% empirical quantile of each variable LOSS and

(22)

ALAE, (ii) fit of a GPD distribution using l-moments estimation for the exceedances above the threshold (iii) empirical distribution for the rest of the data.

This procedure was used because the empirical distribution usually don't show a good fit for the distribution tails. When there are several data points, the empirical distribution can provide a good fit for the tails. It is possible to observe the histograms of the exceedances above the threshold for the variable LOSS and the exceedances above the threshold for the variable ALAE.

In the next table, we show the estimates marginals parameters with their respective standard errors, evaluated with the bootstrap algorithm.

(23)

After the marginals fitting, we modeled the copulas. In the next plot, we can observe the upper tail dependence between the variables LOSS and ALAE.

Once the data presented a clearly upper tail dependence, it was fitted copulas with this characteristic. Then we fitted the following copulas: (i) AKS (Associated Kimeldorf Sampson), (ii) Gumbel, (iii) Galambos, (iv) Joe, (v) Husler Reiss, (vi) BB5 e (vii) BB6. The chosen estimates for each copula were the winners obtained in the last section of this paper. The results were:

(24)

It can be noted that the best copula fitted to the variables LOSS and ALAE, using the chi-square goodness of fit criteria, was the Galambos copula, with the parameter estimated by maximum likelihood. The estimators IAD1 and LLIAD2 provided very similar results in comparison to the MLE. For the Gumbel fit, the estimators IAD1 and LLIAD2 also provided similar results in comparison to the MLE method. On the other hand, for the AKS copula, the robust estimators LLIAD1 and WMLE provided better results than the MLE method.

6) CONCLUSIONS

In this paper we proposed alternative robust estimators for copulas, motivated by the fact that even high quality data usually possess a small proportion of contaminating points. The first new estimator is based on the robust Minimum Covariance Determinant estimator (MCD). This is a two-step procedure where the weights returned by the MCD are used to identify outlying data points. The maximum likelihood estimates based on selected data are computed in the second step. We note that extreme atypical points are of great importance, but they require specific models, for example, models based on extreme value theory. The second proposal is based on the minimization of selected goodness of fit statistics.

(25)

Simulation experiments indicated that the proposed estimators perform well under ε-Contaminated copula models. For any other copula family not considered here, the simulations may be easily implemented and run relatively fast. Estimators sample distributions may be assessed by simulations. For those already experimented we found a well behaved distribution for small sample sizes. Tables may be constructed for testing hypothesis. We are not addressing the important problem Which one is the right

copula?, but we are indeed providing guidance for estimating several copula models. Of

course conclusion may change with the MCD specification of breakdown point and cutoff point for outliers identification. A sensitive analysis may be carried on to assess the robustness of results with respect to these choices.

7) APPENDIX

7.1) Appendix 1: MDE statistics

The definitions of the other MDE statistics designed to emphasize just the points in one of the corners, based on variations of AD1, AD2, IAD1, and IAD2 are given here. They are the Lower Left tail Anderson-Darling LLAD1 and LLAD2, the Upper Right tail Anderson-Darling URAD1 and URAD2, the Lower Left tail Integrated Darling LLIAD1 and LLIAD2, and the Upper Right tail Integrated Anderson-Darling URIAD1 and URIAD2.

(26)

The second degree statistics, 2LLAD1, 2LLAD2, 2URAD1, 2URAD2, 2LLIAD1, 2LLIAD2, 2URIAD1 and 2URIAD2, and are defined as:

(27)

7.2) Appendix 2: Copula families

Elliptical Copulas: The class of elliptical distributions provides useful examples

of multivariate distributions because they share many of the tractable properties of the multivariate normal distribution. Elliptical copulas are simply the copulas pertaining to elliptical distributions. in this paper, we used the Normal copula.

Normal Copula: The Gaussian or Normal copula is the copula pertaining to the

multivariate normal distribution. It is given by

where ρ is simply the linear correlation coefficient between the two random variables.

Clayton copula: The symmetric Clayton or Kimeldorf-Sampson copula was

obtained by Juri and Wüthrich (2002) as the copula characterizing the limiting dependence structure in the upper-tails of two random variables assuming their dependence structure is Archimedean. It was also obtained by Frees and Valdez (1998) as the copula pertaining to the bivariate Pareto distribution. It is given by

(28)

Associated to Kimeldorf and Sampson copula: Copula used for the real data

application. It was also obtained by Frees and Valdez (1998) as the copula pertaining to the bivariate Pareto distribution. This is a symmetric copula with the form:

Gumbel copula: Well known Gumbel copula (Gumbel, 1960), an Extreme Value

copula as well as an Archimedean copula class, has the following form:

The coefficient of tail dependence is given by λU = 2 - 21/δ_{, and corresponds to} the symmetric logistic model (see Ghoudi, Khoudraji and Rivest, 1998).

Galambos copula: Galambos copula (Galambos, 1975) is an Extreme Value

copula:

Joe copula: The Joe copula (Joe, 1993) is an Archimedean copula and has the

form

Husler Reiss copula: The Husler and Reiss copula (Husler and Reiss, 1989) is

(29)

Frank copula: The Frank copula (Frank, 1979) is an Archimedean copula with

the following distribution function:

Tawn copula: The Tawn copula (Tawn, 1988, 1997) is an asymmetric extreme

value copula, which is an extension of the Gumbel copula. It has the following dependence function

BB4 copula: Capéraà et al. (2000) combined EV and Archimedean copula

classes into a single class called Archimax copulas. The Archimax copulas are copulas which can be represented in the following form:

where A(t) is a valid dependence function and φ a valid Archimedean generator. Archimax copulas reduce to Archimedean copulas for A(t) = 1 and to EV copulas for φ(t) = -log(t). Capéraà et al (2000) proved that it is a valid copula for any combination of valid function φ(t) and A(t). BB4 copula is a example of this class of copula with:

. The distribution function is given by:

where θ > 0, δ > 0. The upper tail dependence coefficient is given by λU = 2-1/δ_.

BB5 copula: BB5 copula (Joe, 1997), an EV copula, is a two-parameter

(30)

BB6 copula: BB6 copula (Joe, 1997), an Archimedean copula, has the form of:

BB7 copula: BB7 copula (Joe, 1997), an Archimedean copula, has the form of:

Empirical copula: If u(1) < u(2) < ... < u(n) and v(1) < v(2) < ... < v(n) are the

order statistics of the univariate samples, the empirical copula emp is defined at the point (i/n, j/n) by the formula:

(31)

(32)

(33)

(34)

(35)

(36)

(37)

(38)

8) REFERENCES

ANÉ, T.; KHAROUBI, C. Dependence structure and risk measure. Journal of

(39)

CAPÉRAÀ, P.; FOUGÉRES, A.L.; GENEST, C. A nonparametric estimation procedure for bivariate extreme value copulas. Biometrika, v. 84, n. 3, p. 567-577, 1997.

CHARPENTIER, A. extremes and dependence: a copula approach. Submitted 2004. Disponível em: http://www.crest.fr/pageperso/lfa/charpent /charpent.htm. DAVIES, P.L. Asymptotic behavior of s-estimates of multivariate location parameters and dispersion matrices. Annals of Statistics, v. 15, p. 1269-1292, 1987.

DEHEUVELS, P. La funcion de dépendance empirique et ses proprietes. Un test non parametrique d’independance. Acad. R. Belg., Bull. Cl. Sci. v. 5, Serie 65, p. 247-292, 1979.

______. A Kolmogorov-Smirnov type test for independence and multivariate samples. Rev. Roum. Math. Pures et Appl., Tome XXVI, v. 2, p. 213-226, 1981a.

______. A nonparametric test for independence. Publicacions de l'ISUP, n. 26, p. 29-50, 1981b.

DURRLEMAN, V.; NIKEGHBALI, K.; RONCALLI, T. Which copula is the right one?. Paris: Credit Lyonnais, Groupe de Reserche Operationnelle, 2000. Working paper.

EMBRECHTS, P.; LINDSKOG, F.; MCNEIL, A.Modelling dependence with copulas and applications to risk management. In: Rachev, S.T. (Ed.) Handbook

of heavy tailed distributions in finance. Amsterdam: Elsevier/North Holland,

2003.

FERMANIAN J-D.; SCAILLET, O. Nonparametric estimation of copulas for time series. Journal of Risk, v. 5, p. 25-54, 2003.

FRAHM, G.; JUNKER, M.; SCHMIDT, R. Estimating the tail-dependence coefficient: Properties and Pitfalls. Mathematical Methods of Operations

Research, v. 55, n. 2, p. 301 – 327, 2004.

FRANK, M. J. On the simultaneous associativity of F(x,y) and x + y - F(x,y).

Aequationes Math, v. 19, p. 194-226, 1979.

FREES, E.W.; VALDEZ, E. Understanding relationships using copulas. North

American Actuarial Journal, v. 2, p. 1-25, 1998.

GALAMBOS, J. Order statistics of samples from multivariate distributions.

(40)

GENEST, C. Frank's family of bivariate distributions. Biometrika, v. 74, p. 549-555, 1987.

______; RIVEST, L.-P. Statistical inference procedures for bivariate

Archimedean copulas. Journal of the American Statistical Association, v. 88, p. 1034-1043, 1993.

______; GHOUDI, K.; RIVEST, L.P. A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, v. 82, p. 543-552, 1993.

GHOUDI, K.; KHOUDRAJI, A.; RIVEST, L.P. Proprietes statistiques des copules de valeurs extremes bidimensionnelles. Canadian Journal of

Statistics, v. 26, p. 187-197, 1998.

HUBER, P. J. Robust regression: asymptotics, conjectures and Monte Carlo.

Annals of Statistics, v. 1, p. 799-821, 1973.

HÜSLER, J.; REISS, R.-D. Maxima of normal random vectors: between independence and complete dependence. Statist. Probab. Lett., v. 7, p. 283-286, 1989.

JOE, H. Parametric families of multivariate distributions with given margins.

Journal of Multivariate Analysis, v. 46, p. 262-282, 1993.

______; XU, J. The estimation method of inference function for margins for

multivariate models. Vancouver: Univ. of British Columbia, Dept. of Statistics,

1996. Technical Report, n.166.

______. Multivariate models and dependence concepts. London: Chapman & Hall, 1997.

JURI, A.; WÜTHRICH, M.V. Tail dependence from a distributional point of

view. 2002. Preprint. Disponível em: http://www.math.ethz.ch/$\sim$juri/ publications.html.

KIMELDORF, G.; SAMPSON, A. R. Uniform representation of bivariate distributions. Commun Statist, v. 4, p. 617-627, 1975.

LOPUHAA, H.P. On the relation between S-estimators and M-estimators of multivariate location and covariance. Annals of Statistics, v. 17, p. 1662-1683, 1989.

KENT, J.; TYLER, D. E. Constrained M-estimation for multivariate location and scatter. Ann. Statistics, v. 24, p.1346-1370, 1996.

(41)

MORETTIN, P.A. et al. Nonlinear wavelet estimation of copulas for time series. Proceedings 2nd. Brazilian Conference in Insurance and Finance, Maresias, SP, 2005.

NELSEN, R.B. An introduction to copulas. New York: Springer, 1999. ROUSSEEUW, P.J. Multivariate estimation with high breakdown point. In:

Pannonian Symposium of Mathematical Statistics and Probability, 4.,

Tatzmannsdorf, Austria, September 4-9, 1983.

______. Multivariate estimation with high breakdown point. In: Grossmann, W et al (eds.) Mathematical statistics and applications, Vol. B, p. 283-297. Reidel: Dordrecht, 1985.

______; Van ZOMEREN, B. C. Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, v. 85, p. 633-651, 1990.

SKLAR, A. Fonctions de repartition n dimensions et leurs marges. Publ. Inst.

Statist. Univ. Paris, v. 8, p. 229-231, 1959.

STEPHENS, M. A. EDF statistics for goodness of fit and some comparisons.

Journal of the American Statistical Association, v. 69, p. 730-737, 1974.

TSUKAHARA, H. Semiparametric estimation in copula models. The Canadian

Journal of Statistics, v. 33, n. 3, p. 357–375, 2005.

TAWN, J. A. Bivariate extreme value theory: models and estimation.

Biometrika, v. 75, p. 397-415, 1988.

TYLER, D.E. Breakdown properties of the m-estimators of multivariate

scatter. New Jersey: Rutgers University, 1983. Technical Report.

______. Some issues in the robust estimation of multivariate location and scatter. In: STAHEL, W. A.; WEISBERG, S. (eds.) Directions in Robust

Statistics, Part II, 327-336. New York: Spring-Verlag, 1991.

WOLFOWITZ, J. Estimation by the minimum distance method. Annals of the

Institute of Statistical Mathematics, v. 5, p. 9-23, 1953.

______. The minimum distance method. Annals of Mathematical Statistics, v. 28, p. 75-88, 1957.