Topics in Mixed Effects Models

by

José Carlos Pinheiro

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy (Statistics)

at the

UNIVERSITY OF WISCONSIN – MADISON

1994

Abstract

Mixed effects models have received a great deal of attention in the statistical literature for the past forty years because of the flexibility they offer in handling the unbalanced clustered data that arise in many areas of investigation. In this dissertation we consider both linear and nonlinear mixed effects models under maximum likelihood and restricted maximum likelihood estimation. We derive the asymptotic distribution of both maximum likelihood and restricted maximum likelihood estimators in a general linear mixed effects model, under mild regularity conditions. We study different approximations to the loglikelihood function of nonlinear mixed effects models, comparing them with respect to their accuracy and computational efficiency. We describe five different parametrizations for variance-covariance matrices that ensure positive definiteness while leaving the estimation problem unconstrained, comparing them with respect to their computational efficiency and statistical interpretability. We consider the model building issue for mixed effects models, describing techniques for choosing random effects to be incorporated in the model, using structured random effects variance-covariance matrices, and using covariates to explain cluster-to-cluster parameter variability. Finally, we describe the S software we have developed for analyzing linear and nonlinear mixed effects models, which we have contributed to the StatLib collection.

Contents

Abstract

1 Introduction
  1.1 Motivation
  1.2 Linear Mixed Effects Models
  1.3 Nonlinear Mixed Effects Models
  1.4 Parametrizations for Variance-Covariance Matrices
  1.5 Software Development
  1.6 Model Building
  1.7 Future Research
2 The Linear Mixed Effects Model
  2.1 Model and Examples
  2.2 Likelihood Estimation
  2.3 Bibliographic Review
3 Asymptotic Results for the Linear Mixed Effects Model
  3.1 Maximum Likelihood
    3.1.1 Limit of φ5
    3.1.4 Limit of φ2
    3.1.5 Limit of φ1
  3.2 Restricted Maximum Likelihood
  3.3 Parametrized and/or Structured σ
  3.4 Conclusions
4 The Nonlinear Mixed Effects Model
  4.1 The Model
  4.2 Orange Trees
  4.3 Bibliographic Review
5 Approximations to the Loglikelihood in the Nonlinear Mixed Effects Model
  5.1 Approximations to the Loglikelihood
    5.1.1 Alternating Approximation
    5.1.2 Laplacian Approximation
    5.1.3 Importance Sampling
    5.1.4 Gaussian Quadrature
  5.2 Comparing the Approximations
    5.2.1 Orange Trees
    5.2.2 Theophylline
    5.2.3 Simulation Results
  5.3 Conclusions
6 Parametrizations for Variance-Covariance Matrices
    6.1.2 Log-Cholesky Parametrization
    6.1.3 Spherical Parametrization
    6.1.4 Matrix Logarithm Parametrization
    6.1.5 Givens Parametrization
  6.2 Comparing the Parametrizations
  6.3 Conclusions
7 Mixed Effects Models Methods and Classes for S
  7.1 The lme class and related methods
    7.1.1 The lme function
    7.1.2 The print, summary, and anova methods
    7.1.3 The plot method
    7.1.4 Other methods
  7.2 The nlme class and related methods
    7.2.1 The nlme function
    7.2.2 The nlme methods
  7.3 Conclusions
8 Model Building in Mixed Effects Models
  8.1 Examples
    8.1.1 Pine Trees
    8.1.2 Theophylline
    8.1.3 Quinidine
    8.1.4 CO2 Uptake
  8.2 Variance-Covariance Modeling
    8.2.3 Quinidine
    8.2.4 CO2 Uptake
  8.3 Covariate Modeling
    8.3.1 Quinidine
    8.3.2 CO2 Uptake Data
  8.4 Conclusions
9 Conclusions and Suggestions for Future Research
  9.1 Conclusions
  9.2 Future Research
    9.2.1 Asymptotics
    9.2.2 Parametrizations
    9.2.3 Assessing Variability
Bibliography
Appendix A
Appendix B

1 Introduction

In this chapter we present an overview of the topics covered in this dissertation.

We discuss the motivation behind mixed effects models and describe briefly the contents of each of the subsequent chapters.

1.1 Motivation

Mixed models were developed to handle clustered data and have been a topic of increasing interest in Statistics for the past forty years. Clustered data can be loosely defined as data in which the observations are grouped into disjoint classes, called clusters, according to some classification criterion. Examples of clustered data include split-plot designs in which the observations pertaining to the same block form a cluster and repeated measures data in which several observations are made sequentially on the same individual (cluster).

Observations in the same cluster usually cannot be considered independent, and mixed effects models constitute a convenient tool for modeling cluster dependence. In these models the response is assumed to be a function of fixed (population) effects, non-observable cluster-specific random effects, and an error term. Observations within the same cluster share common random effects and are therefore statistically dependent.

We will restrict ourselves in this dissertation to models in which the error terms and the random effects are normally distributed.

The parameters in a mixed effects model can be classified into two types: fixed effects, associated with the average effect of predictors on the response, and variance-covariance components, associated with the covariance structure of the random effects and of the error term. In many practical applications estimates of the random effects are also of interest.

Several estimation methods have been proposed for mixed effects models and though maximum likelihood and restricted maximum likelihood (Harville, 1974) are generally adopted for linear mixed effects models (Longford, 1993), there is an ongoing debate in the statistical literature about estimation methods for nonlinear mixed effects models.

1.2 Linear Mixed Effects Models

Linear mixed effects models are mixed effects models in which both the fixed and the random effects contribute linearly to the response function. The general form of such models is

$$ y = X\beta + Zb + \epsilon \tag{1.2.1} $$

where $y$ is the response vector, $X$ and $Z$ are the design matrices corresponding to the fixed and random effects respectively, $\beta$ is the fixed effects vector, $b$ is the random effects vector, and $\epsilon$ is the error vector. It is assumed that $b \sim N(0, D)$ and $\epsilon \sim N(0, \Lambda)$, with $b$ independent of $\epsilon$.

Variance components models (Searle, Casella and McCulloch, 1992), mixed effects ANOVA models (Miller, 1977), and linear models for longitudinal data (Laird and Ware, 1982) are all special cases of model (1.2.1). The linear mixed effects model (1.2.1) is described in detail in chapter 2. Two examples are included there to illustrate the use of this model in the context of mixed effects ANOVA models and repeated measures data.

Maximum likelihood (ML) and restricted maximum likelihood (RML) are the most common estimation methods used for linear mixed effects models. The derivation of (R)ML estimates constitutes a rather complex nonlinear optimization problem that only became feasible when fast computers became available. This optimization is usually done using the EM algorithm (Dempster, Laird and Rubin, 1977) or Newton-Raphson methods (Thisted, 1988), but the latter seems to be more efficient than the former (Lindstrom and Bates, 1988). No closed form expressions are available for the distribution of (R)ML estimates and inference usually has to rely on asymptotic results. The classical asymptotic theory available for ML estimates (Lehmann, 1983) cannot be applied to linear mixed effects models, since the observations are not independent. Miller (1977) derived the asymptotic distribution of ML estimates for mixed effects ANOVA models, following the work by Hartley and Rao (1967), but these results had not been extended to the more general linear mixed effects model (1.2.1). We derive, in chapter 3, the asymptotic distribution of both ML and RML estimates in the linear mixed effects model (1.2.1) under quite general regularity conditions.

We also derive the asymptotic distribution of ML and RML estimates of the variance-covariance components in (1.2.1) for a large class of reparametrizations of the variance-covariance matrix of the random effects, that encompasses most cases of practical interest.

1.3 Nonlinear Mixed Effects Models

Nonlinear mixed effects models are mixed effects models in which some of the fixed and/or random effects occur nonlinearly in the response function. Several different formulations of nonlinear mixed effects models are available in the literature; we will adopt here the model proposed by Lindstrom and Bates (1990), given by

$$ y = f(\phi, X) + \epsilon, \quad \text{where} \quad \phi = A\beta + Bb \tag{1.3.1} $$

where $y$ is the response vector, $f$ is a general nonlinear function, $\phi$ is a mixed effects parameter vector that is expressed as a linear function of the fixed effects $\beta$ and the random effects $b$, $X$ is a matrix of covariates, $\epsilon$ is the error vector, and $A$ and $B$ are the design matrices for the fixed and random effects respectively. As in the linear mixed effects model (1.2.1) it is assumed that $b \sim N(0, D)$ and $\epsilon \sim N(0, \Lambda)$, with $b$ independent of $\epsilon$.

By far the most common application of model (1.3.1) is for repeated measures data and we will restrict ourselves in this dissertation to this type of situation. The nonlinear mixed effects model for repeated measures data is described in detail in chapter 4, together with a real data example of its use.

Different estimation methods have been proposed for the parameters in the nonlinear mixed effects model (1.3.1) and there is an ongoing debate in the literature about the most adequate method(s) (Davidian and Giltinan, 1993). One of the reasons for this variety of estimation methods is related to the numerical complexity involved in the derivation of (R)ML estimates in the nonlinear mixed effects model. This complexity is due to the fact that the likelihood function in the nonlinear mixed effects model, which is based on the marginal distribution of $y$, does not usually have a closed form expression. Different approximations to the loglikelihood in (1.3.1) have been proposed to try to circumvent this problem (Lindstrom and Bates, 1990; Vonesh and Carter, 1992; Davidian and Gallant, 1993). We describe in chapter 5 alternative approximations to the loglikelihood in (1.3.1) based on the Laplacian approximation (Tierney and Kadane, 1986), importance sampling (Geweke, 1989), and Gaussian quadrature (Davis and Rabinowitz, 1984). We present a comparison between these methods and the approximation suggested by Lindstrom and Bates (1990), using simulated and real data, and conclude that, in most cases, Lindstrom and Bates' approximation gives very accurate results.
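To make the idea of approximating a marginal likelihood by Gaussian quadrature concrete, the following is a minimal Python sketch, not the implementation used in this dissertation. It assumes a toy one-cluster model with a single scalar random effect, $y_j = \beta + b + \epsilon_j$, chosen only because its marginal loglikelihood is known in closed form, so the quadrature can be checked against it. The function names and the numerical values are hypothetical.

```python
import numpy as np

def loglik_quadrature(y, beta, d, sigma2, n_nodes=40):
    """Gauss-Hermite approximation to log of the integral of p(y | b) p(b) over b.

    Toy one-cluster model (an assumption of this sketch, not the thesis model):
    y_j = beta + b + eps_j with scalar b ~ N(0, d) and eps_j ~ N(0, sigma2).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0 * d) * nodes                   # change of variable b = sqrt(2 d) x
    resid = y[:, None] - beta - b[None, :]         # residuals at every quadrature node
    log_cond = -0.5 * (len(y) * np.log(2 * np.pi * sigma2)
                       + (resid ** 2).sum(axis=0) / sigma2)
    shift = log_cond.max()                         # guard against underflow
    total = np.sum(weights * np.exp(log_cond - shift))
    # The factor 1/sqrt(pi) comes from the change of variable in the N(0, d) density.
    return shift + np.log(total) - 0.5 * np.log(np.pi)

def loglik_exact(y, beta, d, sigma2):
    """Closed-form marginal loglikelihood, available because the toy model is linear."""
    n = len(y)
    Sigma = sigma2 * np.eye(n) + d * np.ones((n, n))
    r = y - beta
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(Sigma, r))

y = np.array([1.3, 0.8, 1.1, 0.9])                 # hypothetical data
approx = loglik_quadrature(y, beta=1.0, d=0.25, sigma2=1.0)
exact = loglik_exact(y, beta=1.0, d=0.25, sigma2=1.0)
```

In a genuinely nonlinear model only the line computing `resid` would change, and no closed form would be available for comparison.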

As in the linear mixed effects model, the distribution of the (R)ML estimates cannot be determined explicitly. Asymptotic results for these estimates have not yet been established and will not be considered in this dissertation.

1.4 Parametrizations for Variance-Covariance Matrices

The (R)ML estimation of the variance-covariance components in both models (1.2.1) and (1.3.1) is usually a difficult numerical problem, since the resulting estimates should correspond to a positive semi-definite matrix. This difficulty has been pointed out by Harville (1977), Lindstrom and Bates (1988), and Searle et al. (1992, chapter 6).

Two approaches can be used for ensuring positive semi-definiteness of the estimated variance-covariance matrix of the random effects: constrained optimization, where the natural parametrization for the unique elements in the variance-covariance matrix is used and the estimates are constrained to be positive semi-definite matrices, and unconstrained optimization, where the unique elements in the variance-covariance matrix are reparametrized in such a way that the resulting estimate must be positive semi-definite. We recommend the use of the second approach, not only for numerical reasons (parameter estimation tends to be much easier when there are no constraints), but also because of the superior inferential properties that unconstrained estimates tend to have (e.g. asymptotic properties). Lindstrom and Bates (1988, 1990) describe the use of Cholesky factors for implementing unconstrained (R)ML estimation of variance-covariance components in both the linear and the nonlinear mixed effects models.

We describe, in chapter 6, five different parametrizations for transforming the (R)ML estimation of the variance-covariance components in models (1.2.1) and (1.3.1) into an unconstrained optimization. The basic idea behind all parametrizations considered in this dissertation is to write

$$ D = L^T L \tag{1.4.1} $$

where the unique elements of $L$ form an unconstrained parameter vector. Different choices of $L$ lead to different parametrizations of $D$. The parametrizations considered in chapter 6 are of two types: three of them are based on the Cholesky factorization of $D$ (Thisted, 1988) and the other two are based on the spectral decomposition (Rao, 1973).
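As an illustration of (1.4.1), the sketch below maps an unconstrained vector to a positive definite matrix through an upper triangular factor whose diagonal is stored on the log scale. This is one common reading of a log-Cholesky parametrization; the exact definitions used in chapter 6 may differ, and the function names `theta_to_D` and `D_to_theta` are hypothetical.

```python
import numpy as np

def theta_to_D(theta, q):
    """Map an unconstrained vector of length q(q+1)/2 to a positive definite q x q matrix."""
    L = np.zeros((q, q))
    L[np.triu_indices(q)] = theta          # fill the upper triangle row by row
    diag = np.diag_indices(q)
    L[diag] = np.exp(L[diag])              # diagonal stored as logs, so it is always positive
    return L.T @ L                         # D = L^T L as in (1.4.1)

def D_to_theta(D):
    """Inverse map: upper Cholesky factor of D with its diagonal on the log scale."""
    q = D.shape[0]
    L = np.linalg.cholesky(D).T.copy()     # numpy returns the lower factor C with D = C C^T
    diag = np.diag_indices(q)
    L[diag] = np.log(L[diag])
    return L[np.triu_indices(q)]

theta = np.array([0.1, -2.0, 0.5])         # any real values are admissible
D = theta_to_D(theta, q=2)
```

Because every real vector `theta` yields a valid matrix, a general-purpose unconstrained optimizer can work on `theta` directly.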

In choosing a parametrization for $D$ one has to take into consideration its computational efficiency and the statistical interpretability of the individual parameters. A comparison of the computational efficiency of the different parametrizations, using simulation, is included in chapter 6. The statistical interpretation of the individual parameters in each parametrization is also discussed in that chapter. We conclude that different parametrizations should be used at different stages of the analysis: during the optimization of the (restricted) loglikelihood function, a parametrization based on the matrix logarithm of $D$ (Leonard and Hsu, 1993) is to be preferred for its superior computational efficiency; to assess the variability of the variance-covariance components estimates, a parametrization based on the spherical coordinates of the Cholesky factor of $D$ is recommended, since it is the one with the most interpretable elements.

1.5 Software Development

The success of any statistical technique nowadays is directly related to the availability of reliable, efficient, and simple-to-use software for its application. We describe in chapter 7 a set of S functions, classes, and methods (Chambers and Hastie, 1992) that we developed for the analysis of mixed effects models, using either maximum or restricted maximum likelihood. These extend the linear and nonlinear modeling facilities available in release 3 of S and S-plus. The source code, written in S and C using an object-oriented approach, is available in the S collection at StatLib. Help files for all S functions and methods are included in Appendix B.

The two functions used to fit linear and nonlinear mixed effects models are respectively lme() and nlme(). Objects returned by these functions are of classes lme and nlme respectively, and the latter class inherits from the former. Several methods are available for both the lme and nlme classes, including print, summary, plot, predict and anova. These were developed keeping consistency with the methods of other model fitting functions available in S, such as lm(), glm(), and nls().

The use of the S functions and methods for mixed effects models is illustrated in chapter 7 through the analysis of two real data examples: one of a linear mixed effects model and the other of a nonlinear mixed effects model.

1.6 Model Building

Model building in mixed effects models involves questions that do not have a parallel in (fixed effects) linear and nonlinear models. Some of these questions are:

• determining which effects should have an associated random component and which should be purely fixed;

• using covariates to explain cluster-to-cluster parameter variability;

• using structured random effects variance-covariance matrices (e.g. diagonal matrices) to reduce the number of parameters in the model.

We consider in chapter 8 strategies for addressing these questions in the context of nonlinear mixed effects models, though most of the techniques described are also applicable to linear mixed effects models.

The proposed strategy for choosing the random effects to be included in the model is to start with all parameters as mixed effects, whenever no prior information about the random effects variance-covariance structure is available and convergence is possible. One then examines the eigenvalues of the estimated $D$ matrix, checking whether one or more are close to zero. The associated eigenvector(s) would then give an estimate of the linear combination of the parameters that could be taken as fixed. If near-zero eigenvalues are present, a reduced model, in which the corresponding linear combination of random effects is eliminated, can then be fit and compared to the original model by means of likelihood ratio tests or information criterion statistics. In this dissertation we use the Akaike information criterion (Sakamoto, Ishiguro and Kitagawa, 1986) to decide between alternative models, choosing the one with the smaller AIC.
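The eigenvalue check and the AIC comparison described above can be sketched as follows. This is only an illustration: the relative tolerance `rel_tol`, the example matrix, and the loglikelihood values are hypothetical, since the text does not prescribe a numerical threshold for "close to zero".

```python
import numpy as np

def near_zero_directions(D_hat, rel_tol=1e-4):
    """Eigenvalues of D_hat and the eigenvectors whose eigenvalue is tiny relative to the largest."""
    vals, vecs = np.linalg.eigh(D_hat)     # ascending eigenvalues, orthonormal columns
    mask = vals < rel_tol * vals.max()     # rel_tol is an assumed cut-off
    return vals, vecs[:, mask]

def aic(loglik, n_params):
    """Akaike information criterion: smaller is better."""
    return -2.0 * loglik + 2.0 * n_params

# Hypothetical estimate in which the two random effects are perfectly correlated.
D_hat = np.array([[4.0, 2.0],
                  [2.0, 1.0]])
vals, directions = near_zero_directions(D_hat)

# Hypothetical fits: full model (5 parameters) against a reduced model (3 parameters).
aic_full = aic(loglik=-100.0, n_params=5)
aic_reduced = aic(loglik=-101.0, n_params=3)
```

Each column of `directions` is a candidate linear combination of the random effects that could be treated as fixed; with these made-up numbers the reduced model has the smaller AIC and would be preferred.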

For choosing covariates to explain cluster-to-cluster parameter variability we suggest analyzing plots of random effects estimates (e.g. conditional modes) versus the candidate covariates. If the number of covariate/random effect combinations is large, we suggest using a forward stepwise type of approach in which covariates are included one at a time and the potential importance of the remaining covariates is (graphically) assessed at each step. The decision on whether or not to include a covariate can be based on the change in the AIC values of the fits with and without it.

In comparing alternative models one must also analyze the residuals from the fit, checking for departures from the model’s assumptions. It is also highly recommended that any model building analysis be done in conjunction with experts in the field of application of the model, to ensure the practical usefulness of the chosen model. The use of the proposed model building strategies is illustrated in chapter 8 through the analyses of four real data examples, obtained from the areas of forestry, ecology, and pharmacokinetics.

1.7 Future Research

Considerable research effort is currently dedicated to expanding the applicability of mixed effects models and to improving the estimation methods available for them. We suggest in chapter 9 topics for future research in mixed effects models that were not covered in this dissertation. These include suggestions for:

• expanding the asymptotic results of chapter 3 to nonlinear mixed effects models and linear mixed effects models with more general variance-covariance structures for the error term;

• deriving unconstrained parametrizations for structured variance-covariance matrices;

• comparing methods for assessing the variability of parameter estimates in mixed effects models.

2 The Linear Mixed Effects Model

In this chapter we describe a general linear mixed effects model and present two examples of its use in the context of mixed effects ANOVA models and repeated measures data. We also include a brief bibliographic review of linear mixed effects models.

2.1 Model and Examples

We write the linear mixed effects model as

$$ y = X\beta + Zb + \epsilon \tag{2.1.1} $$

where $y$, $X$, and $Z$ denote respectively the $n$-dimensional response vector, the $n \times p_0$ fixed effects design matrix, and the $n \times m$ random effects design matrix, $\beta$ denotes the $p_0$-dimensional vector of fixed effects parameters, $b$ denotes the $m$-dimensional random effects vector, and $\epsilon$ denotes the error term.

The model formulation in (2.1.1) is quite general and in practice some restrictions on the structure and the distribution of the random effects are assumed.

Assumption 2.1.1 By permuting the columns of $Z$ if necessary, the random effects design matrix can be partitioned as

$$ Z = [Z_1 : \cdots : Z_r] $$

where each $Z_i$ is of the form

$$ Z_i = \begin{bmatrix} Z_i^1 & 0 & 0 & \cdots & 0 \\ 0 & Z_i^2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & Z_i^{m_i} \end{bmatrix} $$

with each $Z_i^j$ having the same number of columns $q_i$ and a variable number of rows $n_i^j$. The random effects vector $b$ can be accordingly partitioned as $b = (b_1^T, b_2^T, \cdots, b_r^T)^T$ and each $b_i$ can in turn be partitioned as $b_i = ((b_i^1)^T, (b_i^2)^T, \cdots, (b_i^{m_i})^T)^T$. This partition defines a grouping of the random effects into $r$ classes, with the $q_i$ random effects belonging to the same class $i$ being observed at exactly $m_i$ different levels.

We will restrict ourselves in this dissertation to normal distribution models. More specifically, we will assume

Assumption 2.1.2 The $b_i^j$ are independent (for different $i$ and/or $j$) and follow a $N(0, D_i)$ distribution, $\epsilon$ follows a $N(0, \Lambda)$ distribution, and the $b_i^j$ are independent of $\epsilon$.

The $D_i$ can be either general positive semi-definite matrices, with $q_i(q_i + 1)/2$ free parameters, or structured positive semi-definite matrices, i.e. $D_i = D_i(\theta_i)$ with the dimension of $\theta_i$ being less than $q_i(q_i + 1)/2$ (Jennrich and Schluchter, 1986).

Define

$$ D = \bigoplus_{i=1}^{r} D_i, \qquad D^A = \bigoplus_{i=1}^{r} \left( I_{m_i} \otimes D_i \right) $$

where $\oplus$ denotes the direct sum and $\otimes$ denotes the tensor product. Note that $D$ and $D^A$ have the same eigenvalues, with different multiplicities (in particular they have the same maximum and minimum eigenvalues). Under assumption (2.1.2) it follows that $y$ has a $N(X\beta, \Sigma)$ distribution (Searle et al., 1992), where $\Sigma = \Lambda + Z D^A Z^T$.
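The form of the marginal covariance can be checked in two short steps. The derivation below is a sketch that uses only model (2.1.1) and assumptions 2.1.1 and 2.1.2; it is added for illustration and is not quoted from the cited references.

$$ \operatorname{Var}(b) = \bigoplus_{i=1}^{r} \bigoplus_{j=1}^{m_i} \operatorname{Var}(b_i^j) = \bigoplus_{i=1}^{r} \left( I_{m_i} \otimes D_i \right) = D^A $$

$$ E(y) = X\beta + Z\,E(b) + E(\epsilon) = X\beta, \qquad \operatorname{Var}(y) = Z \operatorname{Var}(b) Z^T + \operatorname{Var}(\epsilon) = Z D^A Z^T + \Lambda = \Sigma $$

The first line uses the independence of the $b_i^j$ across classes and levels; the second uses the independence of $b$ and $\epsilon$. Normality of $y$ follows because $y$ is a linear function of jointly normal vectors.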

In most applications of linear mixed effects models it is assumed that $\Lambda = \sigma^2 I$ and we will also assume this here.

The mixed effects ANOVA model (Miller, 1977; Searle et al., 1992) is a particular case of model (2.1.1) where $q_i = 1$ and $D_i = \sigma_i^2$, $i = 1, \ldots, r$. As an example, consider the design in which the experimental units are divided into two blocks, each with two plots, which in turn are divided into two subplots, and two treatment factors A and B, in a 2×2 full factorial arrangement, are used according to the scheme shown in Table 2.1.1.

Table 2.1.1: Split-split plot design

Block  Plot  Subplot  A  B
1      1     1        1  1
1      1     2        1  2
1      2     1        2  1
1      2     2        2  2
2      1     1        1  1
2      1     2        1  2
2      2     1        2  1
2      2     2        2  2

Assuming that the block and plot effects are random, the corresponding mixed effects ANOVA model can be written as

$$ y_{ijk} = \mu + b_i + A_j + s_{ij} + B_k + A.B_{jk} + \epsilon_{ijk}, \quad i, j, k = 1, 2 $$

where $y_{ijk}$ is the response observed in the $i$th block, $j$th plot, and $k$th subplot, $\mu$ is the grand mean, $b_i$ is the random effect corresponding to block $i$, $s_{ij}$ is the random effect corresponding to the $ij$th block-plot combination, $A_j$ and $B_k$ are the A and B treatment effects respectively, and $\epsilon_{ijk}$ is the error term. To ensure identifiability of the fixed effects we will use the sum-to-zero conditions

$$ \sum_{j=1}^{2} A_j = \sum_{k=1}^{2} B_k = \sum_{j=1}^{2} A.B_{jk} = \sum_{k=1}^{2} A.B_{jk} = \sum_{j,k=1}^{2} A.B_{jk} = 0. $$

The assumptions of the model are that the $b_i$ are i.i.d. with distribution $N(0, \sigma_1^2)$, the $s_{ij}$ are i.i.d. with distribution $N(0, \sigma_2^2)$ and independent of the $b_i$, and the $\epsilon_{ijk}$ are i.i.d. with distribution $N(0, \sigma_3^2)$ and independent of both the $b_i$ and the $s_{ij}$.

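In the notation of model (2.1.1), we have

$$ \begin{bmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{122} \\ y_{211} \\ y_{212} \\ y_{221} \\ y_{222} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ A_1 \\ B_1 \\ A.B_{11} \end{bmatrix} + \begin{bmatrix} 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ s_{11} \\ s_{12} \\ s_{21} \\ s_{22} \end{bmatrix} + \begin{bmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \epsilon_{121} \\ \epsilon_{122} \\ \epsilon_{211} \\ \epsilon_{212} \\ \epsilon_{221} \\ \epsilon_{222} \end{bmatrix} $$

By setting $Z_1^j = [1\ 1\ 1\ 1]^T$, $j = 1, 2$, $Z_2^j = [1\ 1]^T$, $j = 1, \ldots, 4$, and $Z_i = \bigoplus_j Z_i^j$, $i = 1, 2$, we see that $r = 2$, $q_1 = q_2 = 1$, $m_1 = 2$, $m_2 = 4$, $\mathbf{b}_1 = [b_1\ b_2]^T$, and $\mathbf{b}_2 = [s_{11}\ s_{12}\ s_{21}\ s_{22}]^T$, where the bold $\mathbf{b}_1$ and $\mathbf{b}_2$ are the class vectors of assumption 2.1.1, as distinct from the scalar block effects $b_1$ and $b_2$. Note also that in this example

$$ D = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \quad \text{and} \quad D^A = \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma_1^2 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma_2^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma_2^2 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma_2^2 \end{bmatrix} $$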
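The matrices of this example are small enough to build and inspect numerically. The sketch below, with hypothetical variable names and made-up variance values, constructs $X$ and $Z$ from the layout of Table 2.1.1 and forms the marginal covariance $\sigma_3^2 I + Z D^A Z^T$, so that the dependence pattern induced by the block and plot effects can be read off directly.

```python
import numpy as np

# (block i, plot j, subplot k) in the row order of Table 2.1.1
design = [(i, j, k) for i in (1, 2) for j in (1, 2) for k in (1, 2)]
sign = {1: 1.0, 2: -1.0}                        # sum-to-zero coding of a two-level factor

# Columns: grand mean, A_1, B_1, A.B_11 (factor A follows the plot, B the subplot)
X = np.array([[1.0, sign[j], sign[k], sign[j] * sign[k]] for _, j, k in design])

Z1 = np.kron(np.eye(2), np.ones((4, 1)))        # direct sum of two [1 1 1 1]^T (blocks)
Z2 = np.kron(np.eye(4), np.ones((2, 1)))        # direct sum of four [1 1]^T (block-plots)
Z = np.hstack([Z1, Z2])

def marginal_cov(s1, s2, s3):
    """Sigma = s3 * I + Z D^A Z^T, with s1, s2, s3 the three variance components."""
    DA = np.diag([s1, s1, s2, s2, s2, s2])
    return s3 * np.eye(8) + Z @ DA @ Z.T

Sigma = marginal_cov(s1=2.0, s2=3.0, s3=1.0)    # hypothetical variance values
```

Two subplots in the same plot share both random effects (covariance `s1 + s2`), two subplots in different plots of the same block share only the block effect (covariance `s1`), and observations in different blocks are uncorrelated.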

The linear mixed effects model for repeated measures (Laird and Ware, 1982; Lindstrom and Bates, 1988) is a particular case of model (2.1.1) where $r = 1$. As an example we consider the data presented in Grizzle and Allen (1969) from a dental study on the ramus height (in millimeters) measured in 20 boys at ages 8, 8.5, 9, and 9.5 years. The data are shown in figure 2.1.1.

[Figure 2.1.1: Ramus heights for 20 boys measured at 4 ages. Horizontal axis: Age (years), 8.0 to 9.5. Vertical axis: Ramus height (mm.), 46 to 56. Each boy is identified by a letter from a to t.]

A linear model in age in which both the intercept and the slope vary with the boy seems adequate to explain the ramus height evolution. The corresponding linear mixed effects model is written as

$$ y_{ij} = (\beta_1 + b_{i1}) + (\beta_2 + b_{i2})\,\text{age}_j + \epsilon_{ij}, \quad i = 1, \ldots, 20, \quad j = 1, \ldots, 4, $$

where $y_{ij}$ is the ramus height of the $i$th boy at age $j$, $\beta_1$ and $\beta_2$ are the fixed intercept and the fixed slope respectively, $b_{i1}$ and $b_{i2}$ are the random intercept and the random slope corresponding to the $i$th boy, and $\epsilon_{ij}$ is the error term. The assumptions of the model are that the $b_i$ are i.i.d. with distribution $N(0, D_1)$ and the $\epsilon_{ij}$ are i.i.d. with distribution $N(0, \sigma^2)$, independent of the $b_i$. $D_1$ is a general variance-covariance matrix.

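In the notation of model (2.1.1) we can express the linear mixed effects model as

$$ y = \begin{bmatrix} 1 & 8.0 \\ 1 & 8.5 \\ \vdots & \vdots \\ 1 & 9.0 \\ 1 & 9.5 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} 1 & 8.0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 8.5 & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 9.0 \\ 0 & 0 & 0 & 0 & \cdots & 1 & 9.5 \end{bmatrix} \begin{bmatrix} b_{1,1} \\ b_{1,2} \\ \vdots \\ b_{20,1} \\ b_{20,2} \end{bmatrix} + \begin{bmatrix} \epsilon_{1,1} \\ \epsilon_{1,2} \\ \vdots \\ \epsilon_{20,3} \\ \epsilon_{20,4} \end{bmatrix} $$

where the second column of the fixed effects design matrix holds the ages at which each boy was measured. By letting $X[n_1 \cdots n_2, ]$ denote the submatrix of $X$ corresponding to its $n_1$ through $n_2$ rows and setting $Z_1^j = X[4j-3 \cdots 4j, ]$, $j = 1, \ldots, 20$, we see that, in this example, $r = 1$, $q_1 = 2$, $m_1 = 20$, $D = D_1$, and $D^A = \bigoplus_{i=1}^{20} D$.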
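The block structure of this example can be reproduced in a few lines. The sketch below assumes, as the model equation implies, that each boy contributes the same four rows (intercept and age) to both design matrices; the matrix `D1` holds hypothetical values used only to show the shape of $D^A$.

```python
import numpy as np

ages = np.array([8.0, 8.5, 9.0, 9.5])
n_boys = 20

Xi = np.column_stack([np.ones_like(ages), ages])   # 4 x 2 block: intercept and age
X = np.tile(Xi, (n_boys, 1))                       # 80 x 2 fixed effects design
Z = np.kron(np.eye(n_boys), Xi)                    # 80 x 40 block-diagonal random effects design

def rows_of_X(j):
    """Rows 4j-3 through 4j of X (1-based), i.e. the block for boy j."""
    return X[4 * j - 4: 4 * j, :]

D1 = np.array([[1.0, 0.1],
               [0.1, 0.5]])                        # hypothetical values for illustration
DA = np.kron(np.eye(n_boys), D1)                   # direct sum of 20 copies of D1
```

The diagonal blocks of `Z` coincide with `rows_of_X(j)`, which is the statement $Z_1^j = X[4j-3 \cdots 4j, ]$ in code form.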

2.2 Likelihood Estimation

Different estimation methods for the parameters in model (2.1.1) have been proposed over the years (Searle et al., 1992), but the most commonly used methods today are maximum likelihood (ML) and restricted maximum likelihood (RML) (Longford, 1993).

It is convenient, when writing the (restricted) likelihood of $y$ in model (2.1.1), to factor out the variance of the error term, $\sigma^2$, from the variance-covariance matrix of the random effects, i.e. $D = \sigma^2 D_s$, where $D_s$ is called the scaled variance-covariance matrix of the random effects. Under assumption (2.1.2), the loglikelihood function for $y$ in model (2.1.1) is given by

$$ \ell(\beta, \sigma^2, D_s \mid y) = -\frac{1}{2} \left\{ n \log(2\pi\sigma^2) + \log\left| I + Z D_s^A Z^T \right| + \frac{1}{\sigma^2} (y - X\beta)^T \left( I + Z D_s^A Z^T \right)^{-1} (y - X\beta) \right\} \tag{2.2.1} $$
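The step from the normal density of $y$ to (2.2.1) is short and worth writing out. The sketch below introduces the shorthand $V_s = I + Z D_s^A Z^T$, which is an assumption of this note rather than notation from the text, and uses $\Lambda = \sigma^2 I$ and $D = \sigma^2 D_s$.

$$ \Sigma = \sigma^2 I + Z (\sigma^2 D_s^A) Z^T = \sigma^2 V_s, \qquad \log|\Sigma| = n \log \sigma^2 + \log|V_s|, \qquad \Sigma^{-1} = \frac{1}{\sigma^2} V_s^{-1} $$

$$ -\frac{1}{2} \left\{ n \log(2\pi) + \log|\Sigma| + (y - X\beta)^T \Sigma^{-1} (y - X\beta) \right\} = -\frac{1}{2} \left\{ n \log(2\pi\sigma^2) + \log|V_s| + \frac{1}{\sigma^2} (y - X\beta)^T V_s^{-1} (y - X\beta) \right\} $$

The right-hand side is the loglikelihood (2.2.1), with $\sigma^2$ appearing only in the first and last terms.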

For fixed $D_s$, the values of $\beta$ and $\sigma^2$ that maximize (2.2.1) are given by

$$ \hat\beta(D_s) = \left[ X^T \left( I + Z D_s^A Z^T \right)^{-1} X \right]^{-1} X^T \left( I + Z D_s^A Z^T \right)^{-1} y \tag{2.2.2} $$

$$ \hat\sigma^2(D_s) = \frac{1}{n} \left( y - X\hat\beta(D_s) \right)^T \left( I + Z D_s^A Z^T \right)^{-1} \left( y - X\hat\beta(D_s) \right) $$

Restricted maximum likelihood estimates (RMLEs) of the variance-covariance components are usually preferred to maximum likelihood estimates (MLEs) in linear mixed effects models. The main reason is that RMLEs take into account the estimation of the fixed effects when calculating the degrees of freedom associated with the variance-components estimates, while MLEs do not.

The RMLEs are defined as the MLEs of the likelihood of a set of $n - p_0$ linear combinations of the response vector $y$, corresponding to $n - p_0$ vectors that span the orthogonal complement of the column space of the fixed effects design matrix $X$ (Harville, 1974). One way of defining such a set of vectors is to consider the QR decomposition (Thisted, 1988) of $X$

$$ X = [Q_1 \;\; Q_2] \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \tag{2.2.3} $$

where $R_1$ is upper triangular. It follows from the definition of the QR decomposition that the columns of $Q_2$ define a set of orthonormal vectors that span the orthogonal complement of the column space of $X$ and the RMLEs can be obtained from the likelihood of $y^* = Q_2^T y$. From elementary properties of the multivariate normal distribution and the definition of the QR decomposition, $y^* \sim N(0, \Sigma^*)$, where $\Sigma^* = \sigma^2 \left( I + Q_2^T Z D_s^A Z^T Q_2 \right)$. Letting $Z^* = Q_2^T Z$ and $n^* = n - p_0$, we can write the corresponding restricted likelihood as

$$ \ell_R(\beta, \sigma^2, D_s \mid y) = -\frac{1}{2} \left\{ n^* \log(2\pi\sigma^2) + \log\left| I + Z^* D_s^A Z^{*T} \right| + \frac{1}{\sigma^2} y^{*T} \left( I + Z^* D_s^A Z^{*T} \right)^{-1} y^* \right\} \tag{2.2.4} $$

For fixed $D_s$, the value of $\sigma^2$ that maximizes (2.2.4) is

$$ \hat\sigma_R^2(D_s) = \frac{1}{n^*} y^{*T} \left( I + Z^* D_s^A Z^{*T} \right)^{-1} y^* \tag{2.2.5} $$

The restricted likelihood (2.2.4) does not depend upon $\beta$ and hence no fixed effects RMLEs are available. Nevertheless the first formula in (2.2.2), with $D_s$ replaced by its corresponding RMLE, is usually employed to provide estimates for the fixed effects in restricted maximum likelihood estimation.
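The transformation behind the restricted likelihood can be sketched with a complete QR decomposition. The code below is an illustration under the assumption that $X$ has full column rank $p_0$; the function names, the toy design, and the value of the scaled matrix are hypothetical.

```python
import numpy as np

def restricted_data(y, X, Z):
    """Return y* = Q2^T y and Z* = Q2^T Z, with Q2 from the complete QR decomposition of X."""
    p0 = X.shape[1]
    Q, _ = np.linalg.qr(X, mode="complete")   # Q is n x n; its first p0 columns span col(X)
    Q2 = Q[:, p0:]                            # orthonormal basis of the orthogonal complement
    return Q2.T @ y, Q2.T @ Z

def sigma2_R(y_star, Z_star, DAs):
    """Formula (2.2.5) for a given scaled matrix D_s^A."""
    n_star = len(y_star)
    V = np.eye(n_star) + Z_star @ DAs @ Z_star.T
    return y_star @ np.linalg.solve(V, y_star) / n_star

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), np.arange(6.0)])   # toy design: intercept and trend
Z = np.kron(np.eye(3), np.ones((2, 1)))             # three clusters of two observations
y = rng.normal(size=6)

y_star, Z_star = restricted_data(y, X, Z)
# Adding any fixed effects contribution X c to y leaves y* unchanged, because Q2^T X = 0.
y_star_shifted, _ = restricted_data(y + X @ np.array([2.0, -1.0]), X, Z)
s2 = sigma2_R(y_star, Z_star, 0.5 * np.eye(3))
```

The invariance of `y_star` to shifts in the column space of `X` is the numerical counterpart of the statement that (2.2.4) does not depend on $\beta$.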

The (R)MLE of $D_s$ in general does not have a closed form expression and its determination constitutes a constrained nonlinear optimization problem whose numerical solution has been addressed in several papers (Hartley and Rao, 1967; Laird and Ware, 1982; Lindstrom and Bates, 1988; Wolfinger, Tobias and Sall, 1991). We will not consider the numerical problem of determining the (R)MLE of $D_s$ in this dissertation. Using the formulas in (2.2.2) and (2.2.5) one can express the likelihood (2.2.1), or the restricted likelihood (2.2.4), as a function of $D_s$ alone, greatly simplifying the optimization problem.
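A minimal sketch of the resulting profiled loglikelihood for the ML case is given below. It substitutes the two formulas in (2.2.2) into (2.2.1), so the quadratic form reduces to $n$. The function name and the toy data are hypothetical, and no claim is made that this mirrors any particular software implementation.

```python
import numpy as np

def profiled_loglik(DAs, y, X, Z):
    """Loglikelihood (2.2.1) with beta and sigma^2 replaced by their maximizers (2.2.2)."""
    n = len(y)
    V = np.eye(n) + Z @ DAs @ Z.T
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)   # first formula in (2.2.2)
    r = y - X @ beta
    sigma2 = r @ np.linalg.solve(V, r) / n               # second formula in (2.2.2)
    _, logdet_V = np.linalg.slogdet(V)
    # At the maximizing sigma^2 the quadratic form divided by sigma^2 equals n.
    loglik = -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet_V + n)
    return loglik, beta, sigma2

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), np.arange(8.0)])        # toy design
Z = np.kron(np.eye(4), np.ones((2, 1)))                  # four clusters of two observations
y = rng.normal(size=8)

ll0, beta0, s2_0 = profiled_loglik(np.zeros((4, 4)), y, X, Z)   # no random effects
ll1, beta1, s2_1 = profiled_loglik(0.5 * np.eye(4), y, X, Z)
```

With the scaled matrix set to zero the function reduces to ordinary least squares, which gives a quick sanity check; an optimizer would then search over an unconstrained parametrization of $D_s$ only.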

The exact distribution of the (R)MLEs cannot be derived in most applications of model (2.1.1) and inference about them usually has to rely on asymptotic results. We derive, in chapter 3, the asymptotic distribution of both the MLE and the RMLE, under quite general regularity conditions.

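In many applications of linear mixed effects models, estimates of the random effects $b$ are also of interest. In (R)ML estimation the conditional modes of the random effects are frequently used for that purpose (Lindstrom and Bates, 1988). These are defined as the mode of the conditional distribution of $b$ given $y$, which in the case of maximum likelihood estimation is given by

$$ \hat b_{ML} = \hat D_s^{A,ML} Z^T \left( I + Z \hat D_s^{A,ML} Z^T \right)^{-1} \left( y - X \hat\beta_{ML} \right) $$

and in the case of restricted maximum likelihood is given by

$$ \hat b_{RML} = \hat D_s^{A,RML} Z^T \left( I + Z \hat D_s^{A,RML} Z^T \right)^{-1} \left( y - X \hat\beta_{RML} \right) $$

where $\hat D_s^{A,ML}$, $\hat D_s^{A,RML}$, and $\hat\beta_{ML}$ denote respectively the MLE and RMLE of $D_s^A$, and the MLE of $\beta$, and $\hat\beta_{RML}$ is the fixed effects estimate obtained from the first formula in (2.2.2) with $D_s$ replaced by its RMLE.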
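Given estimates of the scaled covariance matrix and of the fixed effects, the conditional modes are a single linear solve. The sketch below uses hypothetical inputs; in practice `DAs` and `beta` would come from an (R)ML fit.

```python
import numpy as np

def conditional_modes(y, X, Z, DAs, beta):
    """Conditional modes D_s^A Z^T (I + Z D_s^A Z^T)^{-1} (y - X beta)."""
    n = len(y)
    V = np.eye(n) + Z @ DAs @ Z.T
    return DAs @ Z.T @ np.linalg.solve(V, y - X @ beta)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(6), np.arange(6.0)])   # toy design
Z = np.kron(np.eye(3), np.ones((2, 1)))             # three clusters of two observations
y = rng.normal(size=6)
DAs = 0.5 * np.eye(3)                               # hypothetical scaled covariance
beta = np.zeros(2)                                  # hypothetical fixed effects estimate

b_hat = conditional_modes(y, X, Z, DAs, beta)
```

When `DAs` is the zero matrix the conditional modes are all zero, and as its entries grow the modes approach the per-cluster least squares fit of the residuals, which is the usual shrinkage behaviour of these estimates.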

2.3 Bibliographic Review

The first developments of linear mixed effects models were related to the so-called variance components models, defined as linear mixed effects models in which all random effects are independent (and hence no covariance components are present). Airy (1861) seems to have given the first known formulation of a variance components model while considering a standard measurement problem in astronomy.

Fisher (1925) introduced the ANOVA method for estimating variance components (i.e. equating sums of squares to their expected values). Tippett (1931) clarified the use of the ANOVA method for analysis of variance designs and extended it to 2-way crossed classification mixed effects models. Possibly the most important paper in ANOVA estimation for unbalanced data is Henderson (1953). The three ANOVA methods presented in that paper, later known as Henderson methods, were the standard estimation methods for linear mixed effects models until fast computers became available.

Maximum likelihood estimation for normal distribution variance components models seems to have been first considered by Crump (1947). The landmark paper on ML estimation for variance components models is Hartley and Rao (1967), in which, among other things, the first asymptotic results for the MLE were established. Miller (1977) corrected some problems in Hartley and Rao's results and established asymptotic results for a large class of variance component models, giving also conditions for them to hold. Restricted maximum likelihood was introduced by Thompson (1962) and later extended by Patterson and Thompson (1971). Harville (1977) presents a comprehensive review of maximum likelihood and restricted maximum likelihood estimation in linear mixed effects models and introduces the model formulation given in (2.1.1). Laird and Ware (1982) describe a general linear mixed effects model for repeated measures data and suggest the use of the EM algorithm for obtaining (R)MLEs of the variance-covariance components.

The general structure of the linear mixed effects model (2.1.1) seems to be accepted by most researchers today. The linear mixed effects models literature published after Harville (1977) and Laird and Ware (1982) deals more with generalizations of the assumptions in model (2.1.1) and/or with different estimation approaches than with reformulations of the basic model’s structure.

Chi and Reinsel (1989) consider model (2.1.1) when Λ has the structure of an autoregressive process of order one (AR(1)). Maximum likelihood estimators of the model parameters and a score test for the autocorrelation are derived. One of the main conclusions is that the use of an AR(1) structure for the cluster-specific errors may have the effect of reducing the number of random effects needed in the model, but the investigation of ways to determine the best combination of time series error structure and number of random effects deserves further study. This issue is also considered by Jones (1990).

A Bayesian analysis of model (2.1.1) using the Gibbs sampler (Geman and Geman, 1984) is described in Gelfand, Hills, Racine-Poon and Smith (1990) and in Wakefield, Smith, Racine-Poon and Gelfand (1994). The Bayesian analysis is developed using a hierarchical model approach. In the second paper the normal distribution of the random effects (b) is replaced by a multivariate Student-t, enhancing the robustness of the fit and giving a method for detecting outlying random effects. The main advantage of this approach is its flexibility in handling complex situations, such as constrained parameters and non-Gaussian distributions for the random effects and/or error terms. The main drawbacks are the intensive computational effort required and the need for prior distributions for all the population parameters involved.

Jennrich and Schluchter (1986) consider ML estimation in linear mixed effects models for repeated measures with structured variance-covariance matrices. Their work was extended to the general linear mixed effects model by Wolfinger et al. (1991), who also discuss restricted maximum likelihood. The use of structured matrices is very appealing in practice, since it is often known beforehand that the covariance structure of the random effects and/or the errors follows a particular pattern, and substantial reductions in computing time can thus be achieved.

A generalized linear model version of (2.1.1) is discussed in Liang and Zeger (1986) and Zeger, Liang and Albert (1988). They allow a more flexible error structure that is no longer restricted to being Gaussian and introduce the idea of a link function, $h$, relating $E(y \mid b)$ to $\beta$ and $b$, so that $h(E(y \mid b)) = X\beta + Zb$. This model should in fact be considered a competitor of the nonlinear mixed effects model, discussed in chapter 4.

Three books solely dedicated to linear mixed effects models have been recently published. Searle et al. (1992) includes a comprehensive review of models and estimation methods for linear mixed effects models, but focuses more on variance components models and mixed effects ANOVA models. Lindsey (1993) covers in detail linear mixed effects models for repeated measures data and Longford (1993) considers linear mixed effects models in a regression context.


Asymptotic Results for the Linear Mixed Effects Model

Miller (1977) derived the asymptotic distribution of maximum likelihood estimators for a mixed effects ANOVA model. In section 3.1 we extend these results to the more general linear mixed effects model (2.1.1), showing that, under fairly general conditions, with probability going to one there exists a sequence of roots of the likelihood equations that is consistent and asymptotically normal. These results are helpful in establishing the asymptotic uncorrelation of the estimators of the fixed effects and the estimators of the variance-covariance components. We also show, in section 3.2, that under fairly general conditions the restricted maximum likelihood estimators for the general linear mixed effects model are consistent and asymptotically normal. In section 3.3, we show that the asymptotic normality of the (restricted) maximum likelihood estimators continues to hold for a large class of reparametrizations/structuring of the variance-covariance components. Our conclusions are included in section 3.4. The proofs of the lemmas used throughout this chapter are included in Appendix A.

3.1 Maximum Likelihood

Under Assumption 2.1.1 the linear mixed effects model (2.1.1) can alternatively be expressed as

$$y = X\beta + \sum_{i=1}^{r} \sum_{j=1}^{q_i} U_{ji} a_{ji} + \epsilon \tag{3.1.1}$$

where the $U_{ji}$ are $n \times m_i$ incidence-like matrices defined by the relation

$$k\text{th column of } U_{ji} = j\text{th column of } Z_{ki}.$$

Note that each $U_{ji}$ has at most one nonzero entry per row. We will assume here that it has at least one nonzero entry per column, to rule out trivial cases. The $a_{ji}$ vectors are defined by the relation $[a_{ji}]_k = [b_{ki}]_j$ and represent the values of the $j$th random effect of the $i$th class.
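As a small numerical illustration of the reindexing behind (3.1.1), the following NumPy sketch builds the $U_{ji}$ and $a_{ji}$ from the $Z_{ki}$ and $b_{ki}$ for a single class ($r = 1$) and checks that the two ways of writing the random effects contribution agree. It assumes that the random effects part of model (2.1.1) has the form $\sum_i \sum_k Z_{ki} b_{ki}$; the sizes `n`, `m`, `q`, the covariate matrix `W` and the simulated values are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: one class (r = 1) with m levels, q random effects per
# level (intercept and slope) and n observations split evenly among the levels.
n, m, q = 12, 3, 2
level = np.repeat(np.arange(m), n // m)                 # level of each observation
W = np.column_stack([np.ones(n), rng.normal(size=n)])   # random effects covariates

# Z[k] is the n x q matrix Z_k: rows of W for observations in level k, zero elsewhere.
Z = [W * (level == k)[:, None] for k in range(m)]
b = [rng.normal(size=q) for _ in range(m)]              # b[k]: random effects of level k

# U[j] is n x m and its k-th column is the j-th column of Z[k] (0-based indices).
U = [np.column_stack([Z[k][:, j] for k in range(m)]) for j in range(q)]
# a[j] is an m-vector whose k-th element is the j-th element of b[k].
a = [np.array([b[k][j] for k in range(m)]) for j in range(q)]

by_level = sum(Z[k] @ b[k] for k in range(m))    # sum over levels of Z_k b_k
by_effect = sum(U[j] @ a[j] for j in range(q))   # sum over effects of U_j a_j

print(np.allclose(by_level, by_effect))
```

Each row of every `U[j]` has a single nonzero entry, in the column of the level to which the observation belongs, which is the incidence-like structure referred to in the text.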

The model formulation (3.1.1) is analogous to that of Hartley and Rao (1967) and Miller (1977) for the mixed effects ANOVA model. We will use it in this chapter to maintain consistency with the terminology used in the second paper.

The covariance matrix of $y$ can be expressed as

$$\Sigma = \sigma^2 I + \sum_{i=1}^{r} \sum_{j,k=1}^{q_i} [D_i]_{jk} U_{ji} U_{ki}^T.$$

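By letting $p_1 = \sum_{i=1}^{r} q_i(q_i+1)/2$, $\sigma_0 = \sigma^2$ and $G_0 = I$ and setting

$$\sigma_1 = [D_1]_{11},\ \sigma_2 = [D_1]_{12},\ \ldots,\ \sigma_{q_1(q_1+1)/2+1} = [D_2]_{11},\ \ldots,\ \sigma_{p_1} = [D_r]_{q_r q_r},$$

$$G_1 = U_{11} U_{11}^T,\ G_2 = U_{11} U_{21}^T + U_{21} U_{11}^T,\ \ldots,\ G_{p_1} = U_{q_r r} U_{q_r r}^T,$$

we can write

$$\Sigma = \sum_{i=0}^{p_1} \sigma_i G_i. \tag{3.1.2}$$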

This formulation of model (2.1.1) differs from that in Miller (1977) in that some of the $\sigma_i$ may assume negative values and some of the $G_i$ are not required to be positive semi-definite.
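The agreement between the double-sum form of the covariance matrix and the expansion (3.1.2) can be checked numerically. The sketch below is only an illustration under assumed values: a single class with two random effects per level, and hypothetical choices for `sigma2`, `D`, the covariates `W` and the group layout. It also shows the two features noted above: a negative component (the covariance $[D_1]_{12}$) and a cross-product matrix $G_2$ that is not positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layout: one class, m levels, q = 2 random effects per level.
n, m, q = 8, 4, 2
level = np.repeat(np.arange(m), n // m)
W = np.column_stack([np.ones(n), rng.normal(size=n)])

# U[j][obs, k] = W[obs, j] if observation obs belongs to level k, else 0.
U = [np.zeros((n, m)) for _ in range(q)]
for j in range(q):
    U[j][np.arange(n), level] = W[:, j]

sigma2 = 0.5                                   # error variance (assumed value)
D = np.array([[1.0, -0.3],
              [-0.3, 0.4]])                    # random effects covariance (assumed)

# Double-sum form: sigma^2 I + sum_{j,k} [D]_{jk} U_j U_k^T.
Sigma_direct = sigma2 * np.eye(n) + sum(
    D[j, k] * (U[j] @ U[k].T) for j in range(q) for k in range(q)
)

# Expanded form: sigma_0 G_0 + sigma_1 G_1 + ..., taking the upper triangle
# of D row by row, as in the ordering sigma_1 = [D]_11, sigma_2 = [D]_12, ...
sigmas, G = [sigma2], [np.eye(n)]
for j in range(q):
    for k in range(j, q):
        sigmas.append(D[j, k])
        if j == k:
            G.append(U[j] @ U[j].T)
        else:
            G.append(U[j] @ U[k].T + U[k] @ U[j].T)
Sigma_expanded = sum(s * g for s, g in zip(sigmas, G))

print(np.allclose(Sigma_direct, Sigma_expanded))
print(sigmas[2], np.linalg.eigvalsh(G[2]).min())   # negative component, indefinite G_2
```

Although one component and one $G_i$ fail to be nonnegative or positive semi-definite individually, the resulting $\Sigma$ is still positive definite here because `D` itself is.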

The following assumptions (equivalent to Assumptions 2.2 through 2.5 in Miller (1977)) are made about model (3.1.1).

Assumption 3.1.1 The matrix $X$ is of full rank $p_0$.

Assumption 3.1.2 $n \geq p_0 + p_1 + 1$.

Assumption 3.1.3 The partitioned matrix $[X : U_{ji}]$ has rank greater than $p_0$ for $i = 1, \ldots, r$, $j = 1, \ldots, q_i$.

Assumption 3.1.4 The matrices $G_0, G_1, \ldots, G_{p_1}$ are linearly independent, i.e. $\sum_{i=0}^{p_1} \tau_i G_i = 0 \iff \tau_i = 0$, $i = 0, \ldots, p_1$.

As mentioned in Miller (1977), Assumption 3.1.1 can always be satisfied by suitably reparametrizing the fixed effects vector. Assumptions 3.1.3 and 3.1.4 ensure that the random effects are not confounded with the fixed effects and with each other.
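For a given design, Assumptions 3.1.1 through 3.1.4 can be checked numerically with matrix ranks. The following sketch is one possible way of doing so and is not part of the original development: the function `check_assumptions` and the toy random-intercept layout are hypothetical, and linear independence of the $G_i$ is tested through the rank of the matrix whose columns are their vectorizations.

```python
import numpy as np

def check_assumptions(X, U_list, G_list):
    """Return a dict of booleans, one per Assumption 3.1.1 to 3.1.4.

    X      : n x p0 fixed effects design matrix
    U_list : all matrices U_ji (each n x m_i)
    G_list : [G_0, G_1, ..., G_p1]
    """
    n, p0 = X.shape
    p1 = len(G_list) - 1
    rank = np.linalg.matrix_rank
    # G_0, ..., G_p1 are linearly independent iff their vectorizations are.
    vecG = np.column_stack([G.ravel() for G in G_list])
    return {
        "3.1.1": bool(rank(X) == p0),
        "3.1.2": bool(n >= p0 + p1 + 1),
        "3.1.3": all(bool(rank(np.hstack([X, U])) > p0) for U in U_list),
        "3.1.4": bool(rank(vecG) == p1 + 1),
    }

# Hypothetical random-intercept layout: 3 groups with 2 observations each.
group = np.repeat(np.arange(3), 2)
U = np.eye(3)[group]                          # n x m group indicator matrix
t = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
G = [np.eye(6), U @ U.T]                      # G_0 = I, G_1 = U U^T

X_ok = np.column_stack([np.ones(6), t])       # intercept and a within-group covariate
X_bad = U.copy()                              # one fixed effect per group

ok = check_assumptions(X_ok, [U], G)
bad = check_assumptions(X_bad, [U], G)
print(ok)
print(bad)
```

In the second design the fixed effects already span the columns of the group indicator matrix, so the random intercept is confounded with the fixed effects and Assumption 3.1.3 fails.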

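Let $p = p_0 + p_1 + 1$ and $\sigma = (\sigma_0, \sigma_1, \ldots, \sigma_{p_1})^T$. Then the parameter space $\Theta$ for model (3.1.1) is

$$\Theta = \left\{ \theta \in \mathbb{R}^{p} \mid \theta = (\beta^T, \sigma^T)^T,\ \beta \in \mathbb{R}^{p_0};\ \sigma_0 > 0 \text{ and } (\sigma_1, \ldots, \sigma_{p_1}) \in \mathbb{R}^{p_1} \text{ such that each } D_i \text{ is positive semi-definite},\ i = 1, \ldots, r \right\}.$$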
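Membership of a given $\sigma$ in this parameter space can be verified by rebuilding the matrices $D_i$ and inspecting their eigenvalues. The sketch below is a minimal illustration under stated assumptions: the helper names `unpack_D` and `in_parameter_space`, the tolerance `tol` and the test values are hypothetical, and the components are assumed to be stored in the row-wise upper-triangle ordering $\sigma_1 = [D_1]_{11}, \sigma_2 = [D_1]_{12}, \ldots$ used above. Since $\beta$ is unconstrained, it plays no role in the check.

```python
import numpy as np

def unpack_D(sigma, q):
    """Rebuild the symmetric matrices D_1, ..., D_r from (sigma_1, ..., sigma_p1).

    q is the list [q_1, ..., q_r]; each D_i is filled from its upper triangle,
    taken row by row.
    """
    sigma = np.asarray(sigma, dtype=float)
    D, pos = [], 0
    for qi in q:
        rows, cols = np.triu_indices(qi)      # row-major upper triangle
        vals = sigma[pos:pos + len(rows)]
        Di = np.zeros((qi, qi))
        Di[rows, cols] = vals
        Di[cols, rows] = vals                 # symmetrize
        D.append(Di)
        pos += len(rows)
    return D

def in_parameter_space(sigma0, sigma, q, tol=1e-10):
    """True if sigma_0 > 0 and every D_i is positive semi-definite (up to tol)."""
    if sigma0 <= 0:
        return False
    return all(np.linalg.eigvalsh(Di).min() >= -tol for Di in unpack_D(sigma, q))

print(in_parameter_space(0.5, [1.0, -0.3, 0.4], [2]))   # negative covariance, D_1 still PSD
print(in_parameter_space(0.5, [1.0, -0.9, 0.4], [2]))   # D_1 has a negative eigenvalue
```

The first call illustrates that an individual component may be negative while the point still lies in $\Theta$; the constraint applies to each $D_i$ as a whole and not to the $\sigma_i$ separately.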
