Topics in Mixed Effects Models


by

José Carlos Pinheiro

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy (Statistics)

at the

UNIVERSITY OF WISCONSIN – MADISON

1994


Mixed effects models have received a great deal of attention in the statistical literature for the past forty years because of the flexibility they offer in handling the unbalanced clustered data that arise in many areas of investigation. In this dissertation we consider both linear and nonlinear mixed effects models under maximum likelihood and restricted maximum likelihood estimation. We derive the asymptotic distribution of both maximum likelihood and restricted maximum likelihood estimators in a general linear mixed effects model, under mild regularity conditions. We study different approximations to the loglikelihood function of nonlinear mixed effects models, comparing them with respect to their accuracy and computational efficiency. We describe five different parametrizations for variance-covariance matrices that ensure positive definiteness while leaving the estimation problem unconstrained, comparing them with respect to their computational efficiency and statistical interpretability. We consider the model building issue for mixed effects models, describing techniques for choosing random effects to be incorporated in the model, using structured random effects variance-covariance matrices, and using covariates to explain cluster-to-cluster parameter variability. Finally, we describe the S software we have developed for analyzing linear and nonlinear mixed effects models, which we have contributed to the StatLib collection.


Abstract

1 Introduction
1.1 Motivation
1.2 Linear Mixed Effects Models
1.3 Nonlinear Mixed Effects Models
1.4 Parametrizations for Variance-Covariance Matrices
1.5 Software Development
1.6 Model Building
1.7 Future Research

2 The Linear Mixed Effects Model
2.1 Model and Examples
2.2 Likelihood Estimation
2.3 Bibliographic Review

3 Asymptotic Results for the Linear Mixed Effects Model
3.1 Maximum Likelihood
3.1.1 Limit of φ₅
3.1.4 Limit of φ₂
3.1.5 Limit of φ₁
3.2 Restricted Maximum Likelihood
3.3 Parametrized and/or Structured σ
3.4 Conclusions

4 The Nonlinear Mixed Effects Model
4.1 The Model
4.2 Orange Trees
4.3 Bibliographic Review

5 Approximations to the Loglikelihood in the Nonlinear Mixed Effects Model
5.1 Approximations to the Loglikelihood
5.1.1 Alternating Approximation
5.1.2 Laplacian Approximation
5.1.3 Importance Sampling
5.1.4 Gaussian quadrature
5.2 Comparing the Approximations
5.2.1 Orange Trees
5.2.2 Theophylline
5.2.3 Simulation Results
5.3 Conclusions

6 Parametrizations for Variance-Covariance Matrices
6.1.2 Log-Cholesky Parametrization
6.1.3 Spherical Parametrization
6.1.4 Matrix Logarithm Parametrization
6.1.5 Givens Parametrization
6.2 Comparing the Parametrizations

7 Mixed Effects Models Methods and Classes for S
7.1 The lme class and related methods
7.1.1 The lme function
7.1.2 The print, summary, and anova methods
7.1.3 The plot method
7.1.4 Other methods
7.2 The nlme class and related methods
7.2.1 The nlme function
7.2.2 The nlme methods

8 Model Building in Mixed Effects Models
8.1 Examples
8.1.1 Pine Trees
8.1.2 Theophylline
8.1.3 Quinidine
8.1.4 CO₂ Uptake
8.2 Variance-Covariance Modeling
8.2.3 Quinidine
8.2.4 CO₂ Uptake
8.3 Covariate Modeling
8.3.1 Quinidine
8.3.2 CO₂ Uptake Data

9 Conclusions and Suggestions for Future Research
9.1 Conclusions
9.2 Future Research
9.2.1 Asymptotics
9.2.2 Parametrizations
9.2.3 Assessing Variability

Bibliography

Appendix A

Appendix B


Introduction

In this chapter we present an overview of the topics covered in this dissertation. We discuss the motivation behind mixed effects models and briefly describe the contents of each of the subsequent chapters.

1.1 Motivation

Mixed models were developed to handle clustered data and have been a topic of increasing interest in Statistics for the past forty years. Clustered data can be loosely defined as data in which the observations are grouped into disjoint classes, called clusters, according to some classification criterion. Examples of clustered data include split-plot designs, in which the observations pertaining to the same block form a cluster, and repeated measures data, in which several observations are made sequentially on the same individual (cluster).

Observations in the same cluster usually cannot be considered independent, and mixed effects models constitute a convenient tool for modeling cluster dependence. In these models the response is assumed to be a function of fixed (population) effects, non-observable cluster-specific random effects, and an error term. Observations within the same cluster share common random effects and are therefore statistically dependent.

We will restrict ourselves in this dissertation to models in which the error terms and the random effects are normally distributed.

The parameters in a mixed effects model can be classified into two types: fixed effects, associated with the average effect of predictors on the response, and variance-covariance components, associated with the covariance structure of the random effects and of the error term. In many practical applications estimates of the random effects are also of interest.

Several estimation methods have been proposed for mixed effects models and though maximum likelihood and restricted maximum likelihood (Harville, 1974) are generally adopted for linear mixed effects models (Longford, 1993), there is an ongoing debate in the statistical literature about estimation methods for nonlinear mixed effects models.

1.2 Linear Mixed Effects Models

Linear mixed effects models are mixed effects models in which both the fixed and the random effects contribute linearly to the response function. The general form of such models is

$$ y = X\beta + Zb + \epsilon \tag{1.2.1} $$

where y is the response vector, X and Z are the design matrices corresponding to the fixed and random effects respectively, β is the fixed effects vector, b is the random effects vector, and ε is the error vector. It is assumed that b ∼ N(0, D) and ε ∼ N(0, Λ), with b independent of ε.

Variance components models (Searle, Casella and McCulloch, 1992), mixed effects ANOVA models (Miller, 1977), and linear models for longitudinal data (Laird and Ware, 1982) are all special cases of model (1.2.1). The linear mixed effects model (1.2.1) is described in detail in chapter 2. Two examples are included there to illustrate the use of this model in the context of mixed effects ANOVA models and repeated measures data.

Maximum likelihood (ML) and restricted maximum likelihood (RML) are the most common estimation methods used for linear mixed effects models. The derivation of (R)ML estimates constitutes a rather complex nonlinear optimization problem that only became feasible when fast computers became available.

This optimization is usually done using the EM algorithm (Dempster, Laird and Rubin, 1977) or Newton-Raphson methods (Thisted, 1988), but the latter seems to be more efficient than the former (Lindstrom and Bates, 1988). No closed form expressions are available for the distribution of (R)ML estimates and inference usually has to rely on asymptotic results. The classical asymptotic theory available for ML estimates (Lehmann, 1983) cannot be applied to linear mixed effects models, since the observations are not independent. Miller (1977) derived the asymptotic distribution of ML estimates for mixed effects ANOVA models, following the work by Hartley and Rao (1967), but these results had not been extended to the more general linear mixed effects model (1.2.1). We derive, in chapter 3, the asymptotic distribution of both ML and RML estimates in the linear mixed effects model (1.2.1) under quite general regularity conditions.

We also derive the asymptotic distribution of ML and RML estimates of the variance-covariance components in (1.2.1) for a large class of reparametrizations of the variance-covariance matrix of the random effects that encompasses most cases of practical interest.

1.3 Nonlinear Mixed Effects Models

Nonlinear mixed effects models are mixed effects models in which some of the fixed and/or random effects occur nonlinearly in the response function. Several different formulations of nonlinear mixed effects models are available in the literature; we will adopt here the model proposed by Lindstrom and Bates (1990), given by

$$ y = f(\phi, X) + \epsilon, \quad \text{where} \quad \phi = A\beta + Bb \tag{1.3.1} $$

Here y is the response vector, f is a general nonlinear function, φ is a mixed effects parameter vector that is expressed as a linear function of the fixed effects β and the random effects b, X is a matrix of covariates, ε is the error vector, and A and B are the design matrices for the fixed and random effects respectively.

As in the linear mixed effects model (1.2.1) it is assumed that b ∼ N(0, D) and ε ∼ N(0, Λ), with b independent of ε.
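To make the structure of model (1.3.1) concrete, the following minimal Python sketch simulates repeated measures from it. Everything specific here is an assumption made for illustration: the logistic form of f, the parameter values, the choice of a single random effect on the asymptote, and the use of NumPy are not taken from the dissertation.

```python
import numpy as np

def logistic(phi, t):
    """Hypothetical response function f(phi, t) = phi_1 / (1 + exp(-(t - phi_2) / phi_3))."""
    return phi[0] / (1.0 + np.exp(-(t - phi[1]) / phi[2]))

rng = np.random.default_rng(1)
beta = np.array([200.0, 700.0, 350.0])   # fixed effects (illustrative values)
A = np.eye(3)                            # every component of phi has a fixed effect
B = np.array([[1.0], [0.0], [0.0]])      # only the first component has a random effect
D = np.array([[30.0 ** 2]])              # variance of the single random effect
sigma = 8.0                              # error standard deviation (Lambda = sigma^2 I)
t = np.array([100.0, 500.0, 700.0, 1000.0, 1200.0, 1400.0, 1600.0])

clusters = []
for _ in range(5):
    b = rng.multivariate_normal(np.zeros(1), D)   # b ~ N(0, D), drawn once per cluster
    phi = A @ beta + B @ b                        # phi = A beta + B b
    y = logistic(phi, t) + rng.normal(0.0, sigma, size=t.size)
    clusters.append((b, phi, y))
```

All observations in a cluster share the same draw of b, which is what induces the within-cluster dependence described in section 1.1.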

By far the most common application of model (1.3.1) is for repeated measures data and we will restrict ourselves in this dissertation to this type of situation. The nonlinear mixed effects model for repeated measures data is described in detail in chapter 4, together with a real data example of its use.

Different estimation methods have been proposed for the parameters in the nonlinear mixed effects model (1.3.1) and there is an ongoing debate in the literature about the most adequate method(s) (Davidian and Giltinan, 1993). One of the reasons for this variety of estimation methods is related to the numerical complexity involved in the derivation of (R)ML estimates in the nonlinear mixed effects model. This complexity is due to the fact that the likelihood function in the nonlinear mixed effects model, which is based on the marginal distribution of y, does not usually have a closed form expression. Different approximations to the loglikelihood in (1.3.1) have been proposed to try to circumvent this problem (Lindstrom and Bates, 1990; Vonesh and Carter, 1992; Davidian and Gallant, 1993). We describe in chapter 5 alternative approximations to the loglikelihood in (1.3.1) based on the Laplacian approximation (Tierney and Kadane, 1986), importance sampling (Geweke, 1989), and Gaussian quadrature (Davis and Rabinowitz, 1984). We present a comparison between these methods and the approximation suggested by Lindstrom and Bates (1990), using simulated and real data, and conclude that, in most cases, Lindstrom and Bates' approximation gives very accurate results.

As in the linear mixed effects model, the distribution of the (R)ML estimates cannot be determined explicitly. Asymptotic results for these estimates have not yet been established and will not be considered in this dissertation.

1.4 Parametrizations for Variance-Covariance Matrices

The (R)ML estimation of the variance-covariance components in both models (1.2.1) and (1.3.1) is usually a difficult numerical problem, since the resulting estimates should correspond to a positive semi-definite matrix. This difficulty has been pointed out by Harville (1977), Lindstrom and Bates (1988), and Searle et al. (1992, chapter 6).

Two approaches can be used for ensuring positive semi-definiteness of the estimated variance-covariance matrix of the random effects: constrained optimization, where the natural parametrization for the unique elements in the variance-covariance matrix is used and the estimates are constrained to be positive semi-definite matrices, and unconstrained optimization, where the unique elements in the variance-covariance matrix are reparametrized in such a way that the resulting estimate must be positive semi-definite. We recommend the use of the second approach, not only for numerical reasons (parameter estimation tends to be much easier when there are no constraints), but also because of the superior inferential properties that unconstrained estimates tend to have (e.g. asymptotic properties). Lindstrom and Bates (1988, 1990) describe the use of Cholesky factors for implementing unconstrained (R)ML estimation of variance-covariance components in both the linear and the nonlinear mixed effects models.

We describe, in chapter 6, five different parametrizations for transforming the (R)ML estimation of the variance-covariance components in models (1.2.1) and (1.3.1) into an unconstrained optimization problem. The basic idea behind all parametrizations considered in this dissertation is to write

$$ D = L^T L \tag{1.4.1} $$

where the unique elements of L form an unconstrained parameter vector. Different choices of L lead to different parametrizations of D. The parametrizations considered in chapter 6 are of two types: three of them are based on the Cholesky factorization of D (Thisted, 1988) and the other two are based on the spectral decomposition (Rao, 1973).
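As a rough illustration of (1.4.1), the Python sketch below maps an arbitrary real vector to a positive definite matrix through an upper triangular L whose diagonal is stored on the log scale. This is one possible choice in the spirit of a log-Cholesky parametrization, written under our own assumptions (the function name `theta_to_D`, the row-by-row ordering of the parameters, and the numerical values are hypothetical); it is not necessarily the exact definition used in chapter 6.

```python
import numpy as np

def theta_to_D(theta, q):
    """Map an unconstrained vector theta of length q(q+1)/2 to D = L^T L.

    L is upper triangular. Its diagonal entries are exponentiated, so every
    real-valued theta yields a nonsingular L and hence a positive definite D.
    """
    L = np.zeros((q, q))
    L[np.triu_indices(q)] = theta      # fill the upper triangle row by row
    diag = np.diag_indices(q)
    L[diag] = np.exp(L[diag])          # force a strictly positive diagonal
    return L.T @ L

theta = np.array([0.3, -1.2, 0.5, -0.7, 2.0, 0.1])   # arbitrary values, q = 3
D = theta_to_D(theta, 3)
eigenvalues = np.linalg.eigvalsh(D)
```

An optimizer can then move freely over theta, with no constraint checks, while every candidate D remains a valid variance-covariance matrix.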

In choosing a parametrization for D one has to take into consideration its computational efficiency and the statistical interpretability of the individual parameters. A comparison of the computational efficiency of the different parametrizations, using simulation, is included in chapter 6. The statistical interpretation of the individual parameters in each parametrization is also discussed in that chapter. We conclude that different parametrizations should be used at different stages of the analysis: during the optimization of the (restricted) loglikelihood function, a parametrization based on the matrix logarithm of D (Leonard and Hsu, 1993) is to be preferred for its superior computational efficiency; to assess the variability of the variance-covariance components estimates, a parametrization based on the spherical coordinates of the Cholesky factor of D is recommended, since it is the one with the most interpretable elements.

1.5 Software Development

The success of any statistical technique nowadays is directly related to the availability of reliable, efficient, and simple-to-use software for its application. We describe in chapter 7 a set of S functions, classes, and methods (Chambers and Hastie, 1992) that we developed for the analysis of mixed effects models, using either maximum likelihood or restricted maximum likelihood. These extend the linear and nonlinear modeling facilities available in release 3 of S and S-plus. The source code, written in S and C using an object-oriented approach, is available in the S collection at StatLib. Help files for all S functions and methods are included in Appendix B.

The two functions used to fit linear and nonlinear mixed effects models are respectively lme() and nlme(). Objects returned by these functions are of classes lme and nlme respectively, and the latter class inherits from the former. Several methods are available for both the lme and nlme classes, including print, summary, plot, predict and anova. These were developed keeping consistency with the methods of other model fitting functions available in S, such as lm(), glm(), and nls().

The use of the S functions and methods for mixed effects models is illustrated in chapter 7 through the analysis of two real data examples: one of a linear mixed effects model and the other of a nonlinear mixed effects model.

1.6 Model Building

Model building in mixed effects models involves questions that do not have a parallel in (fixed effects) linear and nonlinear models. Some of these questions are:

• determining which effects should have an associated random component and which should be purely fixed;

• using covariates to explain cluster-to-cluster parameter variability;

• using structured random effects variance-covariance matrices (e.g. diagonal matrices) to reduce the number of parameters in the model.

We consider in chapter 8 strategies for addressing these questions in the context of nonlinear mixed effects models, though most of the techniques described are also applicable to linear mixed effects models.

The proposed strategy for choosing the random effects to be included in the model is to start with all parameters as mixed effects, whenever no prior information about the random effects variance-covariance structure is available and convergence is possible. We then examine the eigenvalues of the estimated D matrix, checking whether one or more are close to zero. The associated eigenvector(s) then give an estimate of the linear combination of the parameters that could be taken as fixed. If near-zero eigenvalues are present, a reduced model, in which the corresponding linear combination of random effects is eliminated, can be fit and compared to the original model by means of likelihood ratio tests or information criterion statistics. In this dissertation we use the Akaike information criterion (Sakamoto, Ishiguro and Kitagawa, 1986) to decide between alternative models, choosing the one with the smaller AIC.
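The eigenvalue check and the AIC comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the relative tolerance used to call an eigenvalue "close to zero", the estimated D matrix, and the loglikelihood values are all hypothetical, and the AIC is taken in its usual form, minus twice the loglikelihood plus twice the number of parameters.

```python
import numpy as np

def near_zero_directions(D_hat, rel_tol=1e-3):
    """Return eigenvalues of D_hat that are small relative to the largest one,
    together with their eigenvectors (as columns)."""
    values, vectors = np.linalg.eigh(D_hat)      # eigenvalues in ascending order
    small = values < rel_tol * values.max()
    return values[small], vectors[:, small]

def aic(loglik, n_par):
    """Akaike information criterion: -2 loglik + 2 (number of parameters)."""
    return -2.0 * loglik + 2.0 * n_par

# Hypothetical estimate with two almost perfectly correlated random effects.
D_hat = np.array([[1.0, 0.999],
                  [0.999, 1.0]])
small_values, small_vectors = near_zero_directions(D_hat)

# Hypothetical fits: full model (5 parameters) versus reduced model (3 parameters).
aic_full = aic(-100.0, 5)
aic_reduced = aic(-100.2, 3)
prefer_reduced = aic_reduced < aic_full
```

Here the flagged eigenvector is proportional to (1, -1), suggesting that the difference of the two random effects carries almost no variability and could be treated as fixed.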

For choosing covariates to explain cluster-to-cluster parameter variability we suggest analyzing plots of random effects estimates (e.g. conditional modes) versus the candidate covariates. If the number of covariates/random effects combinations is large, we suggest using a forward stepwise type of approach in which covariates are included one at a time and the potential importance of the remaining covariates is (graphically) assessed at each step. The decision on whether or not to include a covariate can be based on the change in the AIC values of the fits with and without it.

In comparing alternative models one must also analyze the residuals from the fit, checking for departures from the model's assumptions. It is also highly recommended that any model building analysis be done in conjunction with experts in the field of application of the model, to ensure the practical usefulness of the chosen model. The use of the proposed model building strategies is illustrated in chapter 8 through the analyses of four real data examples, obtained from the areas of forestry, ecology, and pharmacokinetics.

1.7 Future Research

Considerable research effort is currently dedicated to expanding the applicability of mixed effects models and improving their estimation methods. We suggest in chapter 9 topics for future research in mixed effects models that were not covered in this dissertation. These include suggestions for:

• extending the asymptotic results of chapter 3 to nonlinear mixed effects models and to linear mixed effects models with more general variance-covariance structures for the error term;

• deriving unconstrained parametrizations for structured variance-covariance matrices;

• comparing methods for assessing the variability of parameter estimates in mixed effects models.

The Linear Mixed Effects Model

In this chapter we describe a general linear mixed effects model and present two examples of its use in the context of mixed effects ANOVA models and repeated measures data. We also include a brief bibliographic review of linear mixed effects models.

2.1 Model and Examples

We write the linear mixed effects model as

$$ y = X\beta + Zb + \epsilon \tag{2.1.1} $$

where y, X, and Z denote respectively the n-dimensional response vector, the n×p₀ fixed effects design matrix, and the n×m random effects design matrix, β denotes the p₀-dimensional vector of fixed effects parameters, b denotes the m-dimensional random effects vector, and ε denotes the error term.

The model formulation in (2.1.1) is quite general and in practice some restrictions on the structure and the distribution of the random effects are assumed.

Assumption 2.1.1 By permuting the columns of Z if necessary, the random effects design matrix can be partitioned as

$$ Z = [Z_1 : \cdots : Z_r] $$

where each Z_i is of the form

$$ Z_i = \begin{bmatrix} Z_i^1 & 0 & 0 & \cdots & 0 \\ 0 & Z_i^2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & Z_i^{m_i} \end{bmatrix} $$

with each Z^j_i having the same number of columns q_i and a variable number of rows n^j_i. The random effects vector b can be accordingly partitioned as

$$ b = \left[ b_1^T, b_2^T, \cdots, b_r^T \right]^T $$

and each b_i can in turn be partitioned as

$$ b_i = \left[ (b_i^1)^T, (b_i^2)^T, \cdots, (b_i^{m_i})^T \right]^T. $$

This partition defines a grouping of the random effects into r classes, with the q_i random effects belonging to the same class i being observed at exactly m_i different levels.

We will restrict ourselves in this dissertation to normal distribution models. More specifically, we will assume

Assumption 2.1.2 The b^j_i are independent (for different i and/or j) and follow a N(0, D_i) distribution, ε follows a N(0, Λ) distribution, and the b^j_i are independent of ε.

The D_i can be either general positive semi-definite matrices, with q_i(q_i + 1)/2 free parameters, or structured positive semi-definite matrices, i.e. D_i = D_i(θ_i) with the dimension of θ_i being less than q_i(q_i + 1)/2 (Jennrich and Schluchter, 1986).

Define

$$ D = \bigoplus_{i=1}^{r} D_i, \qquad D_A = \bigoplus_{i=1}^{r} \left( I_{m_i} \otimes D_i \right) $$

where ⊕ denotes the direct sum and ⊗ denotes the tensor product. Note that D and D_A have the same eigenvalues, with different multiplicities (in particular they have the same maximum and minimum eigenvalues). Under assumption (2.1.2) it follows that y has a N(Xβ, Σ) distribution (Searle et al., 1992), where Σ = Λ + Z D_A Z^T.
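The construction of D and D_A, and the remark about their eigenvalues, can be checked numerically. The sketch below is an illustration only: the helper `direct_sum`, the two D_i blocks, and the numbers of levels are assumed values, and NumPy's `kron` is used for the tensor product.

```python
import numpy as np

def direct_sum(blocks):
    """Block-diagonal matrix with the given square blocks on its diagonal."""
    size = sum(block.shape[0] for block in blocks)
    out = np.zeros((size, size))
    pos = 0
    for block in blocks:
        k = block.shape[0]
        out[pos:pos + k, pos:pos + k] = block
        pos += k
    return out

# Illustrative values: r = 2 classes with q = (2, 1) and m = (3, 2) levels.
D_blocks = [np.array([[2.0, 0.5], [0.5, 1.0]]), np.array([[0.4]])]
levels = [3, 2]

D = direct_sum(D_blocks)
D_A = direct_sum([np.kron(np.eye(m_i), D_i) for m_i, D_i in zip(levels, D_blocks)])

eig_D = np.linalg.eigvalsh(D)      # 3 eigenvalues
eig_DA = np.linalg.eigvalsh(D_A)   # 8 eigenvalues: the same values, repeated
```

Each eigenvalue of D_i appears m_i times in D_A, which is why the two matrices share their extreme eigenvalues.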

In most applications of linear mixed effects models it is assumed that Λ = σ²I and we will also assume this here.

The mixed effects ANOVA model (Miller, 1977; Searle et al., 1992) is a particular case of model (2.1.1) where q_i = 1 and D_i = σ_i², i = 1, . . . , r. As an example, consider the design in which the experimental units are divided into two blocks, each with two plots, which in turn are divided into two subplots, and two treatment factors A and B, in a 2×2 full factorial arrangement, are used according to the scheme shown in Table 2.1.1.

Table 2.1.1: Split-split plot design

Block  Plot  Subplot  A  B
1      1     1        1  1
1      1     2        1  2
1      2     1        2  1
1      2     2        2  2
2      1     1        1  1
2      1     2        1  2
2      2     1        2  1
2      2     2        2  2

Assuming that the block and plot effects are random, the corresponding mixed effects ANOVA model can be written as

$$ y_{ijk} = \mu + b_i + A_j + s_{ij} + B_k + A.B_{jk} + \epsilon_{ijk}, \qquad i, j, k = 1, 2 $$

where y_ijk is the response observed in the ith block, jth plot, and kth subplot, µ is the grand mean, b_i is the random effect corresponding to block i, s_ij is the random effect corresponding to the ijth block-plot combination, A_j and B_k are the A and B treatment effects respectively, and ε_ijk is the error term. To ensure identifiability of the fixed effects we will use the sum-to-zero conditions

$$ \sum_{j=1}^{2} A_j = \sum_{k=1}^{2} B_k = \sum_{j=1}^{2} A.B_{jk} = \sum_{k=1}^{2} A.B_{jk} = \sum_{j,k=1}^{2} A.B_{jk} = 0. $$

The assumptions of the model are that the b_i are i.i.d. with distribution N(0, σ₁²), the s_ij are i.i.d. with distribution N(0, σ₂²) and independent of the b_i, and the ε_ijk are i.i.d. with distribution N(0, σ₃²) and independent of both the b_i and the s_ij.

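In the notation of model (2.1.1), we have

$$
\begin{bmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{122} \\ y_{211} \\ y_{212} \\ y_{221} \\ y_{222} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & -1 \\
1 & -1 & -1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & -1 \\
1 & -1 & -1 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ A_1 \\ B_1 \\ A.B_{11} \end{bmatrix}
+
\begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ s_{11} \\ s_{12} \\ s_{21} \\ s_{22} \end{bmatrix}
+
\begin{bmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \epsilon_{121} \\ \epsilon_{122} \\ \epsilon_{211} \\ \epsilon_{212} \\ \epsilon_{221} \\ \epsilon_{222} \end{bmatrix}
$$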

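By setting Z^j₁ = [1 1 1 1]^T, j = 1, 2, Z^j₂ = [1 1]^T, j = 1, . . . , 4, and Z_i = ⊕_j Z^j_i, i = 1, 2, we see that r = 2, q₁ = q₂ = 1, m₁ = 2, m₂ = 4, and that the random effects vectors of the two classes are b_1 = [b₁ b₂]^T and b_2 = [s₁₁ s₁₂ s₂₁ s₂₂]^T. Note also that in this example

$$
D = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}
\quad \text{and} \quad
D_A = \begin{bmatrix}
\sigma_1^2 & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma_1^2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_2^2 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma_2^2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_2^2 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma_2^2
\end{bmatrix}
$$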
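The design matrices of this example, and the covariance structure they imply, can be rebuilt in a few lines. The Python sketch below is illustrative: the three variance values are assumptions chosen only to make the entries of Σ easy to read, and the error variance of this example (σ₃²) plays the role of σ² in Λ = σ²I.

```python
import numpy as np

def ones_column(k):
    """k x 1 column of ones."""
    return np.ones((k, 1))

# Random effects design: 2 blocks of 4 observations, 4 block-plot cells of 2.
Z1 = np.kron(np.eye(2), ones_column(4))      # 8 x 2, block effects
Z2 = np.kron(np.eye(4), ones_column(2))      # 8 x 4, block-plot effects
Z = np.hstack([Z1, Z2])                      # Z = [Z_1 : Z_2]

# Fixed effects design under the sum-to-zero coding: mu, A_1, B_1, A.B_11.
a = np.array([1, 1, -1, -1] * 2, dtype=float)    # +1 for A = 1, -1 for A = 2
b = np.array([1, -1, 1, -1] * 2, dtype=float)    # +1 for B = 1, -1 for B = 2
X = np.column_stack([np.ones(8), a, b, a * b])

# Assumed variances (illustrative): sigma_1^2, sigma_2^2, sigma_3^2.
s1, s2, s3 = 4.0, 2.0, 1.0
D_A = np.diag([s1] * 2 + [s2] * 4)
Sigma = s3 * np.eye(8) + Z @ D_A @ Z.T       # Sigma = Lambda + Z D_A Z^T
```

With these values, two subplots of the same plot have covariance σ₁² + σ₂², two observations in the same block but different plots have covariance σ₁², and observations in different blocks are uncorrelated.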







The linear mixed effects model for repeated measures (Laird and Ware, 1982; Lindstrom and Bates, 1988) is a particular case of model (2.1.1) where r = 1. As an example we consider the data presented in Grizzle and Allen (1969) from a dental study on the ramus height (in millimeters) measured in 20 boys at ages 8, 8.5, 9, and 9.5 years. The data are shown in figure 2.1.1.

[Figure 2.1.1: Ramus heights for 20 boys measured at 4 ages. Horizontal axis: age (years); vertical axis: ramus height (mm); each boy is identified by a letter from a to t.]

A linear model in age in which both the intercept and the slope vary with the boy seems adequate to explain the ramus height evolution. The corresponding linear mixed effects model is written as

$$ y_{ij} = (\beta_1 + b_{i1}) + (\beta_2 + b_{i2})\,\mathrm{age}_j + \epsilon_{ij}, \qquad i = 1, \ldots, 20, \quad j = 1, \ldots, 4, $$

where y_ij is the ramus height of the ith boy at age j, β₁ and β₂ are the fixed intercept and the fixed slope respectively, b_i1 and b_i2 are the random intercept and the random slope corresponding to the ith boy, and ε_ij is the error term. The assumptions of the model are that the b_i are i.i.d. with distribution N(0, D_1) and the ε_ij are i.i.d. with distribution N(0, σ²), independent of the b_i. D_1 is a general variance-covariance matrix.

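In the notation of model (2.1.1) we can express the linear mixed effects model as

$$
y =
\begin{bmatrix} 1 & 8 \\ 1 & 8.5 \\ \vdots & \vdots \\ 1 & 9 \\ 1 & 9.5 \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix}
1 & 8 & 0 & 0 & \cdots & 0 & 0 \\
1 & 8.5 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 9 \\
0 & 0 & 0 & 0 & \cdots & 1 & 9.5
\end{bmatrix}
\begin{bmatrix} b_{11} \\ b_{12} \\ \vdots \\ b_{20,1} \\ b_{20,2} \end{bmatrix}
+
\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{20,3} \\ \epsilon_{20,4} \end{bmatrix}
$$

where the second column of the fixed effects design matrix holds the ages 8, 8.5, 9, and 9.5 for each boy, in agreement with the term age_j in the model.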

By letting X[n₁ · · · n₂, ] denote the submatrix of X corresponding to its n₁ through n₂ rows and setting Z^j₁ = X[4j−3 · · · 4j, ], j = 1, . . . , 20, we see that, in this example, r = 1, q₁ = 2, m₁ = 20, D = D_1, and D_A = ⊕_{i=1}^{20} D.

2.2 Likelihood Estimation

Different estimation methods for the parameters in model (2.1.1) have been proposed over the years (Searle et al., 1992), but the most commonly used methods today are maximum likelihood (ML) and restricted maximum likelihood (RML) (Longford, 1993).

It is convenient, when writing the (restricted) likelihood of y in model (2.1.1), to factor out the variance of the error term, σ², from the variance-covariance matrix of the random effects, i.e. D = σ²D^s, where D^s is called the scaled variance-covariance matrix of the random effects. Under assumption (2.1.2), the loglikelihood function for y in model (2.1.1) is given by

$$ \ell(\beta, \sigma^2, D^s \mid y) = -\frac{1}{2} \left\{ n \log(2\pi\sigma^2) + \log\left| I + Z D_A^s Z^T \right| + \frac{1}{\sigma^2} (y - X\beta)^T \left( I + Z D_A^s Z^T \right)^{-1} (y - X\beta) \right\} \tag{2.2.1} $$

For fixed D^s, the values of β and σ² that maximize (2.2.1) are given by

$$
\begin{aligned}
\hat\beta(D^s) &= \left[ X^T \left( I + Z D_A^s Z^T \right)^{-1} X \right]^{-1} X^T \left( I + Z D_A^s Z^T \right)^{-1} y \\
\hat\sigma^2(D^s) &= \frac{1}{n} \left[ y - X\hat\beta(D^s) \right]^T \left( I + Z D_A^s Z^T \right)^{-1} \left[ y - X\hat\beta(D^s) \right]
\end{aligned}
\tag{2.2.2}
$$

Restricted maximum likelihood estimates (RMLEs) of the variance-covariance components are usually preferred to maximum likelihood estimates (MLEs) in linear mixed effects models. The basic reason is that RMLEs take into account the estimation of the fixed effects when calculating the degrees of freedom associated with the variance-components estimates, while MLEs do not.

The RMLEs are defined as the MLEs of the likelihood of a set of n −p₀ linear combinations of the response vector y, corresponding to n−p₀ vectors that span the orthogonal complement of the column space of the fixed effects design matrix X (Harville, 1974). One way of defining such a set of vectors is to consider the QR decomposition (Thisted, 1988) of X

$$ X = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \tag{2.2.3} $$

where R_1 is upper triangular. It follows from the definition of the QR decomposition that the columns of Q₂ define a set of orthonormal vectors that span the orthogonal complement of the column space of X, and the RMLEs can be obtained from the likelihood of y^∗ = Q₂^T y. From elementary properties of the multivariate normal distribution and the definition of the QR decomposition, y^∗ ∼ N(0, Σ^∗), where Σ^∗ = σ²(I + Q₂^T Z D_A^s Z^T Q₂). Letting Z^∗ = Q₂^T Z and n^∗ = n − p₀, we can write the corresponding restricted loglikelihood as

$$ \ell_R(\beta, \sigma^2, D^s \mid y) = -\frac{1}{2} \left\{ n^* \log(2\pi\sigma^2) + \log\left| I + Z^* D_A^s Z^{*T} \right| + \frac{1}{\sigma^2} y^{*T} \left( I + Z^* D_A^s Z^{*T} \right)^{-1} y^* \right\} \tag{2.2.4} $$

For fixed D^s, the value of σ² that maximizes (2.2.4) is

$$ \hat\sigma_R^2(D^s) = \frac{1}{n^*} y^{*T} \left( I + Z^* D_A^s Z^{*T} \right)^{-1} y^* \tag{2.2.5} $$

The restricted likelihood (2.2.4) does not depend upon β and hence no fixed effects RMLEs are available. Nevertheless the first formula in (2.2.2), with D^s replaced by its corresponding RMLE, is usually employed to provide estimates for the fixed effects in restricted maximum likelihood estimation.
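The QR-based construction of the restricted likelihood can be sketched numerically as follows. The data are simulated, and the design (a random intercept for each of three clusters), the value of the scaled matrix D_A^s, and all variable names are assumptions made for the illustration; only the formulas (2.2.3) to (2.2.5) come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p0, m = 12, 2, 3
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])   # n x p0
Z = np.kron(np.eye(m), np.ones((n // m, 1)))                   # random intercepts
Ds_A = 0.5 * np.eye(m)                                         # assumed scaled D_A^s
y = X @ np.array([1.0, 0.3]) + Z @ rng.normal(size=m) + rng.normal(size=n)

# Complete QR decomposition: Q is n x n, and its last n - p0 columns (Q2)
# span the orthogonal complement of the column space of X.
Q, R = np.linalg.qr(X, mode="complete")
Q2 = Q[:, p0:]
y_star, Z_star, n_star = Q2.T @ y, Q2.T @ Z, n - p0
V_star = np.eye(n_star) + Z_star @ Ds_A @ Z_star.T

def restricted_loglik(sigma2):
    """Restricted loglikelihood (2.2.4) as a function of sigma^2, for fixed D^s."""
    _, logdet = np.linalg.slogdet(V_star)
    quad = y_star @ np.linalg.solve(V_star, y_star)
    return -0.5 * (n_star * np.log(2.0 * np.pi * sigma2) + logdet + quad / sigma2)

# Closed-form maximizer (2.2.5).
sigma2_R = y_star @ np.linalg.solve(V_star, y_star) / n_star
```

Because Q₂^T X = 0, the transformed response y^∗ carries no information about β, which is the reason (2.2.4) is free of the fixed effects.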

The (R)MLE of D^s in general does not have a closed form expression and its determination constitutes a constrained nonlinear optimization problem whose numerical solution has been addressed in several papers (Hartley and Rao, 1967; Laird and Ware, 1982; Lindstrom and Bates, 1988; Wolfinger, Tobias and Sall, 1991). We will not consider the numerical problem of determining the (R)MLE of D^s in this dissertation. Using the formulas in (2.2.2) and (2.2.5) one can express the likelihood (2.2.1), or the restricted likelihood (2.2.4), as a function of D^s alone, greatly simplifying the optimization problem.
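The reduction of (2.2.1) to a function of D^s alone can be sketched as below. This is a minimal illustration rather than an estimation routine: the simulated data, the random intercept design, the restriction of D_A^s to a multiple d of the identity, and the crude grid search standing in for a proper optimizer are all assumptions.

```python
import numpy as np

def loglik(beta, sigma2, Ds_A, y, X, Z):
    """Loglikelihood (2.2.1)."""
    n = y.size
    V = np.eye(n) + Z @ Ds_A @ Z.T
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet
                   + r @ np.linalg.solve(V, r) / sigma2)

def profile(Ds_A, y, X, Z):
    """Return beta(D^s) and sigma^2(D^s) from (2.2.2) and the profiled loglikelihood."""
    n = y.size
    V = np.eye(n) + Z @ Ds_A @ Z.T
    Vinv_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)   # generalized least squares
    r = y - X @ beta
    sigma2 = r @ np.linalg.solve(V, r) / n
    return beta, sigma2, loglik(beta, sigma2, Ds_A, y, X, Z)

rng = np.random.default_rng(3)
n, m = 12, 3
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
Z = np.kron(np.eye(m), np.ones((n // m, 1)))
y = X @ np.array([1.0, 0.3]) + Z @ rng.normal(size=m) + rng.normal(size=n)

# One-dimensional illustration: D_A^s = d * I, profiled loglikelihood on a grid of d.
grid = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
values = [profile(d * np.eye(m), y, X, Z)[2] for d in grid]
best_d = grid[int(np.argmax(values))]
```

In practice the grid would be replaced by a numerical optimizer acting on an unconstrained parametrization of D^s, along the lines discussed in section 1.4.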

The exact distribution of the (R)MLEs cannot be derived in most applications of model (2.1.1) and inference about them usually has to rely on asymptotic results. We derive, in chapter 3, the asymptotic distribution of both the MLE and the RMLE, under quite general regularity conditions.

In many applications of linear mixed effects models, estimates of the random effects b are also of interest. In (R)ML estimation the conditional modes of the random effects are frequently used for that purpose (Lindstrom and Bates, 1988). These are defined as the mode of the conditional distribution of b given y, which in the case of maximum likelihood estimation is given by

$$ \hat b_{ML} = \hat D_{A,ML}^s Z^T \left( I + Z \hat D_{A,ML}^s Z^T \right)^{-1} \left( y - X \hat\beta_{ML} \right) $$

and in the case of restricted maximum likelihood is given by

$$ \hat b_{RML} = \hat D_{A,RML}^s Z^T \left( I + Z \hat D_{A,RML}^s Z^T \right)^{-1} \left( y - X \hat\beta_{RML} \right) $$

where $\hat D_{A,ML}^s$, $\hat D_{A,RML}^s$, and $\hat\beta_{ML}$ denote respectively the MLE and RMLE of D_A^s, and the MLE of β, and $\hat\beta_{RML}$ denotes the fixed effects estimate obtained from the first formula in (2.2.2) with D^s replaced by its RMLE.
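The conditional modes formula can be evaluated directly once estimates of D_A^s and β are available. In the sketch below the data are simulated, the model has a single random intercept per cluster, and the plugged-in values of D_A^s and β are stand-ins chosen for illustration, not actual (R)ML estimates.

```python
import numpy as np

def conditional_modes(Ds_A, beta_hat, y, X, Z):
    """Evaluate D_A^s Z^T (I + Z D_A^s Z^T)^{-1} (y - X beta_hat)."""
    V = np.eye(y.size) + Z @ Ds_A @ Z.T
    return Ds_A @ Z.T @ np.linalg.solve(V, y - X @ beta_hat)

rng = np.random.default_rng(2)
m, k = 3, 4                                   # 3 clusters, 4 observations each
X = np.ones((m * k, 1))                       # intercept-only fixed effects
Z = np.kron(np.eye(m), np.ones((k, 1)))       # one random intercept per cluster
y = 10.0 + Z @ rng.normal(size=m) + 0.5 * rng.normal(size=m * k)

beta_hat = np.array([y.mean()])               # stand-in for a fixed effects estimate
d = 2.0                                       # assumed scaled random intercept variance
b_hat = conditional_modes(d * np.eye(m), beta_hat, y, X, Z)

# Per-cluster mean residual, for comparison with the conditional modes.
cluster_mean_resid = (Z.T @ (y - X @ beta_hat)) / k
```

In this balanced random intercept case each conditional mode equals the cluster's mean residual multiplied by kd/(1 + kd), so the estimates are shrunk toward zero, and more strongly so when d is small.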

2.3 Bibliographic Review

The first developments of linear mixed effects models were related to the so-called variance components models, defined as linear mixed effects models in which all random effects are independent (and hence no covariance components are present). Airy (1861) seems to have given the first known formulation of a variance components model while considering a standard measurement problem in astronomy.

Fisher (1925) introduced the ANOVA method for estimating variance components (i.e. equating sums of squares to their expected values). Tippett (1931) clarified the use of the ANOVA method for analysis of variance designs and extended it to 2-way crossed classification mixed effects models. Possibly the most important paper in ANOVA estimation for unbalanced data is Henderson (1953). The three ANOVA methods presented in that paper, later known as Henderson methods, were the standard estimation methods for linear mixed effects models until fast computers became available.

Maximum likelihood estimation for normal distribution variance components models seems to have been first considered by Crump (1947). The landmark paper on ML estimation for variance components models is Hartley and Rao (1967), in which, among other things, the first asymptotic results for the MLE were established. Miller (1977) corrected some problems in Hartley and Rao's results and established asymptotic results for a large class of variance components models, also giving conditions for them to hold. Restricted maximum likelihood was introduced by Thompson (1962) and later extended by Patterson and Thompson (1971). Harville (1977) presents a comprehensive review of maximum likelihood and restricted maximum likelihood estimation in linear mixed effects models and introduces the model formulation given in (2.1.1). Laird and Ware (1982) describe a general linear mixed effects model for repeated measures data and suggest the use of the EM algorithm for obtaining (R)MLEs of the variance-covariance components.

The general structure of the linear mixed effects model (2.1.1) seems to be accepted by most researchers today. The linear mixed effects literature published after Harville (1977) and Laird and Ware (1982) deals more with generalizations of the assumptions in model (2.1.1) and/or with different estimation approaches than with reformulations of the basic model’s structure.

Chi and Reinsel (1989) consider model (2.1.1) when Λ has the structure of an autoregressive process of order one (AR(1)). Maximum likelihood estimators of the model parameters and a score test for the autocorrelation are derived. One of the main conclusions is that the use of an AR(1) structure for the cluster-specific errors may have the effect of reducing the number of random effects needed in the model, but the investigation of ways to determine the best combination of time series error structure and number of random effects deserves further study.

This issue is also considered by Jones (1990).

A Bayesian analysis of model (2.1.1) using the Gibbs sampler (Geman and Geman, 1984) is described in Gelfand, Hills, Racine-Poon and Smith (1990) and in Wakefield, Smith, Racine-Poon and Gelfand (1994). The Bayesian analysis is developed using a hierarchical model approach. In the second paper the normal distribution of the random effects (b) is replaced by a multivariate Student-t, enhancing the robustness of the fit and giving a method for detecting outlying random effects. The main advantage of this approach is its flexibility in handling complex situations, such as constrained parameters and non-Gaussian distributions for the random effects and/or error terms. The main drawbacks are the intensive computational effort required and the need for prior distributions for all the population parameters involved.

Jennrich and Schluchter (1986) consider ML estimation in linear mixed effects models for repeated measures with structured variance-covariance matrices. Their work was extended to the general linear mixed effects model by Wolfinger et al. (1991), who also discuss restricted maximum likelihood. The use of structured matrices is very appealing in practice, since it is often known beforehand that the covariance structure of the random effects and/or the errors follows a particular pattern, and substantial reductions in computing time can thus be achieved.

A generalized linear model version of (2.1.1) is discussed in Liang and Zeger (1986) and Zeger, Liang and Albert (1988). They allow a more flexible error structure that is no longer restricted to being Gaussian and introduce the idea of a link function, h, relating E(y|b) to β and b, so that h(E(y|b)) = Xβ + Zb. This model should in fact be considered a competitor of the nonlinear mixed effects model, discussed in chapter 4.

Three books solely dedicated to linear mixed effects models have been recently published. Searle et al. (1992) includes a comprehensive review of models and estimation methods for linear mixed effects models, but focuses more on variance components models and mixed effects ANOVA models. Lindsey (1993) covers in detail linear mixed effects models for repeated measures data and Longford (1993) considers linear mixed effects models in a regression context.


3 Asymptotic Results for the Linear Mixed Effects Model

Miller (1977) derived the asymptotic distribution of maximum likelihood estimators for a mixed effects ANOVA model. In section 3.1 we extend these results to the more general linear mixed effects model (2.1.1), showing that, under fairly general conditions, with probability going to one there exists a sequence of roots of the likelihood equations that is consistent and asymptotically normal. These results are helpful in establishing that the estimators of the fixed effects and the estimators of the variance-covariance components are asymptotically uncorrelated. We also show, in section 3.2, that under fairly general conditions the restricted maximum likelihood estimators for the general linear mixed effects model are consistent and asymptotically normal. In section 3.3, we show that the asymptotic normality of the (restricted) maximum likelihood estimators continues to hold for a large class of reparametrizations/structuring of the variance-covariance components. Our conclusions are included in section 3.4.


The proofs of the lemmas used throughout this chapter are included in Appendix A.

3.1 Maximum Likelihood

Under Assumption 2.1.1 the linear mixed effects model (2.1.1) can alternatively be expressed as

$$y = X\beta + \sum_{i=1}^{r} \sum_{j=1}^{q_i} U_i^j a_i^j + \epsilon \tag{3.1.1}$$

where the $U_i^j$ are $n \times m_i$ incidence-like matrices defined by the relation

$$k\text{th column of } U_i^j = j\text{th column of } Z_i^k.$$

Note that each $U_i^j$ has at most one nonzero entry per row. We will assume here that it has at least one nonzero entry per column, to rule out trivial cases. The $a_i^j$ vectors are defined by the relation $[a_i^j]_k = [b_i^k]_j$ and represent the values of the jth random effect of the ith class.

The model formulation (3.1.1) is analogous to that of Hartley and Rao (1967) and Miller (1977) for the mixed effects ANOVA model. We will use it in this chapter to maintain consistency with the terminology used in the second paper.
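The regrouping of the random effects from "by cluster" to "by effect" can be checked numerically. The following is a minimal NumPy sketch under assumed toy dimensions (a single class, three clusters, a random intercept and slope); the variable names and the data are illustrative only and are not taken from the thesis software.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layout (assumed for illustration): one class (r = 1) with m = 3 clusters,
# q = 2 random effects (intercept and slope), n = 7 observations in total.
cluster = np.array([0, 0, 0, 1, 1, 2, 2])
n, m, q = cluster.size, 3, 2
t = rng.normal(size=n)                    # covariate attached to the random slope
W = np.column_stack([np.ones(n), t])      # row-wise random-effects design, n x q

# Z[k] is the n x q design matrix of cluster k: rows outside cluster k are zero.
Z = [W * (cluster == k)[:, None] for k in range(m)]

# U[j] is n x m; its k-th column is the j-th column of Z[k].
U = [np.column_stack([Z[k][:, j] for k in range(m)]) for j in range(q)]

# b[k] holds the q random effects of cluster k; a[j] regroups them by effect,
# so that a[j][k] == b[k][j].
b = rng.normal(size=(m, q))
a = [b[:, j] for j in range(q)]

# Random part of the model written both ways.
zb = sum(Z[k] @ b[k] for k in range(m))
ua = sum(U[j] @ a[j] for j in range(q))

same_random_part = np.allclose(zb, ua)
max_nonzeros_per_row = max(int(np.count_nonzero(Uj, axis=1).max()) for Uj in U)
```

In this sketch each row of every `U[j]` has a single nonzero entry, located in the column of the cluster the observation belongs to, which is the incidence-like structure noted above.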

The covariance matrix of y can be expressed as

$$\Sigma = \sigma^2 I + \sum_{i=1}^{r} \sum_{j,k=1}^{q_i} [D_i]_{jk}\, U_i^j (U_i^k)^T.$$

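By letting $p_1 = \sum_{i=1}^{r} q_i(q_i+1)/2$, $\sigma_0 = \sigma^2$ and $G_0 = I$ and setting

$$\sigma_1 = [D_1]_{11},\ \sigma_2 = [D_1]_{12},\ \ldots,\ \sigma_{q_1(q_1+1)/2+1} = [D_2]_{11},\ \ldots,\ \sigma_{p_1} = [D_r]_{q_r q_r}$$

$$G_1 = U_1^1 (U_1^1)^T,\ G_2 = U_1^1 (U_1^2)^T + U_1^2 (U_1^1)^T,\ \ldots,\ G_{p_1} = U_r^{q_r} (U_r^{q_r})^T,$$

we can write

$$\Sigma = \sum_{i=0}^{p_1} \sigma_i G_i. \tag{3.1.2}$$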

This formulation of model (2.1.1) differs from that in Miller (1977) in that some of the σ_i may assume negative values and some of the G_i are not required to be positive semi-definite.
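As an illustration of the linear representation of Σ and of the remark on signs, the sketch below builds the matrices G_i for an assumed toy design (one class, three clusters, two random effects) and compares the two expressions for Σ. The numerical values of σ² and D are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy layout (assumed): r = 1 class, m = 3 clusters, q = 2 random effects.
cluster = np.array([0, 0, 0, 1, 1, 2, 2])
n, m, q = cluster.size, 3, 2
W = np.column_stack([np.ones(n), rng.normal(size=n)])
indicator = (cluster[:, None] == np.arange(m)).astype(float)   # n x m
U = [indicator * W[:, [j]] for j in range(q)]                  # U[j] is n x m

sigma2 = 0.5
D = np.array([[2.0, -0.3],
              [-0.3, 1.0]])    # positive definite, with a negative covariance

# sigma_0 = sigma^2 with G_0 = I, then the distinct entries of D taken row by
# row over the upper triangle, each paired with its G matrix.
sigmas = [sigma2]
G = [np.eye(n)]
for j in range(q):
    for k in range(j, q):
        sigmas.append(D[j, k])
        if j == k:
            G.append(U[j] @ U[j].T)
        else:
            G.append(U[j] @ U[k].T + U[k] @ U[j].T)

# Linear form: Sigma = sum_i sigma_i G_i.
Sigma_linear = sum(s * Gi for s, Gi in zip(sigmas, G))

# Direct form: sigma^2 I + sum_{j,k} [D]_{jk} U^j (U^k)^T.
Sigma_direct = sigma2 * np.eye(n) + sum(
    D[j, k] * (U[j] @ U[k].T) for j in range(q) for k in range(q)
)

representations_agree = np.allclose(Sigma_linear, Sigma_direct)

# The G matrix paired with the covariance term is symmetric but indefinite here.
min_eig_G2 = np.linalg.eigvalsh(G[2]).min()
```

Here `sigmas[2]` is negative and `G[2]` has a negative eigenvalue, yet the resulting Σ is a valid covariance matrix because D itself is positive definite.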

The following assumptions (equivalent to Assumptions 2.2 through 2.5 in Miller (1977)) are made about model (3.1.1).

Assumption 3.1.1 The matrix X is of full rank p₀.

Assumption 3.1.2 n ≥ p₀ + p₁ + 1.

Assumption 3.1.3 The partitioned matrix $[X : U_i^j]$ has rank greater than $p_0$ for $i = 1, \ldots, r$, $j = 1, \ldots, q_i$.

Assumption 3.1.4 The matrices $G_0, G_1, \ldots, G_{p_1}$ are linearly independent, i.e. $\sum_{i=0}^{p_1} \tau_i G_i = 0 \iff \tau_i = 0$, $i = 0, \ldots, p_1$.

As mentioned in Miller (1977), Assumption 3.1.1 can always be satisfied by suitably reparametrizing the fixed effects vector. Assumptions 3.1.3 and 3.1.4 ensure that the random effects are not confounded with the fixed effects and with each other.
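For a given design, Assumptions 3.1.1 through 3.1.4 can be checked numerically with rank computations. The sketch below does so for an assumed toy design in which the fixed effects and the random effects share the same two covariates. Linear independence of the G_i is tested by stacking their vectorized forms as columns and computing the rank of the resulting matrix. All names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy design (assumed): one class, m = 3 clusters, intercept and slope both as
# fixed effects (p0 = 2) and as random effects (q = 2, so p1 = 3).
cluster = np.array([0, 0, 0, 1, 1, 2, 2])
n, m, q = cluster.size, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
indicator = (cluster[:, None] == np.arange(m)).astype(float)
U = [indicator * X[:, [j]] for j in range(q)]

G = [np.eye(n)]
for j in range(q):
    for k in range(j, q):
        if j == k:
            G.append(U[j] @ U[j].T)
        else:
            G.append(U[j] @ U[k].T + U[k] @ U[j].T)

p0 = X.shape[1]
p1 = len(G) - 1

# Assumption 3.1.1: X has full column rank p0.
a311 = np.linalg.matrix_rank(X) == p0

# Assumption 3.1.2: n >= p0 + p1 + 1.
a312 = n >= p0 + p1 + 1

# Assumption 3.1.3: [X : U^j] has rank greater than p0 for every j.
a313 = all(np.linalg.matrix_rank(np.hstack([X, Uj])) > p0 for Uj in U)

# Assumption 3.1.4: G_0, ..., G_p1 are linearly independent, i.e. the matrix
# whose columns are vec(G_i) has full column rank.
vecG = np.column_stack([Gi.ravel() for Gi in G])
a314 = np.linalg.matrix_rank(vecG) == len(G)
```

Rank computations in floating point depend on a tolerance, so a design that is nearly confounded may pass or fail such a check depending on scaling.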

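Let $p = p_0 + p_1 + 1$ and $\sigma = (\sigma_0, \sigma_1, \ldots, \sigma_{p_1})^T$. Then the parameter space Θ for model (3.1.1) is

$$\Theta = \left\{ \theta \in \mathbb{R}^{p} \;\middle|\; \theta = (\beta^T, \sigma^T)^T,\ \beta \in \mathbb{R}^{p_0};\ \sigma_0 > 0 \text{ and } (\sigma_1, \ldots, \sigma_{p_1}) \in \mathbb{R}^{p_1} \text{ such that each } D_i \text{ is positive semi-definite},\ i = 1, \ldots, r \right\}.$$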
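Membership in Θ constrains only the σ part of θ, since β is free. A minimal sketch of such a check follows, assuming the σ_i are stored in the order used above (σ₀ first, then the upper triangle of each D_i row by row). The function names `unpack_D` and `in_parameter_space` and the tolerance `tol` are hypothetical and chosen for this example.

```python
import numpy as np

def unpack_D(sigma_block, q):
    """Fill a symmetric q x q matrix from its upper triangle, row by row."""
    D = np.zeros((q, q))
    rows, cols = np.triu_indices(q)
    D[rows, cols] = sigma_block
    D[cols, rows] = sigma_block
    return D

def in_parameter_space(sigma, qs, tol=1e-12):
    """Check sigma = (sigma_0, ..., sigma_p1) for classes with q_1, ..., q_r effects."""
    sigma = np.asarray(sigma, dtype=float)
    if sigma.size != 1 + sum(q * (q + 1) // 2 for q in qs):
        raise ValueError("sigma has the wrong length for the given q_i")
    if sigma[0] <= 0:                     # sigma_0 = sigma^2 must be positive
        return False
    start = 1
    for q in qs:
        stop = start + q * (q + 1) // 2
        D = unpack_D(sigma[start:stop], q)
        # Positive semi-definite up to a small numerical tolerance.
        if np.linalg.eigvalsh(D).min() < -tol:
            return False
        start = stop
    return True

inside = in_parameter_space([0.5, 2.0, -0.3, 1.0], qs=(2,))
outside = in_parameter_space([0.5, 1.0, 2.0, 1.0], qs=(2,))
```

In the second call the implied D has off-diagonal entry 2 with unit variances, so it is indefinite and the point lies outside Θ, even though every diagonal element is positive.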