A Appendix - Conditional Inference for Multivariate Generalised

Paper I Conditional Inference for Multivariate Generalised

I. A Appendix

I.A.2 Technical Proofs of the Asymptotic Distribution of the Conditional Inference Based Estimates

In this appendix, we present a sequence of lemmas and propositions that will culminate with the proof of the Theorem I.2.1, which establishes consistency and joint asymptotic normality of the proposed estimator of β and the predictor ofb for small values of the variance of the random components.

Regular Inference Functions

We recall the definition of regular inference functions used in this appendix for the easy of the reader (see the details in Jørgensen & Labouriau, 2012, Chapter 4, from which we draw heavily). Consider a parametric family of distributions P = {P_θ : θ ∈ Θ ⊆ R^k} and a σ-finite measure µ defined on a given measurable space (X,A). For eachP_θ ∈ P, we chose a version of the Radon-Nikodym derivative (with respect to µ), denoted by

p(·;θ) = dP_θ dµ(·).

Definition 1. A function Ψ : X ×Θ −→ R^k is said to be a regular inference function when the following conditions are satisfied for all θ= (θ₁, . . . , θ_k)∈Θ and for i, j = 1, . . . , k.

(i) Eθ[Ψ(θ)] = 0;

(ii) The partial derivative ∂Ψ(x;θ)/∂θ_i exists for µ-almost every x∈ X;

(iii) The order of integration and differentiation may be interchanged as follows:

∂

∂θ_i

Ψ(x;θ)p(x;θ)dµ(x) =

∂

∂θ_i[Ψ(x;θ)p(x;θ)]dµ(x) ; (iv) E{ψ_i(θ)ψ_j(θ)} ∈R and the k×k matrix

V_ψ(θ) = E{Ψ(θ)Ψ^T(θ)}

is positive definite;

(v) E{^∂ψ_∂θⁱ

r(θ)^∂ψ_∂θ^j

s(θ)} ∈R and the k×k matrix Sψ(θ) = E{∇θΨ(θ)}

is nonsingular.

Here ψ_i denoted the i^th component of the vector function Ψ(·) = (ψ₁(·), . . . , ψk(·))^T ,

and ∇_θ denotes the gradient operator relative to the vector θ, defined by

∇_θf(θ) = ∂f

∂θ^T(θ).

Some Key Lemmas

We denote the sequences of roots of the inference functionsψ^∗_β and ψ_b^∗ by{β^b_n}n∈N

and {b^b_n}_n∈_N respectively, obtained when the number of observations,n, increases.

Moreover, define θ = (β,b) and θ^b_n = (β^b_n,^bb_n) (for each n ∈ N). Recall, that the inference function ψ^∗ : Ω×R^q× Y →R^k+q for estimating θ under P^∗, is defined by

ψ^∗(β,b) =

ψ_β^∗(β,b)ⁱ^T ,[ψ_b^∗(β,b)]^T

, for all β∈Ω and b∈R^q.

Lemma I.A.2. Under the regularity conditions i-iv, the partial inference functions ψ^∗_β and ψ^∗_b are unbiased, that is,

EP_β,b,λ^∗

hψ_β^∗(β,b;Y)ⁱ=0, E^P_β,b,λ^∗ [ψ^∗_b(β,b;Y)] = 0,

for all β ∈Ω, b ∈R^q and λ∈Λ. Moreover, the partial inference functions ψ_β^∗ and ψ^∗_b, are regular.

Proof. We show that ψ_β^∗ is unbiased since the unbiasedness of ψ_b^∗ follows from the same arguments. Take arbitrarily β∈Ω,b ∈R^q and λ ∈Λ. We aim to show that

0 =

Yψ_β^∗(β,b;y)f^∗(y;β,b, λ)dν(y).

The regularity conditions ensure that it is allowed to interchange the order of differentiation and integration in the following:

Yψ^∗_β(β,b;y)f^∗(y;β,b, λ)dν(y)

i=1

∂

∂β{d(y_i;g⁻¹(x^T_i β+ ˜z_i^Tb))}f^∗(y_i;β,b, λ)dν(y_i)

=−2λ

i=1

∂

∂βf^∗(y_i;β,b, λ)dν(y_i)

=−2λ

i=1

∂

∂β

Yf^∗(y_i;β,b, λ)dν(y_i) = 0.

The proof follows sinceβ ∈Ω,b ∈R^q and λ∈Λ are arbitrarily chosen.

The other regularity conditions for the inference functions follow straightforwardly from the assumed regularity conditions i-iv for the GLMM in play.

We introduce some required notation before presenting the next lemma. Define the sensitivity block matrices

S_βb =E[∇_bψ^∗_β(β,b;Y)], S_bβ =E[∇_βψ^∗_b(β,b;Y)], S_bb =E[∇_bψ_b^∗(β,b;Y)], S_ββ =E[∇_βψ_β^∗(β,b;Y)],

and the variability matrices

V_βb^∗ =E[ψ^∗_b(β,b;Y)ψ^∗_β(β,b;Y)^T], V_b^∗_β =E[ψ^∗_β(β,b;Y)ψ_b^∗(β,b;Y)^T], V_bb =E[ψ^∗_b(β,b;Y)ψ^∗_b(β,b;Y)^T], V_ββ =E[ψ^∗_β(β,b;Y)ψ_β^∗(β,b;Y)^T].

Using these, we define

W =D⁻¹ =S_bb−S_βbS_ββ⁻¹S_bβ, A=S_ββ⁻¹+S_ββ⁻¹S_bβW⁻¹S_βbS_ββ⁻¹, E =−S_ββ⁻¹S_bβW⁻¹, C =−W⁻¹S_βbS_ββ⁻¹.

Lemma I.A.3. The inverse Godambde information for the inference function ψ^∗ is the matrix-valued function J_ψ⁻¹∗ : Ω×R^q→R(k+q)×(k+q) defined by

J_ψ⁻¹∗ =





J_ψ⁻¹^∗

β (J_ψ⁻¹∗ βb)^T J_ψ⁻¹∗

βb J_ψ⁻¹∗ b



,

with

J_ψ⁻¹^∗

β =AVββA^T +EVβbA^T +AVb^∗βE^T +EVbbE^T J_ψ⁻¹^∗

βb =CV_ββA^T +DV_βbA^T +CV_bβE^T +DV_bbE^T J_ψ⁻¹^∗

b =CV_ββC^T +DV_βb^∗C^T +CV_bβD^T +DV_bbD^T. for all β∈Ω and b ∈R^q using the above introduced notation.

Proof. The result follows from the formulas in Chapter 4 in Jørgensen & Labouriau (2012) and inversion of block matrices.

Lemma I.A.4. Assume the regularity conditions i-iv. Then, for all β ∈Ω, b∈R^q and λ∈Λ, it is true that

βb_n ^P

∗ β,b,λ

−−−→n→∞ β and ^bb_n ^P

∗ β,b,λ

−−−→n→∞ b. Moreover,

√n(θ^b_n−θ)|B =b −−−→^D

n→∞ N_k+q(0, J_ψ⁻¹∗(β,b)), implying that

√n(β^b_n−β)|B =b −−−→^D

n→∞ N_k(0, J_ψ⁻¹^∗

β(β,b)) and

√n(b^b_n−b)|B =b−−−→^D

n→∞ N_q(0, J_ψ⁻¹^∗

b(β,b)).

Proof. The proof follows from the results in Chapter 4 in Jørgensen & Labouriau (2012), and the fact that ψ_β^∗ andψ^∗_b are regular inference functions as a consequence

of Lemma I.A.2.

On the asymptotic variance of θ^b_n under the family P

Lemma I.A.5. Assume the regularity conditions i-iv. The partial solution {β^bn}n∈N

ofψ_β^∗ is also a solution toψ_β = 0defined in (I.12), and the unconditionally asymptotic covariance matrices (forn converging to infinity and q fixed), denoted AV, of β^b_n and bb_n are given by

AV (β^b_n) = E[J_ψ⁻¹^∗

β(β,B)] +V[β^b_n(B)], (I.15) AV (b^b_n) = E[J_ψ⁻¹^∗

b(β,B)] +I_qσ², (I.16) with β^bn(B) denoting the estimator of β as a function of B for all n ∈ N, β ∈ Ω and B ∈R^q. Moreover,

βbn P_β,λ,σ2

−−−−→

n→∞ β,

for all β∈Ω, λ∈Λ and σ² ∈R+.

Proof. If β^b_n is a solution to (I.12) then it is also a solution to (I.10) when inserting βb_n and ^bb_n for a given n∈N.

Take β∈Ω, λ∈Λ and σ² ∈R+ arbitrarily. The asymptotic covariance matrices follows from the law of total variance and Lemma I.A.4, which also implies that for all ϵ >0

P_β,λ,σ²(|β^b_n−β|> ϵ)

R^q

P_β,b,λ^∗ |β^b_n−β|> ϵB=b

j=1

φ(b_j;σ²)db−−−→

n→∞ 0, since P_β,b,λ^∗ (|β^b_n−β|> ϵB=b) −→

n→∞0 for all ϵ >0 and b ∈R^q. By the regularity assumptions i-iv, we can interchange the order of limit and integration. The proof follows since β ∈Ω, λ∈Λ and σ² ∈R+ are arbitrarily chosen.

Often the distribution of the random components can be easily simulated in a computational efficient way (e.g., when the random components are normally or t- distributed). In those cases, the expectations and variances referred in (I.15) and (I.16) can be easily obtained using Monte Carlo methods (this includes simulations

of B and calculations of estimates of β as a function of the simulated values).

Proof of the Theorem I.2.1

The lemma below provides the calculation of the characteristic function of the asymptotic distribution of the sequence of estimated values of β^b_n and ^bb_n, which will be crucial to prove Theorem I.2.1.

Lemma I.A.6. Assume the regularity conditions i-iv. There exist two random vectors Z_β and Z_b with characteristic functions

E[exp(it^T₁Zβ)] = E[exp(−¹₂t^T₁J_ψ⁻¹^∗

β(β,B)t₁)], for all t₁ ∈R^k, E[exp(it^T₂Z_b)] = E[exp(−¹₂t^T₂J_ψ⁻¹^∗

b (β,B)t₂)], for all t₂ ∈R^q, respectively, such that

√n(β^b_n−β)−−−→^D

n→∞ Z_β and √

n(b^b_n−b)−−−→^D

n→∞ Z_b. Proof. By Lemma I.A.4 we have that

√n(β^b_n−β)|B=b −→^D

n→∞ N_k(0, J_ψ⁻¹∗

β(β,b).

Let Z_β denote a random variable distributed according to the above defined conditional asymptotically Gaussian distribution. By the Portmanteau theorem the above is equivalent to

h√

n(β^b_n−β)|B =b

n→∞−→ E

hh(Z_β)|B =bⁱ for all continuous bounded functions h. Thus, we have that

h√

n(β^b_n−β)

R^q

h√

n(β^b_n−β)|B=b

^q Y

j=1

φ(b_j;σ²)db −→

n→∞

R^q

E[h(Z_β)|B=b]

j=1

φ(b_j;σ²)db

=E[h(Z_β)],

since we can interchange the order of limit and integration due to the assumed regularity conditions. Therefore, we conclude that

√n(β^b_n−β) −→^D

n→∞Z_β. The characteristic function of Z_β is given by:

E[exp(it^T₁Z_β)] =E[E[exp(it^T₁Z_β)|B]]

=E[exp(−¹₂t^T₁J_ψ⁻¹^∗

β(β,B)t₁)], for all t₁ ∈R^k.

The proof for ^bbn follows by similar arguments by changing β^bn tob^bn, and Zβ toZb

(by changing J_ψ⁻¹^∗

β(β,B) to J_ψ⁻¹^∗

b (β,B)) in the above.

The theorem below corresponds to the second part of Theorem I.2.1.

Theorem I.A.7. Under the regularity conditions i-iv, the sequences {βˆ_n}n∈N and {b^b_n}n∈N are asymptotically Gaussian distributed, when n→ ∞ and σ² →0+ in the following way

√n(β^bn−β)−−−−→_n→∞^D

σ²→0+

Nk(0, J_ψ⁻¹∗

β(β,0)), and

√n(b^b_n−b)−−−−→_n→∞^D

σ²→0+

N_q(0, J_ψ⁻¹^∗

b (β,0)).

Proof. Consider the characteristic function of Z_β found in Lemma I.A.6:

E[exp(it^TZβ)] =E[exp(−¹₂t^TJ_ψ⁻¹^∗

β(β,B)t)], for all t∈R^k. (I.17) Using a first order Taylor approximation, we find that

exp

−¹₂t^TJ_ψ⁻¹^∗

β(β,B)t

= exp

−¹₂

i=1 k

j=1

t_it_j{J_ψ⁻¹^∗

β(β,B)}_ij

= exp

−¹₂

i=1 k

j=1

t_it_j{J_ψ⁻¹^∗

β(β,0)}_ij

+ exp(−¹₂

i=1 k

j=1

t_it_j{J_ψ⁻¹^∗

β(β,0)}_ij)

×−¹₂B^T

i=1 k

j=1

t_it_j

∂{J_ψ⁻¹^∗

β}_ij

∂b (β,0)

+R(B), for all t∈R^k,

where R(·) is the remainder term which converges to zero when B converges to zero.

Thus, forσ² converging to zero, B converges to the expectation which is zero. This imply, that the remainder term converges to zero. Notice, that the second term has expectation zero since E[B] = 0, so inserting the above in (I.17) yields

EZβ[exp(it^TZ_β)] = exp(−¹₂t^TJ_ψ⁻¹^∗

β(β,0)t) +R(B) −→

σ²→0+exp(−¹₂t^TJ_ψ⁻¹^∗

β(β,0)t).

This proves that the asymptotically distribution of{βˆ_n}_n∈_N converges to a Gaussian distribution when σ² converges to zero. The argument for {ˆbn}n∈N is equivalent and follows by changing ˆβ_n to ˆb_n and Z_β toZ_b (changing J_ψ⁻¹^∗

β(β,B) toJ_ψ⁻¹^∗

b (β,B)) in the above.

I.A.3 Variance Estimation in for Models with Gaussian Random Components

In this section, we calculate the integral in Equation (I.13) under the assumption that

g(b;^b b,Σ

b^b

) =|2πΣ

b^b

|⁻¹² exp−¹₂(b^b−b)^TΣ⁻¹

b^b

(b^b−b) φ(b;σ²) = (2πσ²)⁻¹² exp(−_2σ¹2b²).

Plugging into the integral yields

R^q

g(ˆb;b,Σˆb)φ(b; (I_q−¹_qE_q)σ²)db

R^q

|2πΣ

bb|⁻¹² exp−¹₂(b^b−b)^TΣ⁻¹

b^b

(^bb−b)|2πσ²I_q|⁻¹² exp(−¹₂b^T_σ¹2I_qb)db

=|2πΣ

b^bk|⁻¹²(2πσ²)⁻

2 exp−¹₂^bb^TΣ⁻¹

b^b

bb2π[Σ⁻¹

b^b

+ _σ¹2I_q]⁻¹

1 2

×exp¹₂b^b^TΣ⁻¹

bb [Σ⁻¹

bb +_σ¹₂Iq]⁻¹Σ⁻¹

bb.

In the case of multiple random components, we maximise the integral above for each random component. If the random components are nested, we only predict values for the random components with the highest number of clusters and then uses least squares to predict values for each random component, see Section I.2.5.

Therefore, the calculations above are changed by replacingσ²I_q with^P^K_j=1σ²_B_jC_jC_j^T, where B1, . . . ,BK denotes the K ∈ N nested random components, and Cm the q×q_m dimensional matrix specifying for each level l (l^th row) which entry of B_m that enters the lth entry of ^bb. Here q_m is the dimension of the random vector B_m.

In the multivariate model described in Section I.3.1, the above integral can be adapted by letting ^bb^T = (^bb^T₍₁₎, . . . ,^bb^T_(d)) (and thus changing the dimension of Σ

b^b

) and replacing σ²I_q with Σ⊗I_q.

I.A.4 Multivariate Extension of the Laplace Approximation Method

We outline how the Laplace approximation in Breslow & Clayton (1993) can be extended to the multivariate model described in Section I.3.1, when the random components follow a multivariate Gaussian distribution. This extension follows directly from Breslow & Clayton (1993) by redefining some matrices and vectors.

We shortly describe how this was done in the simulation study in Section I.3.2.

The extension given below assumes that the marginal GLMMs are defined with exponential dispersion models (as in Breslow & Clayton (1993)) but this can easily be extended to include general dispersion models.

We assume that B₁, . . . ,B_q are i.i.d according to a d-dimensional Gaussian distribution with zero mean and covariance matrix Σ. Let B_(j) denote a vector containing all the j^th entries ofB₁, . . . ,B_q forj = 1, . . . , d. The above distributional assumptions implies that ˜B^T =^h(B₍₁₎)^T, . . . ,(B_(d))^Tⁱ is Gaussian distributed with

mean zero and covariance matrix Σ ⊗ I_q, where I_q is the q×q-dimensional identity matrix and ⊗denotes the Kronecker product.

Recall that the i^th (i = 1, . . . , n_j) response in the j^th (j = 1, . . . , d) marginal model was denoted y^[j]_i . Define for j = 1, . . . , d, the n_j × k_j-dimensional matrix X^[j]= [x_1j, . . . ,x_n_j_j]^T, and likewise the n_j ×q matrix Z^[j]= [z_1j, . . . ,z_n_j_j]^T. Based on these definitions, we define for k = k₁ +· · ·+k_d and n = n₁ +· · ·+n_d, the n × k-dimensional matrix X = diag[X^[1], . . . ,X^[d]] and the n× dq-dimensional matrix Z = diag[Z^[1], . . . ,Z^[d]]. Moreover, we define for each dimension j = 1, . . . , d, the n_j ×n_j-dimensional diagonal glm weight matrix W^[j] with diagonal entries w^[j]_ii = _2λ¹

Vj(µ^[j]_i )g^′_j(µ^[j]_i )², and the n×n matrix W = diag[W^[1], . . . ,W^[d]].

By redefining the matrices X, Z, W, D = Σ ⊗ I_q and the vectors B˜ and y^T = (y₁^[1], . . . , y_n^[d]

d), we can use the Laplace approximation in Breslow & Clayton (1993) to estimate the multivariate model.

Paper II

Multivariate Generalised Linear Mixed Models With Graphical Latent Covariance Structure

Jeanett S. Pelck

Aarhus University

Rodrigo Labouriau

Aarhus University

Abstract. This paper introduces a method for studying the correlation structure of a range of responses modelled by a multivariate generalised linear mixed model (MGLMM). The methodology requires the existence of clusters of observations and that each of the several responses studied is modelled using a generalised linear mixed models (GLMM) containing random components representing the clusters.

We construct a MGLMM by assuming that the distribution of each of the random components representing the clusters is the marginal distribution of a (sufficiently regular) multivariate elliptically contoured distribution. We use an undirected graphical model to represent the correlation structure of the random components representing the clusters of observations for each response. This representation allows us to draw conclusions regarding unknown underlying determining factors related to the clusters of observations. Using a combination of an undirected graph and a directed acyclic graph (DAG), we jointly represent the correlation structure of the responses and the related random components. Applying the theory of graphical models allows us to describe and draw conclusions on the correlation and, in some cases, the dependence between responses of different statistical nature (e.g.,following different distributions, different linear predictors and link functions). We present some simulation studies illustrating the proposed methodology.

II.1 Introduction

This paper introduces a method for studying the dependence structure of a range of responses modelled by a multivariate generalised linear mixed model (MGLMM) (see Pelck & Labouriau, 2021a, for details). The methodology we suggest requires the existence of clusters of observations (or experimental units) and that each of the responses studied is modelled using a GLMM containing random components representing the clusters. We will construct an MGLMM by assuming that the distribution of each of the random components representing the clusters is the

marginal distribution of a sufficiently regular multivariate elliptically contoured distribution (see Anderson, 2003). This choice of the distribution of the random components includes, as a particular case, the multivariate normal distribution used in standard generalised linear mixed models (GLMMs), as the models considered in Breslow & Clayton (1993), McCulloch & Searle (2001), McCulloch (1997).

We will use an undirected graphical model (see Lauritzen 1996, Whittaker 1990, Abreu et al. 2010) to represent the correlation structure of the random components representing the clusters of observations for each response. This representation will allow us to conclude on unknown underlying determining factors related to the clusters of observations. Furthermore, using a combination of an undirected graph and a directed acyclic graph (DAG), we jointly represent the correlation structure of the responses and the related random components. This representation arises naturally from the construction of the MGLMM we use and yields a known type of graphical model, namely a block chain independence graph (BCG) as defined in Whittaker (1990). Remarkably, this construction will allow us to describe and draw conclusions on the correlation between the responses of different statistical nature, e.g., responses modelled with GLMMs defined using various combinations of distributions, linear predictors and link functions (not necessarily the same for each response).

We will base the inference for the proposed graphical models on variants of tests for correlation under multivariate normality or multivariate elliptically contoured distributions studied in detail in Anderson (2003), from which we draw heavily. In the particular case where the random components are multivariate normally distributed, non-correlation will imply independence, which makes the conclusions of the analysis stronger. When the random components are not normally distributed but follow an elliptically contoured distribution, we will obtain slightly weaker conclusions since, in that case, lack of correlation implies only mean independence. ¹

The paper is organised as follows. In Section II.2.1, we formulate a version of a multivariate generalised linear mixed model. For simplicity, we only specify one common clustering structure but the theory can easily be extended to include multiple clustering. In Section II.3.1, we introduce essential concepts of graphical models that we will use to study the covariance structure of the random components and the response variables. These concepts are connected to the introduced multivariate model in Section II.3.2. In Section II.3.3, we describe statistical tests adapted from Anderson (2003) and draw the connection to the theory of graphical models. In Section II.3.4, we perform a simulation study to study the distribution of the p-values in the simulated examples under the null hypothesis obtained using the statistical tests. Moreover, we study the power of the tests in the simulated examples on a grid consisting of values of the off-diagonal entry in the covariance matrix. Some concluding remarks are given in Section II.4. Appendix II.A.1 discusses how an estimate of the covariance matrix can be obtained based on consistent predictions

1Recall that a random vectorX is mean independent of the random vectorY whenE(X|Y = y) =E(X) for ally in the support of the distribution ofY. It is well known that independence implies mean independence which implies non-correlation, but the reversed implications are in general not valid, see Wooldridge (2010).

of the random components. Appendix II.A.2 presents details on how the density of the introduced test statistic can be evaluated in the case of Gaussian random components.

II.2 Multivariate Generalised Linear Mixed Models

In this section, we formulate a version of the multivariate generalised linear mixed model described in Pelck & Labouriau (2021a). These models are based on marginal GLMMs that extend the standard GLMMs in two directions: we assume the random components to be distributed according to an elliptical contoured distribution, instead of following a multivariate Gaussian distribution, and we assume the conditional distributions of each response, given the random components, to belong to a dispersion model, instead of an exponential dispersion model. The MGLMMs we will define require, however, the existence of clusters of observations and the presence of random components representing those clusters in each of the marginal GLMMs representing the responses. The result of this process is a rather flexible class of models that can be used in many practical applications; see for example Pelck & Labouriau (2020, 2021c), Pelck et al. (2021a,b).

II.2.1 Model Definition

We define a d-dimensional multivariate generalised linear mixed model (i.e., a MGLMM representing d responses) with n observations of the j^th marginal model, taking values in Y_j ⊆ R, for j = 1, . . . , d. Here Y_j is typically R, R+, a compact real interval or Z+. Denote by Y_i^[j] the random variable representing the i^th observation of the j^th response, for j = 1, . . . , d and i = 1, . . . , n. We assume that there exists a natural clustering of the observations causing dependence between observations arising from the same cluster (e.g., grouping of observations within the same individual). We denote the cluster of the i^th observation, Y_i^[j], byc(i) taking one of the values 1, . . . , q. Moreover, we assume that the clustering of the responses is independent of d, that is, the clusters are represented in each marginal model. To ease the notation throughout the paper, we only consider one clustering mechanism but the methodology can be applied to a model with multiple clustering structures (e.g., Pelck et al. (2021a)). Likewise, we assume the same number of observations for each response to simplify the notation. However, the methods described in this paper applies also to the case with n depending on j.

In each marginal model, we consider random components each taking the same value for all responses within the corresponding cluster. These random components are denoted by B_c(i)^[j] for i= 1, . . . , n andj = 1, . . . , d. Define the d-dimensional random vectors of random components, taking values in R^q, by B^[j]= (B₁^[j], . . . , B_q^[j])^T, the vector of responses Y^[j]= (Y₁^[j], . . . , Y_n^[j]) and the vector of realisationsY^[j] byy^[j]

(denoted observations) of for j = 1, . . . , d. Moreover, consider the d-dimensional

random vector B_l= (B_l^[1], . . . , B_l^[d])^T which we assume to be elliptically contoured distributed (Anderson 2003) satisfying the following regularity conditions, for l = 1, . . . , q,

1. The moments up to fourth order of each marginal distribution exist

2. Each of the marginal distributions is absolute continuous with respect to the Lebesgue measure

3. All the conditional distributions exist and are elliptically contoured distributions also

4. The location parameter vector is equal to zero.

Furthermore, we assume that B_l is independent ofB_l^′ for l, l^′ = 1, . . . , q such that l ̸= l^′, i.e., we assume that the random vectors representing different clusters are independent. We define the density with respect to the Lebesgue measure of the elliptically contoured distribution by

φ(b;Λ) = |Λ|^−1/2hb^TΛ⁻¹b, (II.1) whereΛ is a positive definite scatter matrix. The function h(·) is non-negative and satisfies that

R^q

h(b^Tb)db= 1. (II.2)

When the density exists, the covariance matrix, Σ, is proportional to Λ, i.e., the correlation matrix can be equivalently calculated from both ΣandΛ. An example of a commonly used distribution satisfying these regularity conditions is a multivariate Gaussian distribution with expectation zero and covariance matrix given by Σ.

Another example, that we will study later is the multivariate t-distribution. This distribution allows us to consider different degrees of tail heaviness. Note that because the moments of fourth order must exist in the multivariate t-distribution, the degrees of freedom should be larger than four.

According to the model, we assume thatY_i^[j] is conditional distributed according to a dispersion model with dispersion parameter λ_j ∈ R+ given B_c(i)^[j] , and with conditional expectation

g_jµ^[j]_i (b)^def=g_j(E[Y_i^[j]|B_c(i)^[j] =b]) =β_j^Tx^[j]_i +b, ∀b∈R,

for all i = 1, . . . , n and j = 1, . . . , d. The vector x^[j]_i is a p_j dimensional vector of explanatory variables corresponding to the vector of coefficients,β_j. The explanatory variables might differ for the different responses. The functiong_j(·) is a given link function, which is assumed to be strictly monotone, invertible and continuously differentiable. Below, we will suppress the dependence in µ^[j]_i (b) of b to lighten the notation and denote the parameter space of the conditional means by U_j. We define the conditional density corresponding to the conditional distribution of Y_i^[j] given

B_c(i)^[j] = b with respect to a domination measureν_j (defined on the measurable space (Y_j,A)) by

f(y_i^[j]|B^[j]_c(i) =b;β^[j], λ_j)^def=p(y_i^[j];µ^[j]_i , λ_j) =a_j(y_i^[j];λ_j) exp[−_2λ¹

j d_j(y_i^[j];µ^[j]_i )]. (II.3) The function dj : Yj × Uj → R+ is the unit deviance and, by definition, satisfies that d_j(µ, µ) = 0 and d_j(y, µ) > 0 for all (y, µ) ∈ Y_j × U_j such that y ̸= µ.

The function a_j : Y_j × R+ → R+ is a given normalising function. We assume that the unit deviance is regular, that is, d is twice continuously differentiable in Y_j × U_j and∂²d(µ;µ)/∂µ² >0 for all µ∈ U_j. The function V_j :U_j →R+ given by V_j(µ) = 2/{∂²d_j(µ, µ)/∂µ²}for allµin U_j is termed thevariance function (Cordeiro et al. 2021). The following families of distributions are examples of dispersion models:

Normal, Gamma, inverse Gaussian, von Mises, Poisson, and Binomial families. This setup defines a version of the multivariate GLMM described in Pelck & Labouriau (2021a) with the additional assumption that the multivariate distribution of the

random components follow an elliptical contoured distribution.

II.3 Representation of the Latent Covariance Structure via Graphical Models

In this section, we describe and illustrate how we can use the theory of graphical models to examine the latent covariance structure of the random components in the multivariate model described above, and how this covariance structure affects the correlation between the responses. First, we give a short account for the theory of graphical models. For a more comprehensively description see Lauritzen (1996) and Whittaker (1990).

II.3.1 Basic Theory of Graphical Models

Let G = (V,E) denote a graph defined with a set of vertices, V, composed of random variables and a set of edges, E ∈ V × V. The set of edges, E, consists of pairs of elements taken from V. We distinguish between undirected independence graphs (UGs) and directed acyclic independence graphs (DAGs) but the two types of graphs can be combined as we will see below. The two types of graphs differ because of the underlying assumption of symmetry in the roles played by the variables in an UG, whereas in a DAG one variable can carry information on another without the converse being necessarily true. In the DAG we use an arrow from one variable pointing to another variable to indicate that the first variable carries information on the second. In an UG, two vertices are connected by an edge if, and only if, they are not conditionally independent given the remaining variables in V. This is the same definition used for DAGs with the conditioning set modified from the remaining variables to a set containing all remaining variables that carry information on one of the two vertices either direct or through the other vertices in V.

No documento PhD Dissertation - Department of Mathematics (páginas 46-57)