Linear Hypothesis - Applied Multivariate Statistical Analysis∗


7.2 Linear Hypothesis

Summary

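↪ The hypothesis $H_0: \theta \in \Omega_0$ can be tested against $H_1: \theta \in \Omega_1$ using the likelihood ratio test (LRT). The likelihood ratio (LR) is the quotient $\lambda(X) = L_0^*/L_1^*$, where the $L_j^*$ are the maxima of the likelihood under each of the hypotheses.

↪ The test statistic in the LRT is $\lambda(X)$ or, equivalently, its logarithm $\log\lambda(X)$. If $\Omega_1$ is $q$-dimensional and $\Omega_0 \subset \Omega_1$ is $r$-dimensional, then the asymptotic distribution of $-2\log\lambda$ is $\chi^2_{q-r}$. This allows $H_0$ to be tested against $H_1$ by calculating the test statistic $-2\log\lambda = 2(\ell_1^* - \ell_0^*)$, where $\ell_j^* = \log L_j^*$.

↪ The hypothesis $H_0: \mu = \mu_0$ for $X \sim N_p(\mu, \Sigma)$, where $\Sigma$ is known, leads to $-2\log\lambda = n(\bar{x} - \mu_0)^\top \Sigma^{-1} (\bar{x} - \mu_0) \sim \chi^2_p$.

↪ The hypothesis $H_0: \mu = \mu_0$ for $X \sim N_p(\mu, \Sigma)$, where $\Sigma$ is unknown, leads to $-2\log\lambda = n\log\{1 + (\bar{x} - \mu_0)^\top S^{-1} (\bar{x} - \mu_0)\} \to \chi^2_p$, and $(n-1)(\bar{x} - \mu_0)^\top S^{-1} (\bar{x} - \mu_0) \sim T^2(p, n-1)$.

↪ The hypothesis $H_0: \Sigma = \Sigma_0$ for $X \sim N_p(\mu, \Sigma)$, where $\mu$ is unknown, leads to $-2\log\lambda = n\,\mathrm{tr}(\Sigma_0^{-1}S) - n\log|\Sigma_0^{-1}S| - np \to \chi^2_m$, with $m = \frac{1}{2}p(p+1)$.

↪ The hypothesis $H_0: \beta = \beta_0$ for $Y_i \sim N_1(\beta^\top x_i, \sigma^2)$, where $\sigma^2$ is unknown, leads to $-2\log\lambda = n\log\left(\dfrac{\|y - X\beta_0\|^2}{\|y - X\hat{\beta}\|^2}\right) \to \chi^2_p$.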

We can easily adapt Test Problems 1 and 2 to this new situation. Indeed, we know from Theorem 5.2 that $y_i = Ax_i \sim N_q(\mu_y, \Sigma_y)$, where $\mu_y = A\mu$ and $\Sigma_y = A\Sigma A^\top$.

Testing the null hypothesis $H_0: A\mu = a$ is the same as testing $H_0: \mu_y = a$. The appropriate statistics are $\bar{y}$ and $S_y$, which can be derived from the original statistics $\bar{x}$ and $S$ available from $X$:

$$\bar{y} = A\bar{x}, \qquad S_y = ASA^\top.$$

Here the difference between the sample mean and the tested value is $d = A\bar{x} - a$. We are now in a position to proceed to Test Problems 5 and 6.

TEST PROBLEM 5 Suppose $X_1, \ldots, X_n$ is an i.i.d. random sample from a $N_p(\mu, \Sigma)$ population.

$H_0: A\mu = a$, $\Sigma$ known, versus $H_1$: no constraints.

By (7.2) we have that, under $H_0$:

$$n(A\bar{x} - a)^\top (A\Sigma A^\top)^{-1} (A\bar{x} - a) \sim \chi^2_q,$$

and we reject $H_0$ if this test statistic is too large at the desired significance level.
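As an illustration, Test Problem 5 can be sketched in Python. This is a minimal sketch, not a reference implementation: the function name `linear_hypothesis_known_sigma`, its arguments and the default level `alpha=0.05` are assumptions introduced here, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats


def linear_hypothesis_known_sigma(X, A, a, Sigma, alpha=0.05):
    """Test H0: A mu = a when the covariance matrix Sigma is known.

    X is an (n x p) data matrix, A a (q x p) matrix, a a q-vector.
    Names and defaults are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    a = np.atleast_1d(np.asarray(a, dtype=float))
    Sigma = np.asarray(Sigma, dtype=float)
    n = X.shape[0]
    q = A.shape[0]

    d = A @ X.mean(axis=0) - a                 # d = A xbar - a
    cov_d = A @ Sigma @ A.T                    # A Sigma A^T
    stat = n * d @ np.linalg.solve(cov_d, d)   # n d^T (A Sigma A^T)^{-1} d

    p_value = stats.chi2.sf(stat, df=q)        # upper tail of chi^2_q
    return stat, p_value, p_value < alpha


# Small usage example with made-up numbers: test mu_1 = mu_2 with Sigma = I.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
stat, p_value, reject = linear_hypothesis_known_sigma(
    X_demo, A=[[1.0, -1.0]], a=[0.0], Sigma=np.eye(2)
)
print(stat, p_value, reject)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of $A\Sigma A^\top$ explicitly, which is a common numerical choice but not something the text prescribes.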

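EXAMPLE 7.8 We consider hypotheses on partitioned mean vectors $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$. Let us first look at

$H_0: \mu_1 = \mu_2$ versus $H_1$: no constraints,

for $N_{2p}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix} \right)$ with known $\Sigma$. This is equivalent to $A = (I, -I)$, $a = (0, \ldots, 0)^\top \in \mathbb{R}^p$ and leads to:

$$-2\log\lambda = n(\bar{x}_1 - \bar{x}_2)^\top (2\Sigma)^{-1} (\bar{x}_1 - \bar{x}_2) \sim \chi^2_p.$$

Another example is the test whether $\mu_1 = 0$, i.e.,

$H_0: \mu_1 = 0$ versus $H_1$: no constraints,

for $N_{2p}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix} \right)$ with known $\Sigma$. This is equivalent to $A\mu = a$ with $A = (I, 0)$ and $a = (0, \ldots, 0)^\top \in \mathbb{R}^p$. Hence:

$$-2\log\lambda = n\bar{x}_1^\top \Sigma^{-1} \bar{x}_1 \sim \chi^2_p.$$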
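Both statistics in this example follow from the general form in Test Problem 5 by evaluating the covariance of $A\bar{x}$ for the block-diagonal covariance matrix. The short derivation below is a worked check added for illustration; the symbol $\Sigma_X$ for the full $(2p \times 2p)$ covariance matrix is notation introduced here.

```latex
% Sigma_X denotes the block-diagonal covariance of the 2p-dimensional vector.
\Sigma_X = \begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix}

% Case A = (I, -I), a = 0:
A \Sigma_X A^\top
  = (I, -I) \begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix}
    \begin{pmatrix} I \\ -I \end{pmatrix}
  = \Sigma + \Sigma = 2\Sigma,
\qquad
A\bar{x} - a = \bar{x}_1 - \bar{x}_2 .

% Case A = (I, 0), a = 0:
A \Sigma_X A^\top
  = (I, 0) \begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix}
    \begin{pmatrix} I \\ 0 \end{pmatrix}
  = \Sigma,
\qquad
A\bar{x} - a = \bar{x}_1 .
```

Substituting these into $n(A\bar{x} - a)^\top (A\Sigma_X A^\top)^{-1} (A\bar{x} - a)$ gives the two expressions above, each with $q = p$ degrees of freedom.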

TEST PROBLEM 6 Suppose $X_1, \ldots, X_n$ is an i.i.d. random sample from a $N_p(\mu, \Sigma)$ population.

$H_0: A\mu = a$, $\Sigma$ unknown, versus $H_1$: no constraints.

From Corollary 5.4 and under $H_0$ it follows immediately that

$$(n-1)(A\bar{x} - a)^\top (ASA^\top)^{-1} (A\bar{x} - a) \sim T^2(q, n-1) \qquad (7.9)$$

since indeed, under $H_0$,

$$A\bar{x} \sim N_q(a, n^{-1}A\Sigma A^\top)$$

is independent of

$$nASA^\top \sim W_q(A\Sigma A^\top, n-1).$$
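The statistic in (7.9) can be sketched in Python as follows. This is a hedged illustration: the function name and arguments are assumptions made here. Two points are used that go slightly beyond the displayed formula: $S$ is taken with divisor $n$ (consistent with $nASA^\top$ being Wishart with $n-1$ degrees of freedom), and the standard relation $T^2(q, n-1) = \frac{(n-1)q}{n-q}F_{q,n-q}$ is used to obtain a p-value.

```python
import numpy as np
from scipy import stats


def linear_hypothesis_unknown_sigma(X, A, a, alpha=0.05):
    """Test H0: A mu = a with unknown Sigma via Hotelling's T^2.

    Illustrative sketch; names and defaults are assumptions.
    """
    X = np.asarray(X, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    a = np.atleast_1d(np.asarray(a, dtype=float))
    n = X.shape[0]
    q = A.shape[0]

    xbar = X.mean(axis=0)
    # Empirical covariance with divisor n (bias=True), not n - 1.
    S = np.atleast_2d(np.cov(X, rowvar=False, bias=True))

    d = A @ xbar - a
    quad = d @ np.linalg.solve(A @ S @ A.T, d)

    t2 = (n - 1) * quad             # ~ T^2(q, n-1) under H0, as in (7.9)
    f_stat = (n - q) / q * quad     # ~ F_{q, n-q} under H0
    p_value = stats.f.sf(f_stat, q, n - q)
    return t2, f_stat, p_value, p_value < alpha


# Made-up data: test whether the two components have equal means.
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
t2, f_stat, p_value, reject = linear_hypothesis_unknown_sigma(
    X_demo, A=[[1.0, -1.0]], a=[0.0]
)
print(t2, f_stat, p_value, reject)
```

For $q = 1$ the two returned statistics coincide, since $T^2(1, n-1) = F_{1,n-1}$.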

EXAMPLE 7.9 Let us come back to the bank data set and suppose that we want to test if $\mu_4 = \mu_5$, i.e., the hypothesis that the lower border mean equals the upper border mean for the forged bills. In this case:

$$A = (0\ \ 0\ \ 0\ \ 1\ \ {-1}\ \ 0), \qquad a = 0.$$

The test statistic is:

$$99(A\bar{x})^\top (AS_fA^\top)^{-1} (A\bar{x}) \sim T^2(1, 99) = F_{1,99}.$$

The observed value is 13.638, which is significant at the 5% level.
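The significance claim can be checked against the $F_{1,99}$ distribution. The snippet below is a small sketch assuming SciPy is available; it only uses the observed value and the degrees of freedom quoted in the example, and the variable names are chosen here for illustration.

```python
from scipy import stats

t2_obs = 13.638        # observed value reported in Example 7.9
q, df2 = 1, 99         # T^2(1, 99) = F_{1,99}

critical = stats.f.ppf(0.95, q, df2)    # 95% quantile of F_{1,99}
p_value = stats.f.sf(t2_obs, q, df2)    # upper-tail probability

print(f"critical value: {critical:.3f}")
print(f"p-value: {p_value:.5f}")
print("reject H0 at the 5% level:", t2_obs > critical)
```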

Repeated Measurements

In many situations, $n$ independent sampling units are observed at $p$ different times or under $p$ different experimental conditions (different treatments, ...). So here we repeat $p$ one-dimensional measurements on $n$ different subjects. For instance, we observe the results from $n$ students taking $p$ different exams. We end up with an $(n \times p)$ matrix. We can thus consider the situation where we have $X_1, \ldots, X_n$ i.i.d. from a normal distribution $N_p(\mu, \Sigma)$ when there are $p$ repeated measurements. The hypothesis of interest in this case is that there are no treatment effects, $H_0: \mu_1 = \mu_2 = \ldots = \mu_p$. Testing this hypothesis is a direct application of Test Problem 6. Indeed, introducing an appropriate matrix transform on $\mu$ we have:

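$$H_0: C\mu = 0 \quad \text{where} \quad C((p-1) \times p) = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -1 \end{pmatrix}. \qquad (7.10)$$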

Note that in many cases one of the experimental conditions is the “control” (a placebo, standard drug or reference condition). Suppose it is the first component. In that case one is interested in studying differences to the control variable. The matrix $C$ therefore has a different form:

$$C((p-1) \times p) = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$

By (7.9) the null hypothesis will be rejected if:

$$\frac{n-p+1}{p-1}\,\bar{x}^\top C^\top (CSC^\top)^{-1} C\bar{x} > F_{1-\alpha;\,p-1,\,n-p+1}.$$

As a matter of fact, $C\mu$ is the mean of the random variable $y_i = Cx_i$:

$$y_i \sim N_{p-1}(C\mu, C\Sigma C^\top).$$
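The two forms of $C$ and the rejection rule can be sketched in Python. This is an illustrative sketch under stated assumptions: the function names are invented here, NumPy and SciPy are assumed, and $S$ is computed with divisor $n$.

```python
import numpy as np
from scipy import stats


def successive_difference_matrix(p):
    """(p-1) x p matrix with rows giving mu_j - mu_{j+1}, as in (7.10)."""
    C = np.zeros((p - 1, p))
    idx = np.arange(p - 1)
    C[idx, idx] = 1.0
    C[idx, idx + 1] = -1.0
    return C


def control_difference_matrix(p):
    """(p-1) x p matrix with rows giving mu_1 - mu_j for j = 2, ..., p."""
    return np.hstack([np.ones((p - 1, 1)), -np.eye(p - 1)])


def no_treatment_effect_test(X, C, alpha=0.05):
    """F test of H0: C mu = 0 for an (n x p) repeated-measurements matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)      # divisor n
    d = C @ xbar
    f_stat = (n - p + 1) / (p - 1) * d @ np.linalg.solve(C @ S @ C.T, d)
    f_crit = stats.f.ppf(1 - alpha, p - 1, n - p + 1)
    return f_stat, f_crit, f_stat > f_crit


# Simulated data (made up for illustration) with unequal means.
rng = np.random.default_rng(1)
X_sim = rng.normal(size=(25, 4)) + np.array([0.0, 1.0, 1.0, 2.0])
C_succ = successive_difference_matrix(4)
C_ctrl = control_difference_matrix(4)
print(no_treatment_effect_test(X_sim, C_succ))
print(no_treatment_effect_test(X_sim, C_ctrl))
```

Both matrices have rows spanning the same space (all vectors orthogonal to $1_p$), so each is a nonsingular transformation of the other and the test statistic is the same for both; the choice of $C$ matters only for which differences are read off directly.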

Simultaneous confidence intervals for linear combinations of the mean of $y_i$ have been derived above in (7.7). For all $a \in \mathbb{R}^{p-1}$, with probability $(1-\alpha)$ we have:

$$a^\top C\mu \in a^\top C\bar{x} \pm \sqrt{\frac{p-1}{n-p+1}\,F_{1-\alpha;\,p-1,\,n-p+1}\; a^\top CSC^\top a}.$$

Due to the nature of the problem here, the row sums of the elements in $C$ are zero: $C1_p = 0$. Therefore $a^\top C$ is a vector whose elements sum to zero. This is called a contrast. Let $b = C^\top a$. We have $b^\top 1_p = \sum_{j=1}^{p} b_j = 0$. The result above thus provides, for all contrasts $b^\top\mu$ of $\mu$, simultaneous confidence intervals at level $(1-\alpha)$:

$$b^\top\mu \in b^\top\bar{x} \pm \sqrt{\frac{p-1}{n-p+1}\,F_{1-\alpha;\,p-1,\,n-p+1}\; b^\top Sb}.$$

Examples of contrasts for $p = 4$ are $b^\top = (1\ \ {-1}\ \ 0\ \ 0)$ or $(1\ \ 0\ \ 0\ \ {-1})$ or even $(1\ \ {-\tfrac{1}{3}}\ \ {-\tfrac{1}{3}}\ \ {-\tfrac{1}{3}})$ when the control is to be compared with the mean of 3 different treatments.
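The simultaneous intervals for contrasts can be computed from a data matrix as sketched below. The function name `contrast_intervals`, the argument `B` (one contrast per row) and the default `alpha` are assumptions made for this illustration; $S$ is again taken with divisor $n$.

```python
import numpy as np
from scipy import stats


def contrast_intervals(X, B, alpha=0.05):
    """Simultaneous (1 - alpha) intervals for b^T mu, one per row b of B."""
    X = np.asarray(X, dtype=float)
    B = np.atleast_2d(np.asarray(B, dtype=float))
    n, p = X.shape
    if not np.allclose(B.sum(axis=1), 0.0):
        raise ValueError("each row of B must sum to zero (a contrast)")

    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)           # divisor n
    f_crit = stats.f.ppf(1 - alpha, p - 1, n - p + 1)

    center = B @ xbar                                # b^T xbar
    var_b = np.einsum("ij,jk,ik->i", B, S, B)        # b^T S b for each row
    half = np.sqrt((p - 1) / (n - p + 1) * f_crit * var_b)
    return np.column_stack([center - half, center + half])


# Simulated data (made up) and the contrasts mentioned in the text.
rng = np.random.default_rng(0)
X_sim = rng.normal(size=(30, 4))
B_demo = np.array([
    [1.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, -1.0],
    [1.0, -1 / 3, -1 / 3, -1 / 3],
])
print(contrast_intervals(X_sim, B_demo))
```

Because the coverage statement holds for all contrasts at once, any number of rows can be passed without adjusting the level.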

EXAMPLE 7.10 Bock (1975) considers the evolution of the vocabulary of children from the eighth through eleventh grade. The data set contains the scores of a vocabulary test of 40 randomly chosen children that are observed from grades 8 to 11. This is a repeated measurement situation $(n = 40,\ p = 4)$, since the same children were observed from grades 8 to 11. The statistics of interest are:

$$\bar{x} = (1.086, 2.544, 2.851, 3.420)^\top$$

$$S = \begin{pmatrix} 2.902 & 2.438 & 2.963 & 2.183 \\ 2.438 & 3.049 & 2.775 & 2.319 \\ 2.963 & 2.775 & 4.281 & 2.939 \\ 2.183 & 2.319 & 2.939 & 3.162 \end{pmatrix}.$$

Suppose we are interested in the yearly evolution of the children. Then the matrix $C$ providing successive differences of $\mu_j$ is:

$$C = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}.$$

The value of the test statistic is $F_{obs} = 53.134$, which is highly significant for $F_{3,37}$. There are significant differences between the successive means. However, the analysis of the contrasts shows the following simultaneous 95% confidence intervals:

$$\begin{aligned} -1.958 &\le \mu_1 - \mu_2 \le -0.959 \\ -0.949 &\le \mu_2 - \mu_3 \le 0.335 \\ -1.171 &\le \mu_3 - \mu_4 \le 0.036. \end{aligned}$$

Thus, the rejection of $H_0$ is mainly due to the difference between the children's performances in the first and second year. The confidence intervals for the following contrasts may also be of interest:

$$\begin{aligned} -2.283 &\le \mu_1 - \tfrac{1}{3}(\mu_2 + \mu_3 + \mu_4) \le -1.423 \\ -1.777 &\le \tfrac{1}{3}(\mu_1 + \mu_2 + \mu_3) - \mu_4 \le -0.742 \\ -1.479 &\le \mu_2 - \mu_4 \le -0.272. \end{aligned}$$

They show that $\mu_1$ is different from the average of the 3 other years (the same being true for $\mu_4$) and $\mu_4$ turns out to be higher than $\mu_2$ (and of course higher than $\mu_1$).
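The computations of Example 7.10 can be retraced from the summary statistics alone. The sketch below assumes NumPy and SciPy and uses the rounded values of $\bar{x}$ and $S$ printed above, so its output should agree with the reported figures only up to small rounding differences; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

n, p = 40, 4
xbar = np.array([1.086, 2.544, 2.851, 3.420])
S = np.array([
    [2.902, 2.438, 2.963, 2.183],
    [2.438, 3.049, 2.775, 2.319],
    [2.963, 2.775, 4.281, 2.939],
    [2.183, 2.319, 2.939, 3.162],
])
C = np.array([
    [1.0, -1.0, 0.0, 0.0],
    [0.0, 1.0, -1.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
])

# F statistic for H0: C mu = 0.
d = C @ xbar
f_obs = (n - p + 1) / (p - 1) * d @ np.linalg.solve(C @ S @ C.T, d)
f_crit = stats.f.ppf(0.95, p - 1, n - p + 1)

# Simultaneous 95% intervals for the successive differences and for
# the three additional contrasts discussed in the example.
B = np.vstack([
    C,
    [1.0, -1 / 3, -1 / 3, -1 / 3],
    [1 / 3, 1 / 3, 1 / 3, -1.0],
    [0.0, 1.0, 0.0, -1.0],
])
center = B @ xbar
half = np.sqrt((p - 1) / (n - p + 1) * f_crit * np.diag(B @ S @ B.T))
intervals = np.column_stack([center - half, center + half])

print(f"F_obs = {f_obs:.3f}, critical value = {f_crit:.3f}")
print(np.round(intervals, 3))
```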

Test Problem 7 illustrates how the likelihood ratio can be applied when testing a linear restriction on the coefficient β of a linear model. It is also shown how a transformation of the test statistic leads to an exact F test as presented in Chapter 3.

TEST PROBLEM 7 Suppose $Y_1, \ldots, Y_n$ are independent with $Y_i \sim N_1(\beta^\top x_i, \sigma^2)$ and $x_i \in \mathbb{R}^p$.

$H_0: A\beta = a$, $\sigma^2$ unknown, versus $H_1$: no constraints.

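The constrained maximum likelihood estimators under $H_0$ are (Exercise 3.24):

$$\tilde{\beta} = \hat{\beta} - (X^\top X)^{-1}A^\top \{A(X^\top X)^{-1}A^\top\}^{-1}(A\hat{\beta} - a)$$

for $\beta$ and $\tilde{\sigma}^2 = \frac{1}{n}(y - X\tilde{\beta})^\top (y - X\tilde{\beta})$. The estimate $\hat{\beta}$ denotes the unconstrained MLE as before. Hence, the LR statistic is

$$-2\log\lambda = 2(\ell_1^* - \ell_0^*) = n\log\left(\frac{\|y - X\tilde{\beta}\|^2}{\|y - X\hat{\beta}\|^2}\right) \xrightarrow{\ \mathcal{L}\ } \chi^2_q$$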

where $q$ is the number of elements of $a$. This problem also has an exact $F$-test since

$$\frac{n-p}{q}\left(\frac{\|y - X\tilde{\beta}\|^2}{\|y - X\hat{\beta}\|^2} - 1\right) = \frac{n-p}{q}\,\frac{(A\hat{\beta} - a)^\top \{A(X^\top X)^{-1}A^\top\}^{-1}(A\hat{\beta} - a)}{(y - X\hat{\beta})^\top (y - X\hat{\beta})} \sim F_{q,n-p}.$$

EXAMPLE 7.11 Let us continue with the “classic blue” pullovers. We can once more test if $\beta = 0$ in the regression of sales on prices. It holds that

$$\beta = 0 \quad \text{iff} \quad (0\ \ 1)\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = 0.$$

The LR statistic here is

−2 logλ = 0.284

which is not significant for the χ²₁ distribution. The F-test statistic F = 0.231

is also not significant. Hence, we can assume independence of sales and prices (alone). Recall that this conclusion has to be revised if we consider the prices together with advertisement costs and hours of sales managers.

Recall the different conclusion that was made in Example 7.6 when we rejected $H_0: \alpha = 211$ and $\beta = 0$. The rejection there came from the fact that the pair of values was rejected. Indeed, if $\beta = 0$ the estimator of $\alpha$ would be $\bar{y} = 172.70$, and this is too far from 211.
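The constrained estimator and the two statistics of Test Problem 7 can be sketched in Python. This is a hedged illustration on simulated data, not the pullover data: the function name `constrained_ls_test`, the returned dictionary keys and the simulated design are all assumptions made here, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats


def constrained_ls_test(X, y, A, a):
    """LR and exact F test of H0: A beta = a in the model y = X beta + eps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    a = np.atleast_1d(np.asarray(a, dtype=float))
    n, p = X.shape
    q = A.shape[0]

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                    # unconstrained MLE
    gap = A @ beta_hat - a                          # A beta_hat - a
    M = A @ XtX_inv @ A.T                           # A (X^T X)^{-1} A^T
    beta_tilde = beta_hat - XtX_inv @ A.T @ np.linalg.solve(M, gap)

    rss1 = np.sum((y - X @ beta_hat) ** 2)          # ||y - X beta_hat||^2
    rss0 = np.sum((y - X @ beta_tilde) ** 2)        # ||y - X beta_tilde||^2

    lr = n * np.log(rss0 / rss1)                    # -2 log lambda
    f_stat = (n - p) / q * (rss0 / rss1 - 1.0)      # left-hand form
    f_alt = (n - p) / q * gap @ np.linalg.solve(M, gap) / rss1  # right-hand form
    return {
        "beta_hat": beta_hat,
        "beta_tilde": beta_tilde,
        "lr": lr,
        "lr_pvalue": stats.chi2.sf(lr, q),
        "f": f_stat,
        "f_alt": f_alt,
        "f_pvalue": stats.f.sf(f_stat, q, n - p),
    }


# Simulated simple regression (intercept and one slope), testing slope = 0.
rng = np.random.default_rng(42)
x = rng.uniform(80, 120, size=10)
X_sim = np.column_stack([np.ones_like(x), x])
y_sim = 170.0 + rng.normal(scale=20.0, size=10)
result = constrained_ls_test(X_sim, y_sim, A=[[0.0, 1.0]], a=[0.0])
print(result)
```

The two statistics are linked by $-2\log\lambda = n\log\{1 + qF/(n-p)\}$. Assuming $n = 10$ observations for the pullover data (an assumption here, since the sample size is not stated in this section), $p = 2$ and $q = 1$, the values $F = 0.231$ and $-2\log\lambda = 0.284$ reported in Example 7.11 are consistent with this relation.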

EXAMPLE 7.12 Let us now consider the multivariate regression in the “classic blue” pullovers example. From Example 3.15 we know that the estimated parameters in the model

$$X_1 = \alpha + \beta_1X_2 + \beta_2X_3 + \beta_3X_4 + \varepsilon$$

are

$$\hat{\alpha} = 65.670, \quad \hat{\beta}_1 = -0.216, \quad \hat{\beta}_2 = 0.485, \quad \hat{\beta}_3 = 0.844.$$

Hence, we could postulate the approximate relation:

$$\beta_1 \approx -\tfrac{1}{2}\beta_2,$$

which means in practice that augmenting the price by 20 EUR requires the advertisement costs to increase by 10 EUR in order to keep the number of pullovers sold constant. Vice versa, reducing the price by 20 EUR yields the same result as before if we reduced the advertisement costs by 10 EUR. Let us now test whether the hypothesis

$$H_0: \beta_1 = -\tfrac{1}{2}\beta_2$$
