(1)

Multivariate Linear Regression

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities)

Updated 16-Jan-2017

(2)

Copyright

Copyright © 2017 by Nathaniel E. Helwig

(3)

Outline of Notes

1) Multiple Linear Regression
   - Model form and assumptions
   - Parameter estimation
   - Inference and prediction

2) Multivariate Linear Regression
   - Model form and assumptions
   - Parameter estimation
   - Inference and prediction

(4)

Multiple Linear Regression

(5)

MLR Model: Scalar Form

The multiple linear regression model has the form

$$y_i = b_0 + \sum_{j=1}^p b_j x_{ij} + e_i$$

for $i \in \{1, \ldots, n\}$ where

- $y_i \in \mathbb{R}$ is the real-valued response for the $i$-th observation
- $b_0 \in \mathbb{R}$ is the regression intercept
- $b_j \in \mathbb{R}$ is the $j$-th predictor's regression slope
- $x_{ij} \in \mathbb{R}$ is the $j$-th predictor for the $i$-th observation
- $e_i \overset{iid}{\sim} N(0, \sigma^2)$ is a Gaussian error term

(6)

MLR Model: Nomenclature

The model is multiple because we have $p > 1$ predictors.

- If $p = 1$, we have a simple linear regression model.

The model is linear because $y_i$ is a linear function of the parameters ($b_0, b_1, \ldots, b_p$ are the parameters).

The model is a regression model because we are modeling a response variable ($Y$) as a function of predictor variables ($X_1, \ldots, X_p$).

(7)

MLR Model: Assumptions

The fundamental assumptions of the MLR model are:

1. The relationship between $X_j$ and $Y$ is linear (given the other predictors)
2. $x_{ij}$ and $y_i$ are observed random variables (known constants)
3. $e_i \overset{iid}{\sim} N(0, \sigma^2)$ is an unobserved random variable
4. $b_0, b_1, \ldots, b_p$ are unknown constants
5. $(y_i \mid x_{i1}, \ldots, x_{ip}) \overset{ind}{\sim} N(b_0 + \sum_{j=1}^p b_j x_{ij}, \sigma^2)$; note: homogeneity of variance

Note: $b_j$ is the expected increase in $Y$ for a 1-unit increase in $X_j$ with all other predictor variables held constant.

(8)

MLR Model: Matrix Form

The multiple linear regression model has the form

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$

where

- $\mathbf{y} = (y_1, \ldots, y_n)' \in \mathbb{R}^n$ is the $n \times 1$ response vector
- $\mathbf{X} = [\mathbf{1}_n, \mathbf{x}_1, \ldots, \mathbf{x}_p] \in \mathbb{R}^{n \times (p+1)}$ is the $n \times (p+1)$ design matrix
  - $\mathbf{1}_n$ is an $n \times 1$ vector of ones
  - $\mathbf{x}_j = (x_{1j}, \ldots, x_{nj})' \in \mathbb{R}^n$ is the $j$-th predictor vector ($n \times 1$)
- $\mathbf{b} = (b_0, b_1, \ldots, b_p)' \in \mathbb{R}^{p+1}$ is the $(p+1) \times 1$ vector of regression coefficients
- $\mathbf{e} = (e_1, \ldots, e_n)' \in \mathbb{R}^n$ is the $n \times 1$ error vector

(9)

MLR Model: Matrix Form (another look)

Matrix form writes the MLR model for all $n$ points simultaneously: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ 1 & x_{31} & x_{32} & \cdots & x_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{bmatrix}$$

(10)

MLR Model: Assumptions (revisited)

In matrix terms, the error vector is multivariate normal:

$$\mathbf{e} \sim N(\mathbf{0}_n, \sigma^2 \mathbf{I}_n)$$

In matrix terms, the response vector is multivariate normal given $\mathbf{X}$:

$$(\mathbf{y} \mid \mathbf{X}) \sim N(\mathbf{X}\mathbf{b}, \sigma^2 \mathbf{I}_n)$$

(11)

Ordinary Least Squares

The ordinary least squares (OLS) problem is

$$\min_{\mathbf{b} \in \mathbb{R}^{p+1}} \|\mathbf{y} - \mathbf{X}\mathbf{b}\|^2 = \min_{\mathbf{b} \in \mathbb{R}^{p+1}} \sum_{i=1}^n \Big( y_i - b_0 - \sum_{j=1}^p b_j x_{ij} \Big)^2$$

where $\|\cdot\|$ denotes the Frobenius norm.

The OLS solution has the form

$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
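To make the closed-form solution concrete, here is a minimal R sketch that builds the design matrix, computes $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ directly, and checks it against lm(); it anticipates the mtcars example used on the following slides, with model.matrix() supplying the leading column of ones.

# sketch: closed-form OLS solution vs. lm()
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mod <- lm(mpg ~ cyl + am + carb, data=mtcars)
X <- model.matrix(mod)                        # n x (p+1) design matrix
y <- mtcars$mpg
bhat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y
cbind(bhat, coef(mod))                        # the two columns agree

(In practice lm() solves the least squares problem via a QR decomposition rather than forming $(\mathbf{X}'\mathbf{X})^{-1}$, which is numerically preferable.)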

(12)

Fitted Values and Residuals

SCALAR FORM:

Fitted values are given by

$$\hat{y}_i = \hat{b}_0 + \sum_{j=1}^p \hat{b}_j x_{ij}$$

and residuals are given by

$$\hat{e}_i = y_i - \hat{y}_i$$

MATRIX FORM:

Fitted values are given by

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}$$

and residuals are given by

$$\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}}$$

(13)

Hat Matrix

Note that we can write the fitted values as

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y}$$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the hat matrix.

$\mathbf{H}$ is a symmetric and idempotent matrix: $\mathbf{H}\mathbf{H} = \mathbf{H}$.

$\mathbf{H}$ projects $\mathbf{y}$ onto the column space of $\mathbf{X}$.
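A short R sketch (reusing X, y, and mod from the sketch above) that checks these three properties numerically:

# sketch: verify the stated properties of the hat matrix
H <- X %*% solve(crossprod(X)) %*% t(X)
all.equal(H, t(H))                                    # symmetric
all.equal(H %*% H, H)                                 # idempotent: HH = H
all.equal(as.numeric(H %*% y), unname(fitted(mod)))   # Hy = y-hat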

(14)

Multiple Regression Example in R

> data(mtcars)

> head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

> mtcars$cyl <- factor(mtcars$cyl)

> mod <- lm(mpg ~ cyl + am + carb, data=mtcars)

> coef(mod)

(Intercept) cyl6 cyl8 am carb

25.320303 -3.549419 -6.904637 4.226774 -1.119855

(15)

Regression Sums-of-Squares: Scalar Form

In MLR models, the relevant sums-of-squares are

- Sum-of-Squares Total: $SST = \sum_{i=1}^n (y_i - \bar{y})^2$
- Sum-of-Squares Regression: $SSR = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$
- Sum-of-Squares Error: $SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2$

The corresponding degrees of freedom are

- SST: $df_T = n - 1$
- SSR: $df_R = p$
- SSE: $df_E = n - p - 1$
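These quantities are easy to compute directly from the fitted mtcars model; a sketch (mod as fit earlier):

# sketch: SST, SSR, SSE and the partition SST = SSR + SSE
n <- nrow(mtcars)
p <- length(coef(mod)) - 1                      # predictors, counting factor dummies
SST <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # df = n - 1
SSR <- sum((fitted(mod) - mean(mtcars$mpg))^2)  # df = p
SSE <- sum(resid(mod)^2)                        # df = n - p - 1
c(SST = SST, SSR.plus.SSE = SSR + SSE)          # the two numbers agree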

(16)

Regression Sums-of-Squares: Matrix Form

In MLR models, the relevant sums-of-squares are

$$SST = \sum_{i=1}^n (y_i - \bar{y})^2 = \mathbf{y}'[\mathbf{I}_n - (1/n)\mathbf{J}]\mathbf{y}$$

$$SSR = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = \mathbf{y}'[\mathbf{H} - (1/n)\mathbf{J}]\mathbf{y}$$

$$SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \mathbf{y}'[\mathbf{I}_n - \mathbf{H}]\mathbf{y}$$

where $\mathbf{J} = \mathbf{1}_n\mathbf{1}_n'$ is the $n \times n$ matrix of ones.
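The quadratic forms can be checked numerically as well; a sketch reusing X, y, H, and the scalar sums-of-squares from the sketches above:

# sketch: quadratic-form versions of SST, SSR, SSE
J <- matrix(1, n, n)      # n x n matrix of ones
In <- diag(n)
SST.q <- as.numeric(t(y) %*% (In - J/n) %*% y)
SSR.q <- as.numeric(t(y) %*% (H - J/n) %*% y)
SSE.q <- as.numeric(t(y) %*% (In - H) %*% y)
c(SST.q, SSR.q, SSE.q)    # match SST, SSR, SSE from the scalar forms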

(17)

Partitioning the Variance

We can partition the total variation in $y_i$ as

$$\begin{aligned}
SST &= \sum_{i=1}^n (y_i - \bar{y})^2 \\
&= \sum_{i=1}^n (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^n (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) \\
&= SSR + SSE + 2\sum_{i=1}^n (\hat{y}_i - \bar{y})\hat{e}_i \\
&= SSR + SSE
\end{aligned}$$

where the cross-product term vanishes because the residuals are orthogonal to the fitted values and sum to zero when the model contains an intercept.

(18)

Regression Sums-of-Squares in R

> anova(mod)
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)
cyl        2 824.78  412.39 52.4138  5.05e-10 ***
am         1  36.77   36.77  4.6730   0.03967 *
carb       1  52.06   52.06  6.6166   0.01592 *
Residuals 27 212.44    7.87
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> library(car)   # for Anova()
> Anova(mod, type=3)
Anova Table (Type III tests)

Response: mpg
            Sum Sq Df  F value    Pr(>F)
(Intercept) 3368.1  1 428.0789 < 2.2e-16 ***
cyl          121.2  2   7.7048  0.002252 **
am            77.1  1   9.8039  0.004156 **
carb          52.1  1   6.6166  0.015923 *
Residuals    212.4 27
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(19)

Coefficient of Multiple Determination

The coefficient of multiple determination is defined as

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

and gives the proportion of variation in $y_i$ that is explained by the linear relationship with $x_{i1}, \ldots, x_{ip}$.

When interpreting $R^2$ values, note that:

- $0 \le R^2 \le 1$
- Large $R^2$ values do not necessarily imply a good model

(20)

Adjusted Coefficient of Multiple Determination ($R_a^2$)

Including more predictors in an MLR model can artificially inflate $R^2$:

- Capitalizing on spurious effects present in noisy data
- Phenomenon of over-fitting the data

The adjusted $R^2$ is a relative measure of fit:

$$R_a^2 = 1 - \frac{SSE/df_E}{SST/df_T} = 1 - \frac{\hat{\sigma}^2}{s_Y^2}$$

where $s_Y^2 = \frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1}$ is the sample estimate of the variance of $Y$.
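The definition is easy to verify directly; a sketch reusing SST, SSE, n, and p from the earlier sketches (the value matches summary(mod)$adj.r.squared on the next slide):

# sketch: adjusted R^2 from its definition
dfE <- n - p - 1
dfT <- n - 1
1 - (SSE/dfE) / (SST/dfT)   # 0.7833943 for the mtcars fit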

(21)

Coefficient of Multiple Determination in R

> smod <- summary(mod)

> names(smod)

[1] "call" "terms" "residuals" "coefficients"

[5] "aliased" "sigma" "df" "r.squared"

[9] "adj.r.squared" "fstatistic" "cov.unscaled"

> summary(mod)$r.squared
[1] 0.8113434

> summary(mod)$adj.r.squared
[1] 0.7833943

(22)

Relation to ML Solution

Remember that $(\mathbf{y} \mid \mathbf{X}) \sim N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{I}_n)$, which implies that $\mathbf{y}$ has pdf

$$f(\mathbf{y} \mid \mathbf{X}, \mathbf{b}, \sigma^2) = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b})}$$

As a result, the log-likelihood of $\mathbf{b}$ given $(\mathbf{y}, \mathbf{X}, \sigma^2)$ is

$$\ln\{L(\mathbf{b} \mid \mathbf{y}, \mathbf{X}, \sigma^2)\} = -\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) + c$$

where $c$ is a constant that does not depend on $\mathbf{b}$.

(23)

Relation to ML Solution (continued)

The maximum likelihood estimate (MLE) of $\mathbf{b}$ is the estimate satisfying

$$\max_{\mathbf{b} \in \mathbb{R}^{p+1}} -\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b})$$

Now, note that

$$\max_{\mathbf{b} \in \mathbb{R}^{p+1}} -\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) = \max_{\mathbf{b} \in \mathbb{R}^{p+1}} -(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b})$$

$$\max_{\mathbf{b} \in \mathbb{R}^{p+1}} -(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) = \min_{\mathbf{b} \in \mathbb{R}^{p+1}} (\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b})$$

Thus, the OLS and ML estimates of $\mathbf{b}$ are the same:

$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

(24)

Estimated Error Variance (Mean Squared Error)

The estimated error variance is

$$\hat{\sigma}^2 = SSE/(n-p-1) = \sum_{i=1}^n (y_i - \hat{y}_i)^2/(n-p-1) = \|(\mathbf{I}_n - \mathbf{H})\mathbf{y}\|^2/(n-p-1)$$

which is an unbiased estimate of the error variance $\sigma^2$.

The estimate $\hat{\sigma}^2$ is the mean squared error (MSE) of the model.

(25)

Maximum Likelihood Estimate of Error Variance

$\tilde{\sigma}^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2/n$ is the MLE of $\sigma^2$.

From our previous results using $\hat{\sigma}^2$, we have that

$$E(\tilde{\sigma}^2) = \frac{n-p-1}{n}\sigma^2$$

Consequently, the bias of the estimator $\tilde{\sigma}^2$ is given by

$$\frac{n-p-1}{n}\sigma^2 - \sigma^2 = -\frac{(p+1)}{n}\sigma^2$$

and note that $-\frac{(p+1)}{n}\sigma^2 \to 0$ as $n \to \infty$.

(26)

Comparing $\hat{\sigma}^2$ and $\tilde{\sigma}^2$

Reminder: the MSE and MLE of $\sigma^2$ are given by

$$\hat{\sigma}^2 = \|(\mathbf{I}_n - \mathbf{H})\mathbf{y}\|^2/(n-p-1) \qquad \tilde{\sigma}^2 = \|(\mathbf{I}_n - \mathbf{H})\mathbf{y}\|^2/n$$

From the definitions of $\hat{\sigma}^2$ and $\tilde{\sigma}^2$ we have that

$$\tilde{\sigma}^2 < \hat{\sigma}^2$$

so the MLE produces a smaller estimate of the error variance.

(27)

Estimated Error Variance in R

# get mean-squared error in 3 ways
> n <- length(mtcars$mpg)
> p <- length(coef(mod)) - 1
> smod$sigma^2
[1] 7.868009
> sum((mod$residuals)^2) / (n - p - 1)
[1] 7.868009
> sum((mtcars$mpg - mod$fitted.values)^2) / (n - p - 1)
[1] 7.868009

# get MLE of error variance
> smod$sigma^2 * (n - p - 1) / n
[1] 6.638633

(28)

Summary of Results

Given the model assumptions, we have

$$\hat{\mathbf{b}} \sim N(\mathbf{b}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1})$$

$$\hat{\mathbf{y}} \sim N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{H})$$

$$\hat{\mathbf{e}} \sim N(\mathbf{0}, \sigma^2(\mathbf{I}_n - \mathbf{H}))$$

Typically $\sigma^2$ is unknown, so we use the MSE $\hat{\sigma}^2$ in practice.

(29)

ANOVA Table and Regression F Test

We typically organize the SS information into an ANOVA table:

Source | SS | df | MS | F | p-value
SSR | $\sum_{i=1}^n (\hat{y}_i - \bar{y})^2$ | $p$ | MSR | $F$ | $p$
SSE | $\sum_{i=1}^n (y_i - \hat{y}_i)^2$ | $n-p-1$ | MSE | |
SST | $\sum_{i=1}^n (y_i - \bar{y})^2$ | $n-1$ | | |

where $MSR = \frac{SSR}{p}$, $MSE = \frac{SSE}{n-p-1}$, $F = \frac{MSR}{MSE} \sim F_{p,\,n-p-1}$, and $p = P(F_{p,\,n-p-1} > F)$.

The $F$-statistic and p-value test $H_0: b_1 = \cdots = b_p = 0$ versus $H_1: b_k \neq 0$ for some $k \in \{1, \ldots, p\}$.

(30)

Inferences about $\hat{b}_j$ with $\sigma^2$ Known

If $\sigma^2$ is known, form $100(1-\alpha)\%$ CIs using

$$\hat{b}_0 \pm Z_{\alpha/2}\,\sigma_{b_0} \qquad \hat{b}_j \pm Z_{\alpha/2}\,\sigma_{b_j}$$

where

- $Z_{\alpha/2}$ is the normal quantile such that $P(X > Z_{\alpha/2}) = \alpha/2$
- $\sigma_{b_0}$ and $\sigma_{b_j}$ are the square-roots of the diagonals of $V(\hat{\mathbf{b}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$

To test $H_0: b_j = b_j^*$ vs. $H_1: b_j \neq b_j^*$ (for some $j \in \{0, 1, \ldots, p\}$) use

$$Z = (\hat{b}_j - b_j^*)/\sigma_{b_j}$$

which follows a standard normal distribution under $H_0$.

(31)

Inferences about $\hat{b}_j$ with $\sigma^2$ Unknown

If $\sigma^2$ is unknown, form $100(1-\alpha)\%$ CIs using

$$\hat{b}_0 \pm t_{n-p-1}^{(\alpha/2)}\,\hat{\sigma}_{b_0} \qquad \hat{b}_j \pm t_{n-p-1}^{(\alpha/2)}\,\hat{\sigma}_{b_j}$$

where

- $t_{n-p-1}^{(\alpha/2)}$ is the $t_{n-p-1}$ quantile with $P(X > t_{n-p-1}^{(\alpha/2)}) = \alpha/2$
- $\hat{\sigma}_{b_0}$ and $\hat{\sigma}_{b_j}$ are the square-roots of the diagonals of $\hat{V}(\hat{\mathbf{b}}) = \hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}$

To test $H_0: b_j = b_j^*$ vs. $H_1: b_j \neq b_j^*$ (for some $j \in \{0, 1, \ldots, p\}$) use

$$T = (\hat{b}_j - b_j^*)/\hat{\sigma}_{b_j}$$

which follows a $t_{n-p-1}$ distribution under $H_0$.
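These intervals are what confint() returns; a sketch by hand, using the fact that vcov() on an lm fit returns $\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}$:

# sketch: 95% coefficient CIs from the t distribution
se <- sqrt(diag(vcov(mod)))            # sigma-hat x sqrt of diagonals of (X'X)^{-1}
tcrit <- qt(0.975, df = n - p - 1)
cbind(coef(mod) - tcrit*se, coef(mod) + tcrit*se)   # matches confint(mod)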

(32)

Coefficient Inference in R

> summary(mod)

Call:
lm(formula = mpg ~ cyl + am + carb, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-5.9074 -1.1723  0.2538  1.4851  5.4728

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  25.3203     1.2238  20.690  < 2e-16 ***
cyl6         -3.5494     1.7296  -2.052 0.049959 *
cyl8         -6.9046     1.8078  -3.819 0.000712 ***
am            4.2268     1.3499   3.131 0.004156 **
carb         -1.1199     0.4354  -2.572 0.015923 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.805 on 27 degrees of freedom
Multiple R-squared: 0.8113, Adjusted R-squared: 0.7834
F-statistic: 29.03 on 4 and 27 DF, p-value: 1.991e-09

> confint(mod)
                 2.5 %        97.5 %
(Intercept)  22.809293 27.8313132711
cyl6         -7.098164 -0.0006745487
cyl8        -10.613981 -3.1952927942

(33)

Inferences about Multiple $\hat{b}_j$

Assume that $q < p$ and we want to test if a reduced model is sufficient:

$$H_0: b_{q+1} = b_{q+2} = \cdots = b_p = b^* \qquad H_1: \text{at least one } b_k \neq b^*$$

Compare the SSE for the full and reduced (constrained) models:

(a) Full Model: $y_i = b_0 + \sum_{j=1}^p b_j x_{ij} + e_i$
(b) Reduced Model: $y_i = b_0 + \sum_{j=1}^q b_j x_{ij} + b^*\sum_{k=q+1}^p x_{ik} + e_i$

Note: set $b^* = 0$ to remove $X_{q+1}, \ldots, X_p$ from the model.

(34)

Inferences about Multiple $\hat{b}_j$ (continued)

Test Statistic:

$$F = \frac{SSE_R - SSE_F}{df_R - df_F} \div \frac{SSE_F}{df_F} = \frac{SSE_R - SSE_F}{(n-q-1)-(n-p-1)} \div \frac{SSE_F}{n-p-1} \sim F_{(p-q,\; n-p-1)}$$

where

- $SSE_R$ is the sum-of-squares error for the reduced model
- $SSE_F$ is the sum-of-squares error for the full model
- $df_R$ is the error degrees of freedom for the reduced model
- $df_F$ is the error degrees of freedom for the full model
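With $b^* = 0$ this is exactly the comparison anova() performs when given two nested lm fits; a sketch testing whether carb can be dropped from the mtcars model (mod.red is an illustrative reduced fit):

# sketch: reduced-vs-full F test (H0: the carb coefficient is 0)
mod.red <- lm(mpg ~ cyl + am, data=mtcars)   # reduced model
anova(mod.red, mod)   # F = [(SSE_R - SSE_F)/(df_R - df_F)] / [SSE_F/df_F]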

(35)

Inferences about Linear Combinations of $\hat{b}_j$

Assume that $\mathbf{c} = (c_1, \ldots, c_{p+1})'$ and we want to test:

$$H_0: \mathbf{c}'\mathbf{b} = b^* \qquad H_1: \mathbf{c}'\mathbf{b} \neq b^*$$

Test statistic:

$$t = \frac{\mathbf{c}'\hat{\mathbf{b}} - b^*}{\hat{\sigma}\sqrt{\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c}}} \sim t_{n-p-1}$$
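A sketch of this test in R for one illustrative (hypothetical) choice of $\mathbf{c}$, contrasting the cyl6 and cyl8 coefficients:

# sketch: t test for a linear combination c'b (here H0: b_cyl6 = b_cyl8, i.e. b* = 0)
cc <- c(0, 1, -1, 0, 0)                              # picks out b_cyl6 - b_cyl8
est <- sum(cc * coef(mod))
se  <- sqrt(as.numeric(t(cc) %*% vcov(mod) %*% cc))  # sigma-hat x sqrt(c'(X'X)^{-1}c)
tstat <- est / se
2 * pt(abs(tstat), df = n - p - 1, lower.tail=FALSE) # two-sided p-value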

(36)

Confidence Interval for $\sigma^2$

Note that

$$\frac{(n-p-1)\hat{\sigma}^2}{\sigma^2} = \frac{SSE}{\sigma^2} = \frac{\sum_{i=1}^n \hat{e}_i^2}{\sigma^2} \sim \chi^2_{n-p-1}$$

This implies that

$$\chi^2_{(n-p-1;\,1-\alpha/2)} < \frac{(n-p-1)\hat{\sigma}^2}{\sigma^2} < \chi^2_{(n-p-1;\,\alpha/2)}$$

where $P(Q > \chi^2_{(n-p-1;\,\alpha/2)}) = \alpha/2$, so a $100(1-\alpha)\%$ CI is given by

$$\frac{(n-p-1)\hat{\sigma}^2}{\chi^2_{(n-p-1;\,\alpha/2)}} < \sigma^2 < \frac{(n-p-1)\hat{\sigma}^2}{\chi^2_{(n-p-1;\,1-\alpha/2)}}$$
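A sketch computing this interval for the mtcars fit (n and p as before; note that the upper-tail critical value $\chi^2_{(df;\,\alpha/2)}$ in the slide's notation is qchisq(1 - alpha/2, df) in R's lower-tail parameterization):

# sketch: 95% CI for the error variance
sigma2.hat <- sum(resid(mod)^2) / (n - p - 1)   # MSE = 7.868009
alpha <- 0.05
c((n - p - 1) * sigma2.hat / qchisq(1 - alpha/2, df = n - p - 1),
  (n - p - 1) * sigma2.hat / qchisq(alpha/2, df = n - p - 1))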

(37)

Interval Estimation

Idea: estimate the expected value of the response for a given predictor score.

Given $\mathbf{x}_h = (1, x_{h1}, \ldots, x_{hp})$, the fitted value is $\hat{y}_h = \mathbf{x}_h\hat{\mathbf{b}}$.

The variance of $\hat{y}_h$ is given by $\sigma^2_{\bar{y}_h} = V(\mathbf{x}_h\hat{\mathbf{b}}) = \mathbf{x}_h V(\hat{\mathbf{b}})\mathbf{x}_h' = \sigma^2\mathbf{x}_h(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h'$

- Use $\hat{\sigma}^2_{\bar{y}_h} = \hat{\sigma}^2\mathbf{x}_h(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h'$ if $\sigma^2$ is unknown

We can test $H_0: E(y_h) = y_h^*$ vs. $H_1: E(y_h) \neq y_h^*$

- Test statistic: $T = (\hat{y}_h - y_h^*)/\hat{\sigma}_{\bar{y}_h}$, which follows a $t_{n-p-1}$ distribution
- $100(1-\alpha)\%$ CI for $E(y_h)$: $\hat{y}_h \pm t_{n-p-1}^{(\alpha/2)}\,\hat{\sigma}_{\bar{y}_h}$

(38)

Predicting New Observations

Idea: estimate the observed value of the response for a given predictor score.

- Note: interested in the actual $y_h$ value instead of $E(y_h)$

Given $\mathbf{x}_h = (1, x_{h1}, \ldots, x_{hp})$, the fitted value is $\hat{y}_h = \mathbf{x}_h\hat{\mathbf{b}}$.

- Note: same as interval estimation

When predicting a new observation, there are two uncertainties:

- location of the distribution of $Y$ for $X_1, \ldots, X_p$ (captured by $\sigma^2_{\bar{y}_h}$)
- variability within the distribution of $Y$ (captured by $\sigma^2$)

(39)

Predicting New Observations (continued)

The two sources of variance are independent, so $\sigma^2_{y_h} = \sigma^2_{\bar{y}_h} + \sigma^2$

- Use $\hat{\sigma}^2_{y_h} = \hat{\sigma}^2_{\bar{y}_h} + \hat{\sigma}^2$ if $\sigma^2$ is unknown

We can test $H_0: y_h = y_h^*$ vs. $H_1: y_h \neq y_h^*$

- Test statistic: $T = (\hat{y}_h - y_h^*)/\hat{\sigma}_{y_h}$, which follows a $t_{n-p-1}$ distribution
- $100(1-\alpha)\%$ Prediction Interval (PI) for $y_h$: $\hat{y}_h \pm t_{n-p-1}^{(\alpha/2)}\,\hat{\sigma}_{y_h}$

(40)

Confidence and Prediction Intervals in R

# confidence interval

> newdata <- data.frame(cyl=factor(6, levels=c(4,6,8)), am=1, carb=4)

> predict(mod, newdata, interval="confidence")

fit lwr upr

1 21.51824 18.92554 24.11094

# prediction interval

> newdata <- data.frame(cyl=factor(6, levels=c(4,6,8)), am=1, carb=4)

> predict(mod, newdata, interval="prediction")

fit lwr upr

1 21.51824 15.20583 27.83065

(41)

Simultaneous Confidence Regions

Given the distribution of $\hat{\mathbf{b}}$ (and some probability theory), we have that

$$\frac{(\hat{\mathbf{b}}-\mathbf{b})'\mathbf{X}'\mathbf{X}(\hat{\mathbf{b}}-\mathbf{b})}{\sigma^2} \sim \chi^2_{p+1} \quad\text{and}\quad \frac{(n-p-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p-1}$$

which implies that

$$\frac{(\hat{\mathbf{b}}-\mathbf{b})'\mathbf{X}'\mathbf{X}(\hat{\mathbf{b}}-\mathbf{b})}{(p+1)\hat{\sigma}^2} \sim \frac{\chi^2_{p+1}/(p+1)}{\chi^2_{n-p-1}/(n-p-1)} \equiv F_{(p+1,\,n-p-1)}$$

To form a $100(1-\alpha)\%$ confidence region (CR) use limits such that

$$(\hat{\mathbf{b}}-\mathbf{b})'\mathbf{X}'\mathbf{X}(\hat{\mathbf{b}}-\mathbf{b}) \le (p+1)\hat{\sigma}^2 F_{(p+1,\,n-p-1)}^{(\alpha)}$$

where $F_{(p+1,\,n-p-1)}^{(\alpha)}$ is the critical value for significance level $\alpha$.

(42)

Simultaneous Confidence Regions in R

[Figure: joint confidence ellipses for the cyl6 and cyl8 coefficients, with panels α = 0.1 (left) and α = 0.01 (right); horizontal axis: cyl6 coefficient, vertical axis: cyl8 coefficient.]

library(car)   # confidenceEllipse()
dev.new(height=4, width=8, noRStudioGD=TRUE)
par(mfrow=c(1,2))
confidenceEllipse(mod, c(2,3), levels=.9, xlim=c(-11,3), ylim=c(-14,0),
                  main=expression(alpha*" = "*.1), cex.main=2)
confidenceEllipse(mod, c(2,3), levels=.99, xlim=c(-11,3), ylim=c(-14,0),
                  main=expression(alpha*" = "*.01), cex.main=2)   # second panel

(43)

Multivariate Linear Regression

(44)

MvLR Model: Scalar Form

The multivariate (multiple) linear regression model has the form

$$y_{ik} = b_{0k} + \sum_{j=1}^p b_{jk} x_{ij} + e_{ik}$$

for $i \in \{1, \ldots, n\}$ and $k \in \{1, \ldots, m\}$ where

- $y_{ik} \in \mathbb{R}$ is the $k$-th real-valued response for the $i$-th observation
- $b_{0k} \in \mathbb{R}$ is the regression intercept for the $k$-th response
- $b_{jk} \in \mathbb{R}$ is the $j$-th predictor's regression slope for the $k$-th response
- $x_{ij} \in \mathbb{R}$ is the $j$-th predictor for the $i$-th observation
- $(e_{i1}, \ldots, e_{im}) \overset{iid}{\sim} N(\mathbf{0}_m, \boldsymbol{\Sigma})$ is a multivariate Gaussian error vector

(45)

MvLR Model: Nomenclature

The model is multivariate because we have $m > 1$ response variables.

The model is multiple because we have $p > 1$ predictors.

- If $p = 1$, we have a multivariate simple linear regression model.

The model is linear because $y_{ik}$ is a linear function of the parameters ($b_{jk}$ are the parameters for $j \in \{0, 1, \ldots, p\}$ and $k \in \{1, \ldots, m\}$).

The model is a regression model because we are modeling response variables ($Y_1, \ldots, Y_m$) as a function of predictor variables ($X_1, \ldots, X_p$).

(46)

MvLR Model: Assumptions

The fundamental assumptions of the MvLR model are:

1. The relationship between $X_j$ and $Y_k$ is linear (given the other predictors)
2. $x_{ij}$ and $y_{ik}$ are observed random variables (known constants)
3. $(e_{i1}, \ldots, e_{im}) \overset{iid}{\sim} N(\mathbf{0}_m, \boldsymbol{\Sigma})$ is an unobserved random vector
4. $\mathbf{b}_k = (b_{0k}, b_{1k}, \ldots, b_{pk})'$ for $k \in \{1, \ldots, m\}$ are unknown constants
5. $(y_{ik} \mid x_{i1}, \ldots, x_{ip}) \sim N(b_{0k} + \sum_{j=1}^p b_{jk} x_{ij}, \sigma_{kk})$ for each $k \in \{1, \ldots, m\}$; note: homogeneity of variance for each response

Note: $b_{jk}$ is the expected increase in $Y_k$ for a 1-unit increase in $X_j$ with all other predictor variables held constant.

(47)

MvLR Model: Matrix Form

The multivariate multiple linear regression model has the form

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$$

where

- $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_m] \in \mathbb{R}^{n \times m}$ is the $n \times m$ response matrix
  - $\mathbf{y}_k = (y_{1k}, \ldots, y_{nk})' \in \mathbb{R}^n$ is the $k$-th response vector ($n \times 1$)
- $\mathbf{X} = [\mathbf{1}_n, \mathbf{x}_1, \ldots, \mathbf{x}_p] \in \mathbb{R}^{n \times (p+1)}$ is the $n \times (p+1)$ design matrix
  - $\mathbf{1}_n$ is an $n \times 1$ vector of ones
  - $\mathbf{x}_j = (x_{1j}, \ldots, x_{nj})' \in \mathbb{R}^n$ is the $j$-th predictor vector ($n \times 1$)
- $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_m] \in \mathbb{R}^{(p+1) \times m}$ is the $(p+1) \times m$ matrix of coefficients
  - $\mathbf{b}_k = (b_{0k}, b_{1k}, \ldots, b_{pk})' \in \mathbb{R}^{p+1}$ is the $k$-th coefficient vector ($(p+1) \times 1$)
- $\mathbf{E} = [\mathbf{e}_1, \ldots, \mathbf{e}_m] \in \mathbb{R}^{n \times m}$ is the $n \times m$ error matrix
  - $\mathbf{e}_k = (e_{1k}, \ldots, e_{nk})' \in \mathbb{R}^n$ is the $k$-th error vector ($n \times 1$)

(48)

MvLR Model: Matrix Form (another look)

Matrix form writes the MvLR model for all $nm$ points simultaneously: $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$

$$\begin{bmatrix} y_{11} & \cdots & y_{1m} \\ y_{21} & \cdots & y_{2m} \\ y_{31} & \cdots & y_{3m} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nm} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ 1 & x_{31} & x_{32} & \cdots & x_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} b_{01} & \cdots & b_{0m} \\ b_{11} & \cdots & b_{1m} \\ b_{21} & \cdots & b_{2m} \\ \vdots & \ddots & \vdots \\ b_{p1} & \cdots & b_{pm} \end{bmatrix} + \begin{bmatrix} e_{11} & \cdots & e_{1m} \\ e_{21} & \cdots & e_{2m} \\ e_{31} & \cdots & e_{3m} \\ \vdots & \ddots & \vdots \\ e_{n1} & \cdots & e_{nm} \end{bmatrix}$$

(49)

MvLR Model: Assumptions (revisited)

Assuming that the $n$ subjects are independent, we have that

- $\mathbf{e}_k \sim N(\mathbf{0}_n, \sigma_{kk}\mathbf{I}_n)$ where $\mathbf{e}_k$ is the $k$-th column of $\mathbf{E}$
- $\mathbf{e}_i \overset{iid}{\sim} N(\mathbf{0}_m, \boldsymbol{\Sigma})$ where $\mathbf{e}_i$ is the $i$-th row of $\mathbf{E}$
- $\mathrm{vec}(\mathbf{E}) \sim N(\mathbf{0}_{nm}, \boldsymbol{\Sigma} \otimes \mathbf{I}_n)$ where $\otimes$ denotes the Kronecker product
- $\mathrm{vec}(\mathbf{E}') \sim N(\mathbf{0}_{nm}, \mathbf{I}_n \otimes \boldsymbol{\Sigma})$

The response matrix is multivariate normal given $\mathbf{X}$:

$$(\mathrm{vec}(\mathbf{Y}) \mid \mathbf{X}) \sim N([\mathbf{B}' \otimes \mathbf{I}_n]\mathrm{vec}(\mathbf{X}), \boldsymbol{\Sigma} \otimes \mathbf{I}_n) \qquad (\mathrm{vec}(\mathbf{Y}') \mid \mathbf{X}) \sim N([\mathbf{I}_n \otimes \mathbf{B}']\mathrm{vec}(\mathbf{X}'), \mathbf{I}_n \otimes \boldsymbol{\Sigma})$$

where $[\mathbf{B}' \otimes \mathbf{I}_n]\mathrm{vec}(\mathbf{X}) = \mathrm{vec}(\mathbf{X}\mathbf{B})$ and $[\mathbf{I}_n \otimes \mathbf{B}']\mathrm{vec}(\mathbf{X}') = \mathrm{vec}(\mathbf{B}'\mathbf{X}')$.

(50)

MvLR Model: Mean and Covariance

Note that the assumed mean vector for $\mathrm{vec}(\mathbf{Y}')$ is

$$[\mathbf{I}_n \otimes \mathbf{B}']\mathrm{vec}(\mathbf{X}') = \mathrm{vec}(\mathbf{B}'\mathbf{X}') = \begin{bmatrix} \mathbf{B}'\mathbf{x}_1 \\ \vdots \\ \mathbf{B}'\mathbf{x}_n \end{bmatrix}$$

where $\mathbf{x}_i$ is the $i$-th row of $\mathbf{X}$.

The assumed covariance matrix for $\mathrm{vec}(\mathbf{Y}')$ is block diagonal:

$$\mathbf{I}_n \otimes \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma} & \mathbf{0}_{m \times m} & \cdots & \mathbf{0}_{m \times m} \\ \mathbf{0}_{m \times m} & \boldsymbol{\Sigma} & \cdots & \mathbf{0}_{m \times m} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{m \times m} & \mathbf{0}_{m \times m} & \cdots & \boldsymbol{\Sigma} \end{bmatrix}$$

(51)

Ordinary Least Squares

The ordinary least squares (OLS) problem is

$$\min_{\mathbf{B} \in \mathbb{R}^{(p+1) \times m}} \|\mathbf{Y} - \mathbf{X}\mathbf{B}\|^2 = \min_{\mathbf{B} \in \mathbb{R}^{(p+1) \times m}} \sum_{i=1}^n \sum_{k=1}^m \Big( y_{ik} - b_{0k} - \sum_{j=1}^p b_{jk} x_{ij} \Big)^2$$

where $\|\cdot\|$ denotes the Frobenius norm.

$$OLS(\mathbf{B}) = \|\mathbf{Y} - \mathbf{X}\mathbf{B}\|^2 = \mathrm{tr}(\mathbf{Y}'\mathbf{Y}) - 2\,\mathrm{tr}(\mathbf{Y}'\mathbf{X}\mathbf{B}) + \mathrm{tr}(\mathbf{B}'\mathbf{X}'\mathbf{X}\mathbf{B})$$

$$\frac{\partial\, OLS(\mathbf{B})}{\partial \mathbf{B}} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\mathbf{B}$$

The OLS solution has the form

$$\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \iff \hat{\mathbf{b}}_k = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}_k$$

where $\mathbf{b}_k$ and $\mathbf{y}_k$ denote the $k$-th columns of $\mathbf{B}$ and $\mathbf{Y}$, respectively.
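As in the univariate case, the closed form is easy to check numerically; a sketch anticipating the multivariate mtcars fit used on the following slides (cyl is already a factor from the earlier example):

# sketch: closed-form multivariate OLS vs. lm() with a matrix response
Y <- as.matrix(mtcars[, c("mpg","disp","hp","wt")])
mvmod <- lm(Y ~ cyl + am + carb, data=mtcars)
X <- model.matrix(mvmod)
Bhat <- solve(crossprod(X), crossprod(X, Y))   # (X'X)^{-1} X'Y, one column per response
all.equal(Bhat, coef(mvmod), check.attributes=FALSE)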

(52)

Fitted Values and Residuals

SCALAR FORM:

Fitted values are given by

$$\hat{y}_{ik} = \hat{b}_{0k} + \sum_{j=1}^p \hat{b}_{jk} x_{ij}$$

and residuals are given by

$$\hat{e}_{ik} = y_{ik} - \hat{y}_{ik}$$

MATRIX FORM:

Fitted values are given by

$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}}$$

and residuals are given by

$$\hat{\mathbf{E}} = \mathbf{Y} - \hat{\mathbf{Y}}$$

(53)

Hat Matrix

Note that we can write the fitted values as

$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y}$$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the hat matrix.

$\mathbf{H}$ is a symmetric and idempotent matrix: $\mathbf{H}\mathbf{H} = \mathbf{H}$.

$\mathbf{H}$ projects $\mathbf{y}_k$ onto the column space of $\mathbf{X}$ for $k \in \{1, \ldots, m\}$.

(54)

Multivariate Regression Example in R

> data(mtcars)

> head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

> mtcars$cyl <- factor(mtcars$cyl)

> Y <- as.matrix(mtcars[,c("mpg","disp","hp","wt")])

> mvmod <- lm(Y ~ cyl + am + carb, data=mtcars)

> coef(mvmod)

                  mpg      disp         hp         wt
(Intercept) 25.320303 134.32487 46.5201421  2.7612069
cyl6        -3.549419  61.84324  0.9116288  0.1957229
cyl8        -6.904637 218.99063 87.5910956  0.7723077
am           4.226774 -43.80256  4.4472569 -1.0254749
carb        -1.119855   1.72629 21.2764930  0.1749132

(55)

Sums-of-Squares and Crossproducts: Vector Form

In MvLR models, the relevant sums-of-squares and crossproducts matrices are

- Total: $SSCP_T = \sum_{i=1}^n (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})'$
- Regression: $SSCP_R = \sum_{i=1}^n (\hat{\mathbf{y}}_i - \bar{\mathbf{y}})(\hat{\mathbf{y}}_i - \bar{\mathbf{y}})'$
- Error: $SSCP_E = \sum_{i=1}^n (\mathbf{y}_i - \hat{\mathbf{y}}_i)(\mathbf{y}_i - \hat{\mathbf{y}}_i)'$

where $\mathbf{y}_i$ and $\hat{\mathbf{y}}_i$ denote the $i$-th rows of $\mathbf{Y}$ and $\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}}$, respectively.

The corresponding degrees of freedom are

- $SSCP_T$: $df_T = m(n-1)$
- $SSCP_R$: $df_R = mp$
- $SSCP_E$: $df_E = m(n-p-1)$

(56)

Sums-of-Squares and Crossproducts: Matrix Form

In MvLR models, the relevant sums-of-squares and crossproducts matrices are

$$SSCP_T = \sum_{i=1}^n (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})' = \mathbf{Y}'[\mathbf{I}_n - (1/n)\mathbf{J}]\mathbf{Y}$$

$$SSCP_R = \sum_{i=1}^n (\hat{\mathbf{y}}_i - \bar{\mathbf{y}})(\hat{\mathbf{y}}_i - \bar{\mathbf{y}})' = \mathbf{Y}'[\mathbf{H} - (1/n)\mathbf{J}]\mathbf{Y}$$

$$SSCP_E = \sum_{i=1}^n (\mathbf{y}_i - \hat{\mathbf{y}}_i)(\mathbf{y}_i - \hat{\mathbf{y}}_i)' = \mathbf{Y}'[\mathbf{I}_n - \mathbf{H}]\mathbf{Y}$$

(57)

Partitioning the SSCP Total Matrix

We can partition the total covariation in $\mathbf{y}_i$ as

$$\begin{aligned}
SSCP_T &= \sum_{i=1}^n (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})' \\
&= \sum_{i=1}^n (\mathbf{y}_i - \hat{\mathbf{y}}_i + \hat{\mathbf{y}}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \hat{\mathbf{y}}_i + \hat{\mathbf{y}}_i - \bar{\mathbf{y}})' \\
&= \sum_{i=1}^n (\hat{\mathbf{y}}_i - \bar{\mathbf{y}})(\hat{\mathbf{y}}_i - \bar{\mathbf{y}})' + \sum_{i=1}^n (\mathbf{y}_i - \hat{\mathbf{y}}_i)(\mathbf{y}_i - \hat{\mathbf{y}}_i)' + \sum_{i=1}^n (\hat{\mathbf{y}}_i - \bar{\mathbf{y}})\hat{\mathbf{e}}_i' + \sum_{i=1}^n \hat{\mathbf{e}}_i(\hat{\mathbf{y}}_i - \bar{\mathbf{y}})' \\
&= SSCP_R + SSCP_E
\end{aligned}$$

where the two cross-product terms vanish because the residual columns are orthogonal to the fitted values and to the intercept column.

(58)

Multivariate Regression SSCP in R

> ybar <- colMeans(Y)

> n <- nrow(Y)

> m <- ncol(Y)

> Ybar <- matrix(ybar, n, m, byrow=TRUE)

> SSCP.T <- crossprod(Y - Ybar)

> SSCP.R <- crossprod(mvmod$fitted.values - Ybar)

> SSCP.E <- crossprod(Y - mvmod$fitted.values)

> SSCP.T

              mpg      disp         hp         wt
mpg    1126.0472 -19626.01  -9942.694 -158.61723
disp -19626.0134 476184.79 208355.919 3338.21032
hp    -9942.6938 208355.92 145726.875 1369.97250
wt     -158.6172   3338.21   1369.972   29.67875

> SSCP.R + SSCP.E
              mpg      disp         hp         wt
mpg    1126.0472 -19626.01  -9942.694 -158.61723
disp -19626.0134 476184.79 208355.919 3338.21033
hp    -9942.6938 208355.92 145726.875 1369.97250
wt     -158.6172   3338.21   1369.973   29.67875

(59)

Relation to ML Solution

Remember that $(\mathbf{y}_i \mid \mathbf{x}_i) \sim N(\mathbf{B}'\mathbf{x}_i, \boldsymbol{\Sigma})$, which implies that $\mathbf{y}_i$ has pdf

$$f(\mathbf{y}_i \mid \mathbf{x}_i, \mathbf{B}, \boldsymbol{\Sigma}) = (2\pi)^{-m/2}|\boldsymbol{\Sigma}|^{-1/2}\exp\{-(1/2)(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)\}$$

where $\mathbf{y}_i$ and $\mathbf{x}_i$ denote the $i$-th rows of $\mathbf{Y}$ and $\mathbf{X}$, respectively.

As a result, the log-likelihood of $\mathbf{B}$ given $(\mathbf{Y}, \mathbf{X}, \boldsymbol{\Sigma})$ is

$$\ln\{L(\mathbf{B} \mid \mathbf{Y}, \mathbf{X}, \boldsymbol{\Sigma})\} = -\frac{1}{2}\sum_{i=1}^n (\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i) + c$$

where $c$ is a constant that does not depend on $\mathbf{B}$.

(60)

Relation to ML Solution (continued)

The maximum likelihood estimate (MLE) of $\mathbf{B}$ is the estimate satisfying

$$\max_{\mathbf{B} \in \mathbb{R}^{(p+1) \times m}} MLE(\mathbf{B}) = \max_{\mathbf{B} \in \mathbb{R}^{(p+1) \times m}} -\frac{1}{2}\sum_{i=1}^n (\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)$$

and note that $(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i) = \mathrm{tr}\{\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)'\}$.

Taking the derivative with respect to $\mathbf{B}$, we see that

$$\frac{\partial\, MLE(\mathbf{B})}{\partial \mathbf{B}} = -2\sum_{i=1}^n \mathbf{x}_i\mathbf{y}_i'\boldsymbol{\Sigma}^{-1} + 2\sum_{i=1}^n \mathbf{x}_i\mathbf{x}_i'\mathbf{B}\boldsymbol{\Sigma}^{-1} = -2\mathbf{X}'\mathbf{Y}\boldsymbol{\Sigma}^{-1} + 2\mathbf{X}'\mathbf{X}\mathbf{B}\boldsymbol{\Sigma}^{-1}$$

Setting the derivative to zero and solving for $\mathbf{B}$ gives $\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, so the OLS and ML estimates of $\mathbf{B}$ coincide.

(61)

Estimated Error Covariance

The estimated error covariance matrix is

$$\hat{\boldsymbol{\Sigma}} = \frac{SSCP_E}{n-p-1} = \frac{\sum_{i=1}^n (\mathbf{y}_i - \hat{\mathbf{y}}_i)(\mathbf{y}_i - \hat{\mathbf{y}}_i)'}{n-p-1} = \frac{\mathbf{Y}'(\mathbf{I}_n - \mathbf{H})\mathbf{Y}}{n-p-1}$$

which is an unbiased estimate of the error covariance matrix $\boldsymbol{\Sigma}$.

The estimate $\hat{\boldsymbol{\Sigma}}$ is the mean SSCP error of the model.

(62)

Maximum Likelihood Estimate of Error Covariance

$\tilde{\boldsymbol{\Sigma}} = \frac{1}{n}\mathbf{Y}'(\mathbf{I}_n - \mathbf{H})\mathbf{Y}$ is the MLE of $\boldsymbol{\Sigma}$.

From our previous results using $\hat{\boldsymbol{\Sigma}}$, we have that

$$E(\tilde{\boldsymbol{\Sigma}}) = \frac{n-p-1}{n}\boldsymbol{\Sigma}$$

Consequently, the bias of the estimator $\tilde{\boldsymbol{\Sigma}}$ is given by

$$\frac{n-p-1}{n}\boldsymbol{\Sigma} - \boldsymbol{\Sigma} = -\frac{(p+1)}{n}\boldsymbol{\Sigma}$$

and note that $-\frac{(p+1)}{n}\boldsymbol{\Sigma} \to \mathbf{0}$ as $n \to \infty$.

(63)

Comparing $\hat{\boldsymbol{\Sigma}}$ and $\tilde{\boldsymbol{\Sigma}}$

Reminder: the mean SSCP error and MLE of $\boldsymbol{\Sigma}$ are given by

$$\hat{\boldsymbol{\Sigma}} = \mathbf{Y}'(\mathbf{I}_n - \mathbf{H})\mathbf{Y}/(n-p-1) \qquad \tilde{\boldsymbol{\Sigma}} = \mathbf{Y}'(\mathbf{I}_n - \mathbf{H})\mathbf{Y}/n$$

From the definitions of $\hat{\boldsymbol{\Sigma}}$ and $\tilde{\boldsymbol{\Sigma}}$ we have that

$$\tilde{\sigma}_{kk} < \hat{\sigma}_{kk} \quad \text{for all } k$$

where $\hat{\sigma}_{kk}$ and $\tilde{\sigma}_{kk}$ denote the $k$-th diagonals of $\hat{\boldsymbol{\Sigma}}$ and $\tilde{\boldsymbol{\Sigma}}$, respectively. The MLE produces smaller estimates of the error variances.

(64)

Estimated Error Covariance Matrix in R

> n <- nrow(Y)

> p <- nrow(coef(mvmod)) - 1

> SSCP.E <- crossprod(Y - mvmod$fitted.values)

> SigmaHat <- SSCP.E / (n - p - 1)

> SigmaTilde <- SSCP.E / n

> SigmaHat

             mpg        disp          hp         wt
mpg    7.8680094   -53.27166 -19.7015979 -0.6575443
disp -53.2716607  2504.87095 425.1328988 18.1065416
hp   -19.7015979   425.13290 577.2703337  0.4662491
wt    -0.6575443    18.10654   0.4662491  0.2573503

> SigmaTilde
            mpg        disp          hp         wt
mpg    6.638633   -44.94796 -16.6232233 -0.5548030
disp -44.947964  2113.48487 358.7058833 15.2773945
hp   -16.623223   358.70588 487.0718440  0.3933977
wt    -0.554803    15.27739   0.3933977  0.2171394

(65)

Expected Value of Least Squares Coefficients

The expected value of the estimated coefficients is given by

$$E(\hat{\mathbf{B}}) = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{B} = \mathbf{B}$$

so $\hat{\mathbf{B}}$ is an unbiased estimator of $\mathbf{B}$.

(66)

Covariance Matrix of Least Squares Coefficients

The covariance matrix of the estimated coefficients is given by

$$\begin{aligned}
V\{\mathrm{vec}(\hat{\mathbf{B}}')\} &= V\{\mathrm{vec}(\mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1})\} \\
&= V\{[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \otimes \mathbf{I}_m]\mathrm{vec}(\mathbf{Y}')\} \\
&= [(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \otimes \mathbf{I}_m]\, V\{\mathrm{vec}(\mathbf{Y}')\}\, [(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \otimes \mathbf{I}_m]' \\
&= [(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \otimes \mathbf{I}_m][\mathbf{I}_n \otimes \boldsymbol{\Sigma}][\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \otimes \mathbf{I}_m] \\
&= [(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \otimes \mathbf{I}_m][\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \otimes \boldsymbol{\Sigma}] \\
&= (\mathbf{X}'\mathbf{X})^{-1} \otimes \boldsymbol{\Sigma}
\end{aligned}$$

Note: we could also write $V\{\mathrm{vec}(\hat{\mathbf{B}})\} = \boldsymbol{\Sigma} \otimes (\mathbf{X}'\mathbf{X})^{-1}$.
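A sketch verifying the Kronecker structure numerically: build $\boldsymbol{\Sigma} \otimes (\mathbf{X}'\mathbf{X})^{-1}$ from its pieces (X, Y, mvmod, n, and p as defined in the R examples here) and read off one block, which by the result on the next slide should equal $\hat{\sigma}_{12}(\mathbf{X}'\mathbf{X})^{-1}$; the block indexing is illustrative.

# sketch: estimated covariance of vec(B-hat) via the Kronecker product
XtXi <- solve(crossprod(X))                       # (X'X)^{-1}
SigmaHat <- crossprod(Y - fitted(mvmod)) / (n - p - 1)
V.vecB <- kronecker(SigmaHat, XtXi)               # Sigma-hat (x) (X'X)^{-1}
pp <- ncol(X)                                     # p + 1
# block (1,2): covariance of the mpg and disp coefficient vectors
all.equal(V.vecB[1:pp, pp + 1:pp], SigmaHat[1,2] * XtXi, check.attributes=FALSE)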

(67)

Distribution of Coefficients

The estimated regression coefficients are a linear function of $\mathbf{Y}$, so we know that $\hat{\mathbf{B}}$ follows a multivariate normal distribution:

$$\mathrm{vec}(\hat{\mathbf{B}}) \sim N[\mathrm{vec}(\mathbf{B}), \boldsymbol{\Sigma} \otimes (\mathbf{X}'\mathbf{X})^{-1}] \qquad \mathrm{vec}(\hat{\mathbf{B}}') \sim N[\mathrm{vec}(\mathbf{B}'), (\mathbf{X}'\mathbf{X})^{-1} \otimes \boldsymbol{\Sigma}]$$

The covariance between two columns of $\hat{\mathbf{B}}$ has the form

$$\mathrm{Cov}(\hat{\mathbf{b}}_k, \hat{\mathbf{b}}_\ell) = \sigma_{k\ell}(\mathbf{X}'\mathbf{X})^{-1}$$

and the covariance between two rows of $\hat{\mathbf{B}}$ has the form

$$\mathrm{Cov}(\hat{\mathbf{b}}^{(g)}, \hat{\mathbf{b}}^{(j)}) = (\mathbf{X}'\mathbf{X})^{-1}_{gj}\,\boldsymbol{\Sigma}$$

where $\hat{\mathbf{b}}^{(g)}$ denotes the $g$-th row of $\hat{\mathbf{B}}$ and $(\mathbf{X}'\mathbf{X})^{-1}_{gj}$ denotes the $(g,j)$-th element of $(\mathbf{X}'\mathbf{X})^{-1}$.
