Applied Econometrics 2018
Generalising the Simple Model to
Multiple Linear Regression
• Before, we used the model
$$y_t = \alpha + \beta x_t + u_t, \qquad t = 1, 2, \ldots, T$$
• But what if the dependent (y) variable depends on more than one independent variable?
• It is very easy to generalise the simple model to one with k − 1 regressors (independent variables):
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t$$
Multiple Regression and the Constant Term
• Now we write
$$y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t, \qquad t = 1, 2, \ldots, T$$
• Where is $x_1$? It is the constant term. In fact the constant term is usually represented by a column of ones of length T:
$$x_1 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$$
• $\beta_1$ is the coefficient attached to the constant term (which we called $\alpha$ before).
Different Ways of Expressing
the Multiple Linear Regression Model
• We could write out a separate equation for every value of t:
$$\begin{aligned}
y_1 &= \beta_1 + \beta_2 x_{21} + \beta_3 x_{31} + \cdots + \beta_k x_{k1} + u_1 \\
y_2 &= \beta_1 + \beta_2 x_{22} + \beta_3 x_{32} + \cdots + \beta_k x_{k2} + u_2 \\
&\;\;\vdots \\
y_T &= \beta_1 + \beta_2 x_{2T} + \beta_3 x_{3T} + \cdots + \beta_k x_{kT} + u_T
\end{aligned}$$
• We can write this in matrix form:
$$y = X\beta + u$$
where y is $T \times 1$, X is $T \times k$, $\beta$ is $k \times 1$ and u is $T \times 1$.
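To make the dimensions concrete, here is a minimal numpy sketch (my illustration, not part of the original slides) with made-up data for T = 4 and k = 3, stacking a column of ones with two regressors to form X:

```python
import numpy as np

# Hypothetical data: T = 4 observations, k = 3 (constant plus two regressors)
x2 = np.array([1.0, 2.0, 3.0, 4.0])
x3 = np.array([0.5, 0.1, 0.8, 0.3])
X = np.column_stack([np.ones(4), x2, x3])  # T x k; first column is the constant

beta = np.array([1.0, 2.0, -1.0])          # k x 1 parameter vector
u = np.array([0.1, -0.2, 0.0, 0.1])        # T x 1 disturbance vector
y = X @ beta + u                           # y = X*beta + u, a T x 1 vector
print(X.shape, beta.shape, y.shape)        # (4, 3) (3,) (4,)
```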
Inside the Matrices of the
Multiple Linear Regression Model
• e.g. if k is 2, we have two regressors (and so two parameters), one of which is a column of ones:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{pmatrix}$$
$$(T \times 1) \qquad (T \times 2) \qquad (2 \times 1) \qquad (T \times 1)$$
• Notice that the matrices written in this way are conformable.
How Do We Calculate the Parameters (the $\beta$)
in this Generalised Case?
• Previously, we took the residual sum of squares and minimised it w.r.t. $\alpha$ and $\beta$.
• In the matrix notation, we have
$$\hat{u} = \begin{pmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{pmatrix}$$
• The RSS would be given by
$$\hat{u}'\hat{u} = (\hat{u}_1 \;\; \hat{u}_2 \;\; \cdots \;\; \hat{u}_T) \begin{pmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{pmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_T^2 = \sum_t \hat{u}_t^2$$
The OLS Estimator for the
Multiple Regression Model
• In order to obtain the parameter estimates, $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$, we would minimise the RSS with respect to all the $\beta$s.
• It can be shown that
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1} X'y$$
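As a sketch of how the formula is used in practice (my addition; the data below are made up), the estimator can be computed with a linear solve of the normal equations rather than an explicit matrix inverse:

```python
import numpy as np

# Hypothetical data: T = 5 observations, k = 3 parameters
X = np.array([[1.0, 2.0, 1.5],
              [1.0, 0.5, 3.0],
              [1.0, 1.0, 0.7],
              [1.0, 3.0, 0.5],
              [1.0, 2.5, 2.2]])
y = np.array([4.0, 5.5, 4.5, 3.0, 3.5])

# beta_hat = (X'X)^{-1} X'y, obtained by solving X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```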
Derivation of the OLS coefficient
estimator in matrix form
• The linear model in matrix form is:
$$y = X\beta + u$$
• The fitted values are $\hat{y} = X\hat{\beta}$; hence, the residuals are:
$$\hat{u} = y - \hat{y} = y - X\hat{\beta}$$
• We want to minimise the residual sum of squares, i.e.
$$L = \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta})$$
• Hence
$$L = y'y - y'X\hat{\beta} - \hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$$
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• But $\hat{\beta}'X'y = y'X\hat{\beta}$, because $(\hat{\beta}'X'y)' = y'X\hat{\beta}$ and both products are scalars:
$\hat{\beta}'X'y$ is $(1 \times k)(k \times T)(T \times 1) = (1 \times 1)$, a scalar;
$y'X\hat{\beta}$ is $(1 \times T)(T \times k)(k \times 1) = (1 \times 1)$, a scalar.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• Let us check it out for $k = 2$, where the first column of $X$ is the column of ones:
$$\hat{\beta}'X'y = (\hat{\beta}_1 \;\; \hat{\beta}_2) \begin{pmatrix} \sum_t y_t \\ \sum_t x_{2t} y_t \end{pmatrix} = \hat{\beta}_1 \sum_t y_t + \hat{\beta}_2 \sum_t x_{2t} y_t$$
and
$$y'X\hat{\beta} = (y_1 \;\; y_2 \;\; \cdots \;\; y_T) \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{pmatrix} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \hat{\beta}_1 \sum_t y_t + \hat{\beta}_2 \sum_t x_{2t} y_t$$
so the two expressions are indeed the same scalar.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• We have already seen that:
$$L = \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - y'X\hat{\beta} - \hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$$
• But, since $\hat{\beta}'X'y = y'X\hat{\beta}$, then
$$L = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$$
• Now, we must differentiate:
$$\frac{\partial L}{\partial \hat{\beta}} = \frac{\partial (y'y)}{\partial \hat{\beta}} - 2\frac{\partial (\hat{\beta}'X'y)}{\partial \hat{\beta}} + \frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}}$$
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• 1st term:
$$\frac{\partial (y'y)}{\partial \hat{\beta}} = 0$$
since $y'y$ does not involve $\hat{\beta}$.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• 2nd term:
$$\frac{\partial (2\hat{\beta}'X'y)}{\partial \hat{\beta}} = 2X'y$$
• To see why, let $a$ and $x$ be $n \times 1$ column vectors. Then we can write:
$$a'x = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$
The derivative of $a'x$ w.r.t. the vector $x$ will be:
$$\frac{\partial (a'x)}{\partial x} = \begin{pmatrix} \partial(a'x)/\partial x_1 \\ \partial(a'x)/\partial x_2 \\ \vdots \\ \partial(a'x)/\partial x_n \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = a$$
Here $a = X'y$ and $x = \hat{\beta}$, so the derivative of the 2nd term is $2X'y$.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• 3rd term:
$$\frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}}$$
• $\hat{\beta}'$ is a $1 \times k$ vector, $X'X$ is a symmetric $k \times k$ matrix, and $\hat{\beta}$ is a $k \times 1$ vector. Hence $\hat{\beta}'X'X\hat{\beta}$ has the same form as $x'Ax$.
• But from the matrix chapter, we know that $\frac{\partial (x'Ax)}{\partial x} = 2Ax$ for symmetric $A$, so
$$\frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}} = 2X'X\hat{\beta}$$
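The two differentiation rules can also be checked numerically. Below is a small sketch (my addition, not from the slides) that compares each rule against a central finite-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.normal(size=n)
A = rng.normal(size=(n, n))
A = A + A.T                      # symmetrise A, as the rule d(x'Ax)/dx = 2Ax requires
x0 = rng.normal(size=n)

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar-valued f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda x: a @ x, x0), a))               # d(a'x)/dx = a
print(np.allclose(num_grad(lambda x: x @ A @ x, x0), 2 * A @ x0))  # d(x'Ax)/dx = 2Ax
```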
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• Hence we can write:
$$\frac{\partial L}{\partial \hat{\beta}} = \frac{\partial (y'y)}{\partial \hat{\beta}} - 2\frac{\partial (\hat{\beta}'X'y)}{\partial \hat{\beta}} + \frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}} = 0 - 2X'y + 2X'X\hat{\beta}$$
• Setting this derivative to zero:
$$-2X'y + 2X'X\hat{\beta} = 0 \qquad \Rightarrow \qquad X'X\hat{\beta} = X'y$$
• If we pre-multiply both sides by $(X'X)^{-1}$, we get:
$$(X'X)^{-1}(X'X)\hat{\beta} = (X'X)^{-1}X'y$$
• But $(X'X)^{-1}(X'X) = I$, so:
$$\hat{\beta} = (X'X)^{-1}X'y$$
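As a quick sanity check (my addition, with simulated data), the closed-form solution of the normal equations should match a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(42)
T, k = 200, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(size=T)

beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)  # beta_hat = (X'X)^{-1} X'y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # QR/SVD-based least squares
print(np.allclose(beta_normal_eq, beta_lstsq))      # True
```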
Proof that $\hat{\beta}$ is a minimum
• 1st derivative: $\frac{\partial L}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta}$, which is zero at $\hat{\beta} = (X'X)^{-1}X'y$.
• 2nd derivative: $\frac{\partial^2 L}{\partial \hat{\beta}\,\partial \hat{\beta}'} = 2X'X$.
• If $\hat{\beta}$ is a minimum, $X'X$ must be a positive definite matrix.
• In the $2 \times 2$ case, $X'X$ is positive definite iff both elements on the leading diagonal are > 0 and $\det(X'X) > 0$.
Proof that $\hat{\beta}$ is a minimum (cont.)
• In the bivariate case, $X'X = \begin{pmatrix} T & \sum x_t \\ \sum x_t & \sum x_t^2 \end{pmatrix}$. The elements on the leading diagonal are $T$ and $\sum x_t^2$, which are both positive.
• $\det(X'X) = T\sum x_t^2 - \left(\sum x_t\right)^2 = T\sum (x_t - \bar{x})^2$, which is positive (as long as $x_t$ is not constant).
• This can be generalised to a matrix X with a $T \times k$ size.
• This proves that $\hat{\beta} = (X'X)^{-1}X'y$ is the estimator that minimises $\hat{u}'\hat{u}$.
• QED.
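A numerical illustration of the positive definiteness argument (my addition, with simulated regressors): for any X of full column rank, all eigenvalues of X'X are strictly positive, and hence so is its determinant:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # full column rank

XtX = X.T @ X
print(np.all(np.linalg.eigvalsh(XtX) > 0))  # all eigenvalues positive -> positive definite
print(np.linalg.det(XtX) > 0)               # determinant positive as well
```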
Calculating the Standard Errors for the
Multiple Regression Model
• Check the dimensions: $\hat{\beta} = (X'X)^{-1}X'y$ is $(k \times k)(k \times T)(T \times 1) = k \times 1$, as required.
• But how do we calculate the standard errors of the coefficient estimates?
• Previously, to estimate the variance of the errors, $\sigma^2$, we used $s^2 = \frac{\sum_t \hat{u}_t^2}{T - 2}$.
• Now, using the matrix notation, we use
$$s^2 = \frac{\hat{u}'\hat{u}}{T - k}$$
where k = number of regressors.
• The OLS estimator of the variance of $\hat{\beta}$ is given by the diagonal elements of $s^2(X'X)^{-1}$, so that the variance of $\hat{\beta}_1$ is the 1st diagonal element, the variance of $\hat{\beta}_2$ is the 2nd diagonal element, ..., and the variance of $\hat{\beta}_k$ is the kth diagonal element.
Derivation of the OLS standard error
estimator in multiple regression
• The variance of a vector of random variables is given by the formula $\mathrm{var}(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']$. Since $\hat{\beta} = (X'X)^{-1}X'y$ and $y = X\beta + u$:
$$\hat{\beta} = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$
so that
$$\mathrm{var}(\hat{\beta}) = E[(X'X)^{-1}X'u\,u'X(X'X)^{-1}] = (X'X)^{-1}X'\,E[uu']\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$
using $E[uu'] = \sigma^2 I_T$ and treating X as non-stochastic. Replacing $\sigma^2$ by its estimator $s^2$ gives $s^2(X'X)^{-1}$
Derivation of the OLS standard error
estimator in multiple regression (cont.)
• $s^2(X'X)^{-1}$ is the variance-covariance matrix of the coefficients.
• The leading diagonal of $s^2(X'X)^{-1}$ contains the variances of the coefficient estimates $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$, and the off-diagonal terms are the covariances between each pair of coefficients, i.e. $\mathrm{cov}(\hat{\beta}_0, \hat{\beta}_1)$, $\mathrm{cov}(\hat{\beta}_0, \hat{\beta}_2)$, etc.
Derivation of the OLS standard error estimator
in multiple regression (cont.)
• The standard errors $SE(\hat{\beta}_i)$, $i = 0, 1, 2, \ldots, k$, are respectively the square roots of the $i$th element of the leading diagonal of $s^2(X'X)^{-1}$.
• In a 3-regressor equation, we have:
$$\mathrm{var}(\hat{\beta}) = s^2(X'X)^{-1} = \begin{pmatrix} \mathrm{var}(\hat{\beta}_0) & \mathrm{cov}(\hat{\beta}_0,\hat{\beta}_1) & \mathrm{cov}(\hat{\beta}_0,\hat{\beta}_2) \\ \mathrm{cov}(\hat{\beta}_1,\hat{\beta}_0) & \mathrm{var}(\hat{\beta}_1) & \mathrm{cov}(\hat{\beta}_1,\hat{\beta}_2) \\ \mathrm{cov}(\hat{\beta}_2,\hat{\beta}_0) & \mathrm{cov}(\hat{\beta}_2,\hat{\beta}_1) & \mathrm{var}(\hat{\beta}_2) \end{pmatrix}$$
• This is the variance-covariance matrix of the regression.
Calculating Parameter and Standard Error Estimates
for Multiple Regression Models: An Example
• Example: The following model with k = 3 is estimated over 15 observations:
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
and the following data have been calculated from the original X's:
$$(X'X)^{-1} = \begin{pmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{pmatrix}, \qquad X'y = \begin{pmatrix} -3.0 \\ 2.2 \\ 0.6 \end{pmatrix}, \qquad \hat{u}'\hat{u} = 10.96$$
Calculate the coefficient estimates and their standard errors.
• To calculate the coefficients, just multiply the matrix by the vector:
$$\hat{\beta} = (X'X)^{-1}X'y = \begin{pmatrix} 1.10 \\ -4.40 \\ 19.88 \end{pmatrix}$$
• To calculate the standard errors, we need an estimate of $\sigma^2$:
$$s^2 = \frac{RSS}{T - k} = \frac{10.96}{15 - 3} = 0.91$$
Calculating Parameter and Standard Error Estimates
for Multiple Regression Models: An Example (cont’d)
• The variance-covariance matrix of $\hat{\beta}$ is given by
$$s^2(X'X)^{-1} = 0.91\,(X'X)^{-1} = \begin{pmatrix} 1.83 & 3.20 & -0.91 \\ 3.20 & 0.91 & 5.94 \\ -0.91 & 5.94 & 3.93 \end{pmatrix}$$
• The variances are on the leading diagonal:
$$\mathrm{var}(\hat{\beta}_1) = 1.83 \qquad SE(\hat{\beta}_1) = 1.35$$
$$\mathrm{var}(\hat{\beta}_2) = 0.91 \qquad SE(\hat{\beta}_2) = 0.96$$
$$\mathrm{var}(\hat{\beta}_3) = 3.93 \qquad SE(\hat{\beta}_3) = 1.98$$
• We write:
$$\hat{y}_t = 1.10 - 4.40\,x_{2t} + 19.88\,x_{3t}$$
with standard errors $(1.35)$, $(0.96)$ and $(1.98)$ beneath the respective coefficients.
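The arithmetic above is easy to verify with numpy, starting from the quantities given in the example:

```python
import numpy as np

# Quantities given in the example
XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])
rss, T, k = 10.96, 15, 3

beta_hat = XtX_inv @ Xty             # -> [ 1.1  -4.4  19.88]
s2 = rss / (T - k)                   # -> 0.913...
se = np.sqrt(np.diag(s2 * XtX_inv))  # -> [1.35 0.96 1.98] after rounding
print(beta_hat, round(s2, 2), se.round(2))
```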
Testing Multiple Hypotheses: The F-test
• We used the t-test to test hypotheses involving only one coefficient. But what if we want to test more than one coefficient simultaneously?
• We do this using the F-test. The F-test involves estimating two regressions.
• The unrestricted regression is the one in which the coefficients are freely determined by the data, as we have done before.
• The restricted regression is the one in which the coefficients are restricted, i.e. the restrictions are imposed on some $\beta$s.
The F-test:
Restricted and Unrestricted Regressions
• Example
The general regression is
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t \qquad (1)$$
• We want to test the restriction that $\beta_3 + \beta_4 = 1$. The unrestricted regression is (1) above, but what is the restricted regression?
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t \quad \text{s.t.} \quad \beta_3 + \beta_4 = 1$$
• We substitute the restriction ($\beta_3 + \beta_4 = 1$) into the regression so that it is automatically imposed on the data.
The F-test: Forming the Restricted Regression
• Substituting $\beta_4 = 1 - \beta_3$ into the general regression:
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + (1 - \beta_3) x_{4t} + u_t$$
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + x_{4t} - \beta_3 x_{4t} + u_t$$
• Gather terms in $\beta$s together and rearrange:
$$(y_t - x_{4t}) = \beta_1 + \beta_2 x_{2t} + \beta_3 (x_{3t} - x_{4t}) + u_t$$
• This is the restricted regression. We actually estimate it by creating two new variables; call them, say, $P_t$ and $Q_t$:
$$P_t = y_t - x_{4t}, \qquad Q_t = x_{3t} - x_{4t}$$
so that we estimate $P_t = \beta_1 + \beta_2 x_{2t} + \beta_3 Q_t + u_t$.
Calculating the F-Test Statistic
• The test statistic is given by
$$\text{test statistic} = \frac{RRSS - URSS}{URSS} \times \frac{T - k}{m}$$
where
URSS = RSS from the unrestricted regression
RRSS = RSS from the restricted regression
m = number of restrictions
T = number of observations
k = number of regressors in the unrestricted regression, including a constant (i.e. the total number of parameters to be estimated).
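As an illustration (my addition; the helper name f_test is hypothetical), the statistic, its critical value and its p-value can be computed with scipy:

```python
from scipy import stats

def f_test(rrss, urss, m, T, k, alpha=0.05):
    """F-test of m linear restrictions: F = ((RRSS - URSS)/URSS) * (T - k)/m."""
    f_stat = (rrss - urss) / urss * (T - k) / m
    crit = stats.f.ppf(1 - alpha, m, T - k)  # critical value of F(m, T-k)
    p_value = stats.f.sf(f_stat, m, T - k)   # P(F > f_stat)
    return f_stat, crit, p_value

# Numbers from the worked example later in this section
print(f_test(rrss=436.1, urss=397.2, m=2, T=144, k=4))
```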
The F-Distribution
• The test statistic follows the F-distribution, which has 2 d.f. parameters.
• The values of the degrees of freedom parameters are m and (T − k) respectively (the order of the d.f. parameters is important).
• The appropriate critical value will be in column m, row (T-k).
• The F-distribution has only positive values and is not symmetrical. We therefore only reject the null if the test statistic > critical F-value.
Determining the Number of Restrictions in an F-test
• Examples:
$H_0$: $\beta_1 + \beta_2 = 2$  (m = 1)
$H_0$: $\beta_2 = 1$ and $\beta_3 = -1$  (m = 2)
$H_0$: $\beta_2 = 0$, $\beta_3 = 0$ and $\beta_4 = 0$  (m = 3)
• If the model is $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$, then the null hypothesis $H_0$: $\beta_2 = 0$, and $\beta_3 = 0$ and $\beta_4 = 0$ is tested by the regression F-statistic. It tests the null hypothesis that all of the coefficients except the intercept coefficient are zero.
• Note the form of the alternative hypothesis for all tests when more than one restriction is involved: $H_1$: $\beta_2 \neq 0$, or $\beta_3 \neq 0$ or $\beta_4 \neq 0$.
What we Cannot Test with Either an F or a t-test
• We cannot use this framework to test hypotheses that are not linear in the parameters or that are multiplicative, e.g.
$H_0$: $\beta_2 \beta_3 = 2$ or $H_0$: $\beta_2^2 = 1$ cannot be tested.
The Relationship between the t and the
F-Distributions
• Any hypothesis that could be tested with a t-test could also have been tested using an F-test, but not the other way around.
For example, consider the hypothesis
$H_0$: $\beta_2 = 0.5$
$H_1$: $\beta_2 \neq 0.5$
We could have tested this using the usual t-test:
$$\text{test stat} = \frac{\hat{\beta}_2 - 0.5}{SE(\hat{\beta}_2)}$$
or it could be tested in the framework above for the F-test.
• Note that the two tests always give the same result, since the t-distribution is just a special case of the F-distribution.
• For example, if we have some random variable Z, and $Z \sim t(T-k)$, then $Z^2 \sim F(1, T-k)$.
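This relationship is easy to verify numerically (my addition): the squared two-sided t critical value equals the corresponding F(1, T − k) critical value:

```python
from scipy import stats

T, k = 144, 4
t_crit = stats.t.ppf(0.975, T - k)     # two-sided 5% critical value of t(T-k)
f_crit = stats.f.ppf(0.95, 1, T - k)   # 5% critical value of F(1, T-k)
print(abs(t_crit**2 - f_crit) < 1e-6)  # True: the squared t and the F critical values coincide
```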
F-test Example
• Question: Suppose a researcher wants to test whether the returns on a company stock (y) show unit sensitivity to two factors (factor x2 and factor x3) among three considered. The regression is carried out on 144 monthly observations. The regression is
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$$
- What are the restricted and unrestricted regressions?
- If the two RSS are 436.1 and 397.2 respectively, perform the test.
• Solution:
Unit sensitivity implies $H_0$: $\beta_2 = 1$ and $\beta_3 = 1$. The unrestricted regression is the one in the question. The restricted regression is $(y_t - x_{2t} - x_{3t}) = \beta_1 + \beta_4 x_{4t} + u_t$ or, letting $z_t = y_t - x_{2t} - x_{3t}$, the restricted regression is $z_t = \beta_1 + \beta_4 x_{4t} + u_t$.
In the F-test formula, T = 144, k = 4, m = 2, RRSS = 436.1, URSS = 397.2, so
$$F = \frac{436.1 - 397.2}{397.2} \times \frac{144 - 4}{2} \approx 6.86$$
Since 6.86 exceeds the 5% critical value of the F(2, 140) distribution (approximately 3.06), we reject $H_0$.
Goodness of Fit Statistics
• A goodness of fit statistic measures how well the sample regression function (SRF) fits the data.
• The most common goodness of fit statistic is known as $R^2$.
• We are interested in explaining the variability of y about its mean value, $\bar{y}$, i.e. the total sum of squares, TSS:
$$TSS = \sum_t (y_t - \bar{y})^2$$
• We can split the TSS into two parts: the part which we have explained (known as the explained sum of squares, ESS) and the part which we did not explain using the model (the RSS):
$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2$$
(see the proof on a later slide).
Defining $R^2$
• That is, TSS = ESS + RSS.
• Our goodness of fit statistic is
$$R^2 = \frac{ESS}{TSS}$$
• But since TSS = ESS + RSS, we can also write
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$$
• $R^2$ must always lie between zero and one. To understand this, consider two extremes:
RSS = TSS, i.e. ESS = 0, so $R^2$ = ESS/TSS = 0
ESS = TSS, i.e. RSS = 0, so $R^2$ = ESS/TSS = 1
Proof
• Writing $y_t = \hat{y}_t + \hat{u}_t$:
$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t + \hat{u}_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2 + 2\sum_t (\hat{y}_t - \bar{y})\hat{u}_t$$
• The cross-product term is zero, because the OLS residuals sum to zero and are orthogonal to the fitted values (when a constant is included in the regression). Hence TSS = ESS + RSS.
The Limit Cases: $R^2 = 0$ and $R^2 = 1$
[Figure: two scatter plots of $y_t$ against $x_t$. In the first, the fitted line is horizontal at $\bar{y}$, so the model explains none of the variation in y ($R^2 = 0$); in the second, all observations lie exactly on the fitted line ($R^2 = 1$).]
Problems with $R^2$ as a Goodness of Fit Measure
• There are a number of them:
1. $R^2$ is defined in terms of variation about the mean of y, so that if a model is reparameterised and the dependent variable changes, $R^2$ will change.
2. $R^2$ never falls if more regressors are added to the regression, e.g. consider:
Regression 1: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
Regression 2: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
$R^2$ will always be at least as high for regression 2 relative to regression 1.
3. $R^2$ quite often takes on values of 0.9 or higher for time series regressions, and hence cannot discriminate between competing models.
Adjusted $R^2$
• Because of point (2) on the previous slide, a modification is made which takes into account the loss of degrees of freedom associated with adding extra variables. This is known as $\bar{R}^2$, or adjusted $R^2$:
$$\bar{R}^2 = 1 - \left[\frac{T - 1}{T - k}\,(1 - R^2)\right]$$
• So if we add an extra regressor, k increases and, unless $R^2$ increases by a more-than-offsetting amount, $\bar{R}^2$ will fall.
• When we analyse one regression alone, we simply use $R^2$.
• We only use $\bar{R}^2$ when we want to compare an existing regression with one or more regressions obtained by adding more regressors to the existing regression, i.e. when we have nested models.
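A minimal sketch of both statistics (my addition; the helper r_squared is hypothetical, with k counting all estimated parameters including the constant):

```python
import numpy as np

def r_squared(y, y_hat, k):
    """Plain and adjusted R^2; k = number of estimated parameters (incl. constant)."""
    T = len(y)
    rss = np.sum((y - y_hat) ** 2)             # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)        # total sum of squares
    r2 = 1 - rss / tss
    r2_adj = 1 - (T - 1) / (T - k) * (1 - r2)  # penalises extra regressors
    return r2, r2_adj
```

When comparing nested models, the comparison should be made on r2_adj rather than r2, since r2 can never fall as regressors are added.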