Applied Econometrics 2018
Generalising the Simple Model to
Multiple Linear Regression
• Before, we used the model
$$y_t = \alpha + \beta x_t + u_t, \qquad t = 1, 2, \ldots, T$$
• But what if the dependent (y) variable depends on more than one independent variable?
• It is very easy to generalise the simple model to one with k − 1 regressors (independent variables):
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t$$
Multiple Regression and the Constant Term
• Now we write
$$y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t, \qquad t = 1, 2, \ldots, T$$
• Where is $x_1$? It is the constant term. In fact the constant term is usually represented by a column of ones of length T:
$$x_1 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$$
• $\beta_1$ is the coefficient attached to the constant term (which we called $\alpha$ before).
Different Ways of Expressing
the Multiple Linear Regression Model
• We could write out a separate equation for every value of t:
$$\begin{aligned}
y_1 &= \beta_1 + \beta_2 x_{21} + \beta_3 x_{31} + \cdots + \beta_k x_{k1} + u_1 \\
y_2 &= \beta_1 + \beta_2 x_{22} + \beta_3 x_{32} + \cdots + \beta_k x_{k2} + u_2 \\
&\;\;\vdots \\
y_T &= \beta_1 + \beta_2 x_{2T} + \beta_3 x_{3T} + \cdots + \beta_k x_{kT} + u_T
\end{aligned}$$
• We can write this in matrix form:
$$y = X\beta + u$$
where y is $T \times 1$, X is $T \times k$, $\beta$ is $k \times 1$ and u is $T \times 1$.
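To make the dimensions concrete, here is a minimal numpy sketch (my illustration, not part of the original slides) with made-up data for T = 4 and k = 3, stacking a column of ones with two regressors to form X:

```python
import numpy as np

# Hypothetical data: T = 4 observations, k = 3 (constant plus two regressors)
x2 = np.array([1.0, 2.0, 3.0, 4.0])
x3 = np.array([0.5, 0.1, 0.8, 0.3])
X = np.column_stack([np.ones(4), x2, x3])  # T x k; first column is the constant

beta = np.array([1.0, 2.0, -1.0])          # k x 1 parameter vector
u = np.array([0.1, -0.2, 0.0, 0.1])        # T x 1 disturbance vector
y = X @ beta + u                           # y = X*beta + u, a T x 1 vector
print(X.shape, beta.shape, y.shape)        # (4, 3) (3,) (4,)
```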
Inside the Matrices of the
Multiple Linear Regression Model
• e.g. if k is 2, we have two regressors (and so two parameters), one of which is a column of ones:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{pmatrix}$$
$$(T \times 1) \qquad (T \times 2) \qquad (2 \times 1) \qquad (T \times 1)$$
• Notice that the matrices written in this way are conformable.
How Do We Calculate the Parameters (the $\beta$)
in this Generalised Case?
• Previously, we took the residual sum of squares and minimised it w.r.t. $\alpha$ and $\beta$.
• In the matrix notation, we have
$$\hat{u} = \begin{pmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{pmatrix}$$
• The RSS would be given by
$$\hat{u}'\hat{u} = (\hat{u}_1 \;\; \hat{u}_2 \;\; \cdots \;\; \hat{u}_T) \begin{pmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{pmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_T^2 = \sum_t \hat{u}_t^2$$
The OLS Estimator for the
Multiple Regression Model
• In order to obtain the parameter estimates, $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$, we would minimise the RSS with respect to all the $\beta$s.
• It can be shown that
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1} X'y$$
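As a sketch of how the formula is used in practice (my addition; the data below are made up), the estimator can be computed with a linear solve of the normal equations rather than an explicit matrix inverse:

```python
import numpy as np

# Hypothetical data: T = 5 observations, k = 3 parameters
X = np.array([[1.0, 2.0, 1.5],
              [1.0, 0.5, 3.0],
              [1.0, 1.0, 0.7],
              [1.0, 3.0, 0.5],
              [1.0, 2.5, 2.2]])
y = np.array([4.0, 5.5, 4.5, 3.0, 3.5])

# beta_hat = (X'X)^{-1} X'y, obtained by solving X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```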
Derivation of the OLS coefficient
estimator in matrix form
• The linear model in matrix form is:
$$y = X\beta + u$$
• The fitted values are $\hat{y} = X\hat{\beta}$; hence, the residuals are:
$$\hat{u} = y - \hat{y} = y - X\hat{\beta}$$
• We want to minimise the residual sum of squares, i.e.
$$L = \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta})$$
• Hence
$$L = y'y - y'X\hat{\beta} - \hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$$
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• But $\hat{\beta}'X'y = y'X\hat{\beta}$, because $(\hat{\beta}'X'y)' = y'X\hat{\beta}$ and both products are scalars:
$\hat{\beta}'X'y$ is $(1 \times k)(k \times T)(T \times 1) = (1 \times 1)$, a scalar;
$y'X\hat{\beta}$ is $(1 \times T)(T \times k)(k \times 1) = (1 \times 1)$, a scalar.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• Let us check it out for $k = 2$, where the first column of $X$ is the column of ones:
$$\hat{\beta}'X'y = (\hat{\beta}_1 \;\; \hat{\beta}_2) \begin{pmatrix} \sum_t y_t \\ \sum_t x_{2t} y_t \end{pmatrix} = \hat{\beta}_1 \sum_t y_t + \hat{\beta}_2 \sum_t x_{2t} y_t$$
and
$$y'X\hat{\beta} = (y_1 \;\; y_2 \;\; \cdots \;\; y_T) \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{pmatrix} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \hat{\beta}_1 \sum_t y_t + \hat{\beta}_2 \sum_t x_{2t} y_t$$
so the two expressions are indeed the same scalar.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• We have already seen that:
$$L = \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - y'X\hat{\beta} - \hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$$
• But, since $\hat{\beta}'X'y = y'X\hat{\beta}$, then
$$L = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$$
• Now, we must differentiate:
$$\frac{\partial L}{\partial \hat{\beta}} = \frac{\partial (y'y)}{\partial \hat{\beta}} - 2\frac{\partial (\hat{\beta}'X'y)}{\partial \hat{\beta}} + \frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}}$$
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• 1st term:
$$\frac{\partial (y'y)}{\partial \hat{\beta}} = 0$$
since $y'y$ does not involve $\hat{\beta}$.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• 2nd term:
$$\frac{\partial (2\hat{\beta}'X'y)}{\partial \hat{\beta}} = 2X'y$$
• To see why, let $a$ and $x$ be $n \times 1$ column vectors. Then we can write:
$$a'x = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$
The derivative of $a'x$ w.r.t. the vector $x$ will be:
$$\frac{\partial (a'x)}{\partial x} = \begin{pmatrix} \partial(a'x)/\partial x_1 \\ \partial(a'x)/\partial x_2 \\ \vdots \\ \partial(a'x)/\partial x_n \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = a$$
Here $a = X'y$ and $x = \hat{\beta}$, so the derivative of the 2nd term is $2X'y$.
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• 3rd term:
$$\frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}}$$
• $\hat{\beta}'$ is a $1 \times k$ vector, $X'X$ is a symmetric $k \times k$ matrix, and $\hat{\beta}$ is a $k \times 1$ vector. Hence $\hat{\beta}'X'X\hat{\beta}$ has the same form as $x'Ax$.
• But from the matrix chapter, we know that $\frac{\partial (x'Ax)}{\partial x} = 2Ax$ for symmetric $A$, so
$$\frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}} = 2X'X\hat{\beta}$$
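The two differentiation rules can also be checked numerically. Below is a small sketch (my addition, not from the slides) that compares each rule against a central finite-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.normal(size=n)
A = rng.normal(size=(n, n))
A = A + A.T                      # symmetrise A, as the rule d(x'Ax)/dx = 2Ax requires
x0 = rng.normal(size=n)

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar-valued f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda x: a @ x, x0), a))               # d(a'x)/dx = a
print(np.allclose(num_grad(lambda x: x @ A @ x, x0), 2 * A @ x0))  # d(x'Ax)/dx = 2Ax
```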
Derivation of the OLS coefficient
estimator in matrix form (cont.)
• Hence we can write:
$$\frac{\partial L}{\partial \hat{\beta}} = \frac{\partial (y'y)}{\partial \hat{\beta}} - 2\frac{\partial (\hat{\beta}'X'y)}{\partial \hat{\beta}} + \frac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}} = 0 - 2X'y + 2X'X\hat{\beta}$$
• Setting this derivative to zero:
$$-2X'y + 2X'X\hat{\beta} = 0 \qquad \Rightarrow \qquad X'X\hat{\beta} = X'y$$
• If we pre-multiply both sides by $(X'X)^{-1}$, we get:
$$(X'X)^{-1}(X'X)\hat{\beta} = (X'X)^{-1}X'y$$
• But $(X'X)^{-1}(X'X) = I$, so:
$$\hat{\beta} = (X'X)^{-1}X'y$$
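As a quick sanity check (my addition, with simulated data), the closed-form solution of the normal equations should match a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(42)
T, k = 200, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(size=T)

beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)  # beta_hat = (X'X)^{-1} X'y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # QR/SVD-based least squares
print(np.allclose(beta_normal_eq, beta_lstsq))      # True
```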
Proof that $\hat{\beta}$ is a minimum
• 1st derivative: $\frac{\partial L}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta}$, which is zero at $\hat{\beta} = (X'X)^{-1}X'y$.
• 2nd derivative: $\frac{\partial^2 L}{\partial \hat{\beta}\,\partial \hat{\beta}'} = 2X'X$.
• If $\hat{\beta}$ is a minimum, $X'X$ must be a positive definite matrix.
• In the $2 \times 2$ case, $X'X$ is positive definite iff both elements on the leading diagonal are > 0 and $\det(X'X) > 0$.
Proof that $\hat{\beta}$ is a minimum (cont.)
• In the bivariate case, $X'X = \begin{pmatrix} T & \sum x_t \\ \sum x_t & \sum x_t^2 \end{pmatrix}$. The elements on the leading diagonal are $T$ and $\sum x_t^2$, which are both positive.
• $\det(X'X) = T\sum x_t^2 - \left(\sum x_t\right)^2 = T\sum (x_t - \bar{x})^2$, which is positive (as long as $x_t$ is not constant).
• This can be generalised to a matrix X with a $T \times k$ size.
• This proves that $\hat{\beta} = (X'X)^{-1}X'y$ is the estimator that minimises $\hat{u}'\hat{u}$.
• QED.
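A numerical illustration of the positive definiteness argument (my addition, with simulated regressors): for any X of full column rank, all eigenvalues of X'X are strictly positive, and hence so is its determinant:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # full column rank

XtX = X.T @ X
print(np.all(np.linalg.eigvalsh(XtX) > 0))  # all eigenvalues positive -> positive definite
print(np.linalg.det(XtX) > 0)               # determinant positive as well
```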
Calculating the Standard Errors for the
Multiple Regression Model
• Check the dimensions: $\hat{\beta} = (X'X)^{-1}X'y$ is $(k \times k)(k \times T)(T \times 1) = k \times 1$, as required.
• But how do we calculate the standard errors of the coefficient estimates?
• Previously, to estimate the variance of the errors, $\sigma^2$, we used $s^2 = \frac{\sum_t \hat{u}_t^2}{T - 2}$.
• Now, using the matrix notation, we use
$$s^2 = \frac{\hat{u}'\hat{u}}{T - k}$$
where k = number of regressors.
• The OLS estimator of the variance of $\hat{\beta}$ is given by the diagonal elements of $s^2(X'X)^{-1}$, so that the variance of $\hat{\beta}_1$ is the 1st diagonal element, the variance of $\hat{\beta}_2$ is the 2nd diagonal element, ..., and the variance of $\hat{\beta}_k$ is the kth diagonal element.
Derivation of the OLS standard error
estimator in multiple regression
• The variance of a vector of random variables is given by the formula $\mathrm{var}(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']$. Since $\hat{\beta} = (X'X)^{-1}X'y$ and $y = X\beta + u$:
$$\hat{\beta} = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$
so that
$$\mathrm{var}(\hat{\beta}) = E[(X'X)^{-1}X'u\,u'X(X'X)^{-1}] = (X'X)^{-1}X'\,E[uu']\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$
using $E[uu'] = \sigma^2 I_T$ and treating X as non-stochastic. Replacing $\sigma^2$ by its estimator $s^2$ gives $s^2(X'X)^{-1}$
Derivation of the OLS standard error
estimator in multiple regression (cont.)
• $s^2(X'X)^{-1}$ is the variance-covariance matrix of the coefficients.
• The leading diagonal of $s^2(X'X)^{-1}$ contains the variances of the coefficient estimates $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$, and the off-diagonal terms are the covariances between each pair of coefficients, i.e. $\mathrm{cov}(\hat{\beta}_0, \hat{\beta}_1)$, $\mathrm{cov}(\hat{\beta}_0, \hat{\beta}_2)$, etc.
Derivation of the OLS standard error estimator
in multiple regression (cont.)
• The standard errors $SE(\hat{\beta}_i)$, $i = 0, 1, 2, \ldots, k$, are respectively the square roots of the $i$th element of the leading diagonal of $s^2(X'X)^{-1}$.
• In a 3-regressor equation, we have:
$$\mathrm{var}(\hat{\beta}) = s^2(X'X)^{-1} = \begin{pmatrix} \mathrm{var}(\hat{\beta}_0) & \mathrm{cov}(\hat{\beta}_0,\hat{\beta}_1) & \mathrm{cov}(\hat{\beta}_0,\hat{\beta}_2) \\ \mathrm{cov}(\hat{\beta}_1,\hat{\beta}_0) & \mathrm{var}(\hat{\beta}_1) & \mathrm{cov}(\hat{\beta}_1,\hat{\beta}_2) \\ \mathrm{cov}(\hat{\beta}_2,\hat{\beta}_0) & \mathrm{cov}(\hat{\beta}_2,\hat{\beta}_1) & \mathrm{var}(\hat{\beta}_2) \end{pmatrix}$$
• This is the variance-covariance matrix of the regression.
Calculating Parameter and Standard Error Estimates
for Multiple Regression Models: An Example
• Example: The following model with k = 3 is estimated over 15 observations:
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
and the following data have been calculated from the original X's:
$$(X'X)^{-1} = \begin{pmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{pmatrix}, \qquad X'y = \begin{pmatrix} -3.0 \\ 2.2 \\ 0.6 \end{pmatrix}, \qquad \hat{u}'\hat{u} = 10.96$$
Calculate the coefficient estimates and their standard errors.
• To calculate the coefficients, just multiply the matrix by the vector:
$$\hat{\beta} = (X'X)^{-1}X'y = \begin{pmatrix} 1.10 \\ -4.40 \\ 19.88 \end{pmatrix}$$
• To calculate the standard errors, we need an estimate of $\sigma^2$:
$$s^2 = \frac{RSS}{T - k} = \frac{10.96}{15 - 3} = 0.91$$
Calculating Parameter and Standard Error Estimates
for Multiple Regression Models: An Example (cont’d)
• The variance-covariance matrix of $\hat{\beta}$ is given by
$$s^2(X'X)^{-1} = 0.91\,(X'X)^{-1} = \begin{pmatrix} 1.83 & 3.20 & -0.91 \\ 3.20 & 0.91 & 5.94 \\ -0.91 & 5.94 & 3.93 \end{pmatrix}$$
• The variances are on the leading diagonal:
$$\mathrm{var}(\hat{\beta}_1) = 1.83 \qquad SE(\hat{\beta}_1) = 1.35$$
$$\mathrm{var}(\hat{\beta}_2) = 0.91 \qquad SE(\hat{\beta}_2) = 0.96$$
$$\mathrm{var}(\hat{\beta}_3) = 3.93 \qquad SE(\hat{\beta}_3) = 1.98$$
• We write:
$$\hat{y}_t = 1.10 - 4.40\,x_{2t} + 19.88\,x_{3t}$$
with standard errors $(1.35)$, $(0.96)$ and $(1.98)$ beneath the respective coefficients.
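The arithmetic above is easy to verify with numpy, starting from the quantities given in the example:

```python
import numpy as np

# Quantities given in the example
XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])
rss, T, k = 10.96, 15, 3

beta_hat = XtX_inv @ Xty             # -> [ 1.1  -4.4  19.88]
s2 = rss / (T - k)                   # -> 0.913...
se = np.sqrt(np.diag(s2 * XtX_inv))  # -> [1.35 0.96 1.98] after rounding
print(beta_hat, round(s2, 2), se.round(2))
```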
Testing Multiple Hypotheses: The F-test
• We used the t-test to test hypotheses involving only one coefficient. But what if we want to test more than one coefficient simultaneously?
• We do this using the F-test. The F-test involves estimating two regressions.
• The unrestricted regression is the one in which the coefficients are freely determined by the data, as we have done before.
• The restricted regression is the one in which the coefficients are restricted, i.e. the restrictions are imposed on some $\beta$s.
The F-test:
Restricted and Unrestricted Regressions
• Example
The general regression is
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t \qquad (1)$$
• We want to test the restriction that $\beta_3 + \beta_4 = 1$. The unrestricted regression is (1) above, but what is the restricted regression?
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t \quad \text{s.t.} \quad \beta_3 + \beta_4 = 1$$
• We substitute the restriction ($\beta_3 + \beta_4 = 1$) into the regression so that it is automatically imposed on the data.
The F-test: Forming the Restricted Regression
• Substituting $\beta_4 = 1 - \beta_3$ into the general regression:
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + (1 - \beta_3) x_{4t} + u_t$$
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + x_{4t} - \beta_3 x_{4t} + u_t$$
• Gather terms in $\beta$s together and rearrange:
$$(y_t - x_{4t}) = \beta_1 + \beta_2 x_{2t} + \beta_3 (x_{3t} - x_{4t}) + u_t$$
• This is the restricted regression. We actually estimate it by creating two new variables; call them, say, $P_t$ and $Q_t$:
$$P_t = y_t - x_{4t}, \qquad Q_t = x_{3t} - x_{4t}$$
so that we estimate $P_t = \beta_1 + \beta_2 x_{2t} + \beta_3 Q_t + u_t$.
Calculating the F-Test Statistic
• The test statistic is given by
$$\text{test statistic} = \frac{RRSS - URSS}{URSS} \times \frac{T - k}{m}$$
where
URSS = RSS from the unrestricted regression
RRSS = RSS from the restricted regression
m = number of restrictions
T = number of observations
k = number of regressors in the unrestricted regression, including a constant (i.e. the total number of parameters to be estimated).
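As an illustration (my addition; the helper name f_test is hypothetical), the statistic, its critical value and its p-value can be computed with scipy:

```python
from scipy import stats

def f_test(rrss, urss, m, T, k, alpha=0.05):
    """F-test of m linear restrictions: F = ((RRSS - URSS)/URSS) * (T - k)/m."""
    f_stat = (rrss - urss) / urss * (T - k) / m
    crit = stats.f.ppf(1 - alpha, m, T - k)  # critical value of F(m, T-k)
    p_value = stats.f.sf(f_stat, m, T - k)   # P(F > f_stat)
    return f_stat, crit, p_value

# Numbers from the worked example later in this section
print(f_test(rrss=436.1, urss=397.2, m=2, T=144, k=4))
```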
The F-Distribution
• The test statistic follows the F-distribution, which has 2 d.f. parameters.
• The values of the degrees of freedom parameters are m and (T − k) respectively (the order of the d.f. parameters is important).
• The appropriate critical value will be in column m, row (T-k).
• The F-distribution has only positive values and is not symmetrical. We therefore only reject the null if the test statistic > critical F-value.
Determining the Number of Restrictions in an F-test
• Examples:
$H_0$: $\beta_1 + \beta_2 = 2$  (m = 1)
$H_0$: $\beta_2 = 1$ and $\beta_3 = -1$  (m = 2)
$H_0$: $\beta_2 = 0$, $\beta_3 = 0$ and $\beta_4 = 0$  (m = 3)
• If the model is $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$, then the null hypothesis $H_0$: $\beta_2 = 0$, and $\beta_3 = 0$ and $\beta_4 = 0$ is tested by the regression F-statistic. It tests the null hypothesis that all of the coefficients except the intercept coefficient are zero.
• Note the form of the alternative hypothesis for all tests when more than one restriction is involved: $H_1$: $\beta_2 \neq 0$, or $\beta_3 \neq 0$ or $\beta_4 \neq 0$.
What we Cannot Test with Either an F or a t-test
• We cannot use this framework to test hypotheses that are not linear in the parameters or that are multiplicative, e.g.
$H_0$: $\beta_2 \beta_3 = 2$ or $H_0$: $\beta_2^2 = 1$ cannot be tested.
The Relationship between the t and the
F-Distributions
• Any hypothesis that could be tested with a t-test could also have been tested using an F-test, but not the other way around.
For example, consider the hypothesis
$H_0$: $\beta_2 = 0.5$
$H_1$: $\beta_2 \neq 0.5$
We could have tested this using the usual t-test:
$$\text{test stat} = \frac{\hat{\beta}_2 - 0.5}{SE(\hat{\beta}_2)}$$
or it could be tested in the framework above for the F-test.
• Note that the two tests always give the same result, since the t-distribution is just a special case of the F-distribution.
• For example, if we have some random variable Z, and $Z \sim t(T-k)$, then $Z^2 \sim F(1, T-k)$.
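This relationship is easy to verify numerically (my addition): the squared two-sided t critical value equals the corresponding F(1, T − k) critical value:

```python
from scipy import stats

T, k = 144, 4
t_crit = stats.t.ppf(0.975, T - k)     # two-sided 5% critical value of t(T-k)
f_crit = stats.f.ppf(0.95, 1, T - k)   # 5% critical value of F(1, T-k)
print(abs(t_crit**2 - f_crit) < 1e-6)  # True: the squared t and the F critical values coincide
```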
F-test Example
• Question: Suppose a researcher wants to test whether the returns on a company stock (y) show unit sensitivity to two factors (factor x2 and factor x3) among three considered. The regression is carried out on 144 monthly observations. The regression is
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$$
- What are the restricted and unrestricted regressions?
- If the two RSS are 436.1 and 397.2 respectively, perform the test.
• Solution:
Unit sensitivity implies $H_0$: $\beta_2 = 1$ and $\beta_3 = 1$. The unrestricted regression is the one in the question. The restricted regression is $(y_t - x_{2t} - x_{3t}) = \beta_1 + \beta_4 x_{4t} + u_t$ or, letting $z_t = y_t - x_{2t} - x_{3t}$, the restricted regression is $z_t = \beta_1 + \beta_4 x_{4t} + u_t$.
In the F-test formula, T = 144, k = 4, m = 2, RRSS = 436.1, URSS = 397.2, so
$$F = \frac{436.1 - 397.2}{397.2} \times \frac{144 - 4}{2} \approx 6.86$$
Since 6.86 exceeds the 5% critical value of the F(2, 140) distribution (approximately 3.06), we reject $H_0$.
Goodness of Fit Statistics
• A goodness of fit statistic measures how well the sample regression function (SRF) fits the data.
• The most common goodness of fit statistic is known as $R^2$.
• We are interested in explaining the variability of y about its mean value, $\bar{y}$, i.e. the total sum of squares, TSS:
$$TSS = \sum_t (y_t - \bar{y})^2$$
• We can split the TSS into two parts: the part which we have explained (known as the explained sum of squares, ESS) and the part which we did not explain using the model (the RSS):
$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2$$
(see the proof on a later slide).
Defining $R^2$
• That is, TSS = ESS + RSS.
• Our goodness of fit statistic is
$$R^2 = \frac{ESS}{TSS}$$
• But since TSS = ESS + RSS, we can also write
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$$
• $R^2$ must always lie between zero and one. To understand this, consider two extremes:
RSS = TSS, i.e. ESS = 0, so $R^2$ = ESS/TSS = 0
ESS = TSS, i.e. RSS = 0, so $R^2$ = ESS/TSS = 1
Proof
• Writing $y_t = \hat{y}_t + \hat{u}_t$:
$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t + \hat{u}_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2 + 2\sum_t (\hat{y}_t - \bar{y})\hat{u}_t$$
• The cross-product term is zero, because the OLS residuals sum to zero and are orthogonal to the fitted values (when a constant is included in the regression). Hence TSS = ESS + RSS.
The Limit Cases: $R^2 = 0$ and $R^2 = 1$
[Figure: two scatter plots of $y_t$ against $x_t$. In the first, the fitted line is horizontal at $\bar{y}$, so the model explains none of the variation in y ($R^2 = 0$); in the second, all observations lie exactly on the fitted line ($R^2 = 1$).]
Problems with $R^2$ as a Goodness of Fit Measure
• There are a number of them:
1. $R^2$ is defined in terms of variation about the mean of y, so that if a model is reparameterised and the dependent variable changes, $R^2$ will change.
2. $R^2$ never falls if more regressors are added to the regression, e.g. consider:
Regression 1: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
Regression 2: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
$R^2$ will always be at least as high for regression 2 relative to regression 1.
3. $R^2$ quite often takes on values of 0.9 or higher for time series regressions, and hence cannot discriminate between competing models.
Adjusted $R^2$
• Because of point (2) on the previous slide, a modification is made which takes into account the loss of degrees of freedom associated with adding extra variables. This is known as $\bar{R}^2$, or adjusted $R^2$:
$$\bar{R}^2 = 1 - \left[\frac{T - 1}{T - k}\,(1 - R^2)\right]$$
• So if we add an extra regressor, k increases and, unless $R^2$ increases by a more-than-offsetting amount, $\bar{R}^2$ will fall.
• When we analyse one regression alone, we simply use $R^2$.
• We only use $\bar{R}^2$ when we want to compare an existing regression with one or more regressions obtained by adding more regressors to the existing regression, i.e. when we have nested models.
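A minimal sketch of both statistics (my addition; the helper r_squared is hypothetical, with k counting all estimated parameters including the constant):

```python
import numpy as np

def r_squared(y, y_hat, k):
    """Plain and adjusted R^2; k = number of estimated parameters (incl. constant)."""
    T = len(y)
    rss = np.sum((y - y_hat) ** 2)             # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)        # total sum of squares
    r2 = 1 - rss / tss
    r2_adj = 1 - (T - 1) / (T - k) * (1 - r2)  # penalises extra regressors
    return r2, r2_adj
```

When comparing nested models, the comparison should be made on r2_adj rather than r2, since r2 can never fall as regressors are added.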