Metodos Quantitativos dout 2019 3

(1)

Chapter 5

• Classical linear regression

model assumptions and

diagnostic tests

(2)

Violation of the Assumptions of the CLRM

• Recall that we assumed of the CLRM disturbance terms: 1. E(u_t) = 0

2. Var(u_t) = 2 < 

3. Cov (u_i,u_j) = 0

4. The X matrix is non-stochastic or fixed in repeated samples 5. u_t  N(0,2₎

(3)

Investigating Violations of the

Assumptions of the CLRM

• We will now study these assumptions further, and in particular look at: - How we test for violations

- Causes

- Consequences

in general we could encounter any combination of 3 problems: - the coefficient estimates are wrong

- the associated standard errors are wrong - the distribution that we assumed for the test statistics will be inappropriate

- Solutions

- the assumptions are no longer violated - we work around the problem so that we use alternative techniques which are still valid

(4)

Statistical Distributions for Diagnostic Tests

• Often, an F- and a 2_{- version of the test are available.}

• The F-test version involves estimating a restricted and an unrestricted version of a test regression and comparing the RSS.

• The 2_{- version is sometimes called an “LM” test, and only has one degree}

of freedom parameter: the number of restrictions being tested, m.

• Asymptotically, the 2 tests are equivalent since the 2_{is a special case of}

the F-distribution:

 

_ _F

_

_m _T __k

_

_T __k __ m m as , 2 

(5)

Assumption 1: E(u

_t

) = 0

• _{Assumption that the mean of the disturbances is zero.}

• _{For all diagnostic tests, we cannot observe the disturbances and so} perform the tests of the residuals.

• _{The mean of the residuals will always be zero provided that there is a} constant term in the regression.

(6)

Assumption 1: E(u

_t

) = 0

1 11 12 1 2 1 2 1 2 1

Proof in matrix form:

'

0 '(

) 0 '

0

1

0

0 The first element of the vector ' will be:

...

0 (

T k k kT T T T t t

X X

X y

X y X

X u

u

x

u

x

u

X u

u

E u





















   



   



   

_



   



   

 







 





 







) 0



(7)

The Variance of

1 1 1 1 1 1 1 1 1 1 1 ˆ ( ' ) ' ( ' ) '( ) ( ' ) ' ( ' ) ' ( ' ) ' ˆ ˆ ˆ var( ) [( )( ) '] [( ( ' ) ' )( ( ' ) ' ) '] [(( ' ) ' )(( ' ) ' ) '] [( ' ) ' ' ( ' ) ] If assumption 4 hol X X X y X X X X u X X X X X X X u X X X u E E X X X u X X X u E X X X u X X X u E X X X uu X X X                                            1 1 2 2 1 1 2 1 1 2 2 2 2 1 2 2 1 2 ds (X is non-stochastic), then ˆ var( ) ( ' ) ' ( ') ( ' )

If we had a regression with 2 observations, we could write: ( ) ( ) ( ') ( ) ( ) If assumption 1 hol X X X E uu X X X u u u E u E u u E uu E u u u E u u E u  _        _ _{ } _     2 1 1 2 2 2 1 2 cov( ) ds ( ( ) 0), then ( ') cov( ) Why? u u E u E uu u u      _{ } _  

(8)

The Variance of

2 1 1 2 2 2 1 2 2 2 1 1 2 cov( ) Again, if we had T = 2, then: ( ')

cov( ) If assumptions 1 and 4 hold, there are 4 possibilities:

1) Assumptions 2 ( ) and 3 ( cov( 0) hold, then cov( ( ') u i j u u E uu u u const u u u u E uu               2 2 2 2 2 1 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 2 1 2 2 1 ) 0 cov( ) 0

2) Assumption 2 holds but assumption 3 doesn't hold, then

1 cov( ) / cov( ) / ( ') 1 cov( ) cov( ) / / 3) Assumption I u u u u u u E uu u u u u  _                                    _ _  _ _ _ _       2 2 1 1 2 1 2 2 2 1 2 2

3 holds but assumption 2 doesn't hold, then

cov( ) 0

( ')

cov( ) 0

4) Assumptions 2 and 3 don't hold, then

u u E uu u u          _ _{ } _    

(9)

The Variance of

• Let us analyze the 4 cases:

1. E(uu’) = 

2

I

 the Gauss-Markov theorem proves that this is

the minimum variance

2.  not minimum variance

3.  not minimum variance

4.  not minimum variance

• Conclusion: only when assumptions 1 to 4 hold the

variance-covariance matrix will be minimum (BLUE).

(10)

Assumption 2: Var(u

_t

) =



2

<



• We have so far assumed that the variance of the errors is constant, 2_{- this}

is known as homoscedasticity. If the errors do not have a constant variance, we say that they are heteroscedastic e.g. say we estimate a regression and calculate the residuals, .

_u

_t uˆ +_t

t

(11)

Detection of Heteroscedasticity

• Graphical methods • Formal tests:

One of the best is White’s general test for heteroscedasticity. The test is carried out as follows:

1. Assume that the regression we carried out is as follows

y_t = ₁ + ₂x_2t + ₃x_3t + u_t

And we want to test Var(u_t) = 2_{. We estimate the model, obtaining the}

residuals,

2. Then run the auxiliary regression

u

t t t t t t t t t

x

v

u

ˆ

2





₁





₂ ₂





₃ ₃





₄ ₂2





₅ ₃2





₆ ₂ ₃



(12)

Performing White’s Test for Heteroscedasticity

3. Obtain R2_{from the auxiliary regression and multiply it by the number of}

observations, T. It can be shown that

T R2  2_(m)

where m is the number of regressors in the auxiliary regression excluding the constant term.

4. If the 2_{test statistic from step 3 is greater than the corresponding value}

from the statistical table then reject the null hypothesis that the disturbances are homoscedastic.

(13)

Performing White’s Test for Heteroscedasticity

• Example 4.1, page 135: T = 120

– H

0

: Var(u

t

) = 

2

– R

2

of auxiliary regression = 0.234

– Test statistic = 0.234 x 120 = 28.8

– Critical value of 

2

5), 5% = 11.07

– Reject H

0

•

See Eviews example in page 137

1 2 2 3 3

t t t t

(14)

White’s test exercise

• Given the model

a regression with 40 observations was performed.

Considering that the auxiliary regression for the

execution of the White’s test got R

2

= 0.585, is there

evidence of heteroscedasticity at 5%?

Solution: TR

2

= 0.585x40 = 23.4; m = 14; critical

value = 23.685; do not reject H

₀

, there is no evidence

of heteroscedasticity.

0 1 1 2 2 3 3 4 4

t t t t t t

(15)

Consequences of Using OLS in the Presence of Heteroscedasticity

• From Chapter 3 (slide14):

• Derivation of the OLS standard error estimator:

1 1 1 1 1 1 1 1 1 1 1 ˆ ( ' ) ' ( ' ) '( ) ( ' ) ' ( ' ) ' ( ' ) ' ˆ ˆ ˆ var( ) [( )( ) '] [( ( ' ) ' )( ( ' ) ' ) '] [(( ' ) ' )(( ' ) ' ) '] [( ' ) ' ' ( ' ) ] Since by assumption X X X y X X X X u X X X X X X X u X X X u E E X X X u X X X u E X X X u X X X u E X X X uu X X X                                            1 1 2 1 2 1 2 1 X is non-stochastic, then ˆ var( ) ( ' ) ' ( ') ( ' ) ˆ ˆ ˆ ˆ But var( ) ( ') ( ) ( ' ) ' ( ' ) ˆ ( ) ( ' ) X X X E uu X X X u E uu s I Var X X X s IX X X Var s X X               

(16)

Consequences of Using OLS in the Presence of Heteroscedasticity 2 2 2 2 2 2 2 2 0 0 0 0 0 0 ( ) ( ') 0 0 0 0 0 0 and 0 0 0 0 0 0 ˆ ˆ ˆ ( ) ( ') 0 0 0 0 0 0 Var u E uu I s s Var u E uu s I s             _ _                _ _         

(17)

•

. 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 2 1 2 2

However, if heteroscedasticity is present:

0

0 ( )

(

')

, where

0

0 and

0

0 ˆ

ˆ

ˆ ˆ

( )

(

')

, where

0

T T T T

Var u

E uu

s

Var u

E uu

s

















  

_

_





























  

_

_























(18)

• Hence, if heteroscedasticity exists





1





1

1 2

( )

ˆ

(

)

(19)

• The Gauss-Markov theorem shows that the OLS estimator is

efficient, which means that under homoscedasticity (and no

serial correlation) var() is the minimum variance (See Heij et

al, page 98).

• So, under heteroscedasticity var() is not the minimum

variance, i.e. var() is biased upwards (and so is).

• Since SE() is in the denominator of the t-statistic , this will

be biased downwards.

• Therefore, it will become more difficult to reject H

0

, i.e. the

probability of committing a type II error (not rejecting a false

H

0

) will increase.

(20)

OLS standard error estimator with

heteroscedasticity

• If there is heteroscedasticity, E(u

₁2

)  E(u

22

)

 ....  E(u

_T2

), which means that s

12

 s

22

 s

T2

(21)

Other Approaches to Dealing

with Heteroscedasticity

• White (1980) has shown that

is a consistent estimator of

• White, H. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica, vol. 48, no. 4, 1980, pp. 817–838. http://staff.utia.cas.cz/barunik/files/AE/Seminar8/White1980.pdf 2 1 2 2 2 2 ˆ 0 0 0 ˆ 0 0 0 ˆ

where , 1, 2,..., , are the OLS residuals;

0 0 0 ˆ 0 0 0 t T u u D u t T u        _ _         2 2 1 2 2 0 0 0 0 0 0 0 0 0 0 0 0 _T s s s                 

(22)

Other Approaches to Dealing

with Heteroscedasticity

• Hence,

is a consistent estimator of

• This is called the White’s Heteroscedasticity

Consistent Estimator (HEC) estimator

• The SE’s obtained are called robust standard

errors

• 



1 1

ˆ

var

( )





(

X X

’

)



X DX X X

’



(23)

The HEC var-cov matrix with DF adjustment

• In small samples, a degrees of freedom adjustment

is necessary, as follows:

2 1 2 2 2

ˆ

0

0 ˆ

0

0 ˆ

0

_T

u

T

D

T k

u















_

_

















(24)

HEC Matrix example

Dependent Variable: RIBOV Method: Least Squares Date: 03/27/18 Time: 17:52

Sample (adjusted): 2000M03 2016M12 Included observations: 202 after adjustments

White heteroskedasticity-consistent standard errors & covariance

Variable Coefficient Std. Error t-Statistic Prob. C 0.007293 0.004396 1.659109 0.0987 VP_PU -0.000123 5.59E-05 -2.206356 0.0285 D(EMBI) -0.000321 5.42E-05 -5.917603 0.0000 R-squared 0.305333 Mean dependent var 0.006084 Adjusted R-squared 0.298351 S.D. dependent var 0.072922 S.E. of regression 0.061083 Akaike info criterion -2.738421 Sum squared resid 0.742499 Schwarz criterion -2.689289 Log likelihood 279.5806 Hannan-Quinn criter. -2.718542 F-statistic 43.73402 Durbin-Watson stat 1.871069 Prob(F-statistic) 0.000000 Wald F-statistic 24.33457 Prob(Wald F-statistic) 0.000000

Dependent Variable: RIBOV Method: Least Squares Date: 04/04/18 Time: 19:17

Sample (adjusted): 2000M03 2016M12 Included observations: 202 after adjustments

Variable Coefficient Std. Error t-Statistic Prob.

C 0.007293 0.004418 1.650704 0.1004

VP_PU -0.000123 7.07E-05 -1.744174 0.0827

D(EMBI) -0.000321 3.50E-05 -9.147480 0.0000

R-squared 0.305333 Mean dependent var 0.006084

Adjusted R-squared 0.298351 S.D. dependent var 0.072922

S.E. of regression 0.061083 Akaike info criterion -2.738421

Sum squared resid 0.742499 Schwarz criterion -2.689289

Log likelihood 279.5806 Hannan-Quinn criter. -2.718542

F-statistic 43.73402 Durbin-Watson stat 1.871069

(25)

Background –

The Concept of a Lagged Value

t yt yt-1 yt 1989M09 0.8 -1989M10 1.3 0.8 1.3-0.8=0.5 1989M11 -0.9 1.3 -0.9-1.3=-2.2 1989M12 0.2 -0.9 0.2-0.9=1.1 1990M01 -1.7 0.2 -1.7-0.2=-1.9 1990M02 2.3 -1.7 2.3-1.7=4.0 1990M03 0.1 2.3 0.1-2.3=-2.2 1990M04 0.0 0.1 0.0-0.1=-0.1

. . . .

(26)

Autocorrelation

• We assumed of the CLRM’s errors that Cov(u_i , u_j) = 0 for  i  j, i.e.

This is the same as saying there is no pattern in the errors.

• Obviously we never have the actual u’s, so we use their sample counterpart, the residuals (the ’s).

• If there are patterns in the residuals from a model, we say that they are autocorrelated.

• Some stereotypical patterns we may find in the residuals are given on the next 3 slides.

(27)

Positive Autocorrelation

Positive Autocorrelation is indicated by a cyclical residual plot over time.

+ -t uˆ + 1 ˆ_t_ u -3.7 -6 -6.5 -6 -3.1 -5 -3 0.5 -1 1 4 3 5 7 8 7 + -Time t uˆ

(28)

Negative Autocorrelation

Negative autocorrelation is indicated by an alternating pattern where the residuals

+ -t uˆ + 1 ˆt u + -t uˆ Time

(29)

No pattern in residuals –

No autocorrelation

No pattern in residuals at all: this is what we would like to see

+ t uˆ -+ 1 ˆ_t_ u + -t uˆ Time

(30)

Detecting Autocorrelation:

The Durbin-Watson Test

The Durbin-Watson (DW) is a test for first order autocorrelation - i.e. it assumes that the relationship is between an error and the previous one

u_t = u_t-1 + v_t (1) where v_t  N(0, _v2_).

• The DW test statistic actually tests

H₀: =0 and H₁: 0 • The test statistic is calculated by





DW u u u t t t T t t T   _  



   1 2 2 2 1

(31)

Detecting Autocorrelation:

The Durbin-Watson Test

2 2 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 3 1 1 2 1 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2

(

)

2 ...

;

...

If T is large,

(

)

2

T T T T t t t t t t t t t t T T t T t T t t T T T T T T T t t t t t t t t t t t t t t t t t

u

u u

u

u u

u

u u

                     













 





 

















_







 



2 1 ₁ 2 2 2 2 2 1 1

2 2 1

T T _T t t t _{t t} t t t T T t t t t

u

u u

_{u u}

DW

u

 _     







_



_

_





_

_





_

_





















(32)

Detecting Autocorrelation:

The Durbin-Watson Test









1 1 1 1 1 1 2 2 1 1

cov( ,

)

(

( ))(

(

))

( )(

)

1 _{ˆ ˆ}

Sample estimate of cov( ,

)

1

1 ˆ

Sample estimate of var( )

1 ˆ ˆ

(

1)cov( ,

)

ˆ

2 1

2(1

)

ˆ

(

1) var( )

t t t t t t t t T t t t t t T t t t t t t

u u

E u

u

E u

E u u

u u

T

u

T

u u

DW

T

u



        















_





_



_











(33)

The Durbin-Watson Test:

Critical Values

• We can also write

(2)

where is the estimated correlation coefficient. Since is a correlation, it implies that .

• Rearranging for DW from (2) would give 0  DW  4.

• If = 0, DW = 2. So, do not reject the null hypothesis if DW  2 because there is little evidence of autocorrelation

• DW has 2 critical values, an upper critical value (d_u) and a lower critical value (d_L), and there is also an intermediate region where we can neither reject nor not reject H₀.

•

DW 2 1(  _)



(34)

The Durbin-Watson Test: Interpreting the Results

Conditions which Must be Fulfilled for DW to be a Valid Test 1. Constant term in regression

(35)

Durbin-Watson test exercise

• An OLS regression produced

regression residuals given by:

• Calculate the DW statistic and perform the

Durbin-Watson test. For a significance level of 5%, T = 5 and 2

parameters excluding the constant term use d

L

= 0.467

and d

U

= 1.896. What is the test result at the 5% level?

• Solution: DW=3.991; d

L

=0.467; d

U

=1.896; 4–d

L

=3.531;

4–d

U

=2.103.

• There is evidence of negative autocorrelation.

1 2 2 3 3

t t t

y



 



x





x



u

( 2.2 0.5 0.5 1.5 0.7)'

(36)

Another Test for Autocorrelation: The Breusch-Godfrey Test*

•It is a more general test for rth order autocorrelation, where:

, vt  N(0, 2)

•The null and alternative hypotheses are: H0 :



1 = 0 and



2 = 0 and ... and



r = 0

H1 :



1  0 or



2  0 or ... or



r  0

•The test is carried out as follows:

1. Estimate the linear regression using OLS and obtain the residuals.

2. Regress the residuals on all of the regressors from stage 1 (the x’s) plus lagged residuals up to tr, i.e. and obtain R2 from this regression. The

number of lags of ut can be decided using information criteria.

3. It can be shown that (T-r)R2 



2(r).

(37)

Breusch-Godfrey exercise

• Let a regression be

with T = 40, and we suspect that the residuals

have a serial correlation of the 3

rd

_order.

• Carry out a B-G test at 5%.

1 2 2 3 3

(1)

t t t

(38)

Breusch-Godfrey exercise

• Solution:

• Run the regression (1) and get the residuals.

• Specify the auxiliary regression as

• Run regression (2). Assume that its R

2

= 0.25.

• The test statistic = (40 – 3)*0.25 = 9.25

• Critical value (5%, r = 3) = 7.815

1 2 2 3 3 1 1

+

2 2

+

3 3

(2)

t t t t t t

(39)

Breusch-Godfrey exercise

• Assume that when we were looking up at the

Chi

2

_{table we chose 0.005 instead of 0.05 and}

then we performed the test.

• What happens then?

(40)

Breusch-Godfrey exercise

• If we are not confident about the SC order, we

can run (2) with different lags and choose the

one with the lower AIC or SBIC and the

(41)

Consequences of Ignoring Autocorrelation if it is Present

• The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even in large sample sizes.

• Thus, if the standard error estimates are inappropriate, there exists the possibility that we could make the wrong inferences (Type II error = false negative)

• R2_{is likely to be inflated relative to its “correct” value for positively}

(42)

Consequences of Ignoring Autocorrelation

if it is Present

• When autocorrelation exists, the matrix  is

• So that:

1 1 1 1 2

1

0

1

T T T



  













  





















(43)

“Remedies” for Autocorrelation

• If the form of the autocorrelation is known, we could use a GLS procedure – i.e. an approach that allows for autocorrelated residuals e.g., Cochrane-Orcutt. See Brooks Ch. 5.

(44)

The Cochrane-Orcutt approach

• Let a regression be:

• Applying one lag:

• Multiplying by :

(45)

The Cochrane-Orcutt approach

• Rearranging terms and noticing that :

• Making

• We get:

• This is a GLS equation and it can be estimated by OLS

since v

t

is free from SC.

• But we need an estimate of



, i.e. be obtained from

(46)

The Cochrane-Orcutt approach: summary

• See: Cochrane, D. and Orcutt, G. H. (1949) Application of Least Squares

(47)

Solution to serial correlated (and heteroscedastic)

residuals in regressions

• As we have seen previously, untreated residual serial

correlation provokes biased coefficient standard

errors, which can lead to inference errors (Type I or

II) and to the reduction of the power of the t-test.

• A solution was proposed by Newey and West (1987):

“A Simple, Positive Semi-Definite, Heteroskedasticity

and Autocorrelation Consistent Covariance Matrix”,

Econometrica, 55/3, available in:

https://docs.google.com/viewer?a=v&pid=site

s&srcid=ZGVmYXVsdGRvbWFpbnxvdGF2aW9tZWRlaXJ

vczIwMDl8Z3g6MmY0ZDliMDdkNWYwYjlmNQ

(48)

Multicollinearity

• This problem occurs when the explanatory variables are very highly correlated with each other.

• Perfect multicollinearity

Cannot estimate all the coefficients - e.g. suppose x₃ = 2x₂

and the model is y_t = ₁ + ₂x_2t + ₃x_3t + ₄x_4t + u_t

• Problems if Near Multicollinearity is Present but Ignored

- R2_{will be high but the individual coefficients will have high standard errors.}

- The regression becomes very sensitive to small changes in the specification. - Thus confidence intervals for the parameters will be very wide, and

(49)

Measuring Multicollinearity

• The easiest way to measure the extent of multicollinearity is simply to look at the matrix of correlations between the individual variables. e.g.

• But another problem: if 3 or more variables are linear - e.g. x_2t + x_3t = x_4t

• Note that high correlation between y and one of the x’s is not multicollinearity.

Corr

x

2

x

3

x

4

x

₂

-

0.2

0.8 x

3

0.2 -

0.3

(50)

-Measuring Multicollinearity

• Another criterion to check for multicollinearity:

– VIF – Variance inflation factor

– Compute all VIF’s.

– If one or more VIF > 10, there are multicollinearity

problems

2 2 2 1 ( ) 1

where R R of the regression of independent variable X on all other regressors.

j j j j VIF X R   

(51)

Measuring Multicollinearity

• How do we interpret the variance inflation factors for a

regression model?

• It is a measure of how much the variance of the estimated

regression coefficient b

k

is "inflated" by the existence of

correlation among the predictor variables in the model.

• A VIF of 1 means that there is no correlation among

the k

th

predictor and the remaining predictor variables, and

hence the variance of b

_k

is not inflated at all.

• The general rule of thumb is that VIFs exceeding 4 warrant

further investigation, while VIFs exceeding 10 are signs of

serious multicollinearity requiring correction.

(52)

Micronumerosity

• Micronumerosity: A term introduced by Goldberger

(1964) to describe properties of econometric estimators

with small sample sizes.

• Exact micronumerosity: fewer observations than

parameters to be estimated: estimation is not possible.

• Near micronumerosity: the number of observations

barely exceeds the number of parameters to be

(53)

Micronumerosity

• What size should the sample have



– Minimum T-k  30

– Ideal T-k  100

• What can be done

– Increase sample size;

– Re-analyse theory and model and reduce number

of coefficients if possible.

(54)

Testing the Normality Assumption

• Why did we need to assume normality for hypothesis testing? Testing for Departures from Normality

• The Bera Jarque normality test

• A normal distribution is not skewed and is defined to have a coefficient of kurtosis of 3.

• The kurtosis of the normal distribution is 3 so its excess kurtosis (b₂-3) is zero.

• Skewness and kurtosis are the (standardised) third and fourth moments of a distribution.

(55)

Normal versus Skewed Distributions

A normal distribution A skewed distribution

f(x)

x x

(56)

Leptokurtic versus Normal Distribution

0.1 0.2 0.3 0.4 0.5

(57)

Testing for Normality

• Bera and Jarque formalise this by testing the residuals for normality by testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.

• It can be proved that the coefficients of skewness and kurtosis can be expressed as:

and

• The Bera Jarque test statistic is given by

• We estimate b₁ and b₂ using the residuals from the OLS regression, .

• See page 181. _u

 

b₁ E u3 2 3 2  [ ]_/  b

 

E u 2 4 2 2  [ ] 





_~

_{ }

₂ 24 3 6 2 2 2 2 1        _  T b b W

(58)

What do we do if we find evidence of Non-Normality?

• It is not obvious what we should do!

• Could use a method which does not assume normality, but difficult and what are its properties?

• Often the case that one or two very extreme residuals causes us to reject the normality assumption.

• An alternative is to use dummy variables.

e.g. say we estimate a monthly model of asset returns from 1980-1990, and we plot the residuals, and find a particularly large outlier for October

(59)

What do we do if we find evidence

of Non-Normality? (cont’d)

• Create a new variable:

D87M10t = 1 during October 1987 and zero otherwise.

This effectively knocks out that observation. But we need a theoretical reason for adding dummy variables. See page 186.

Oct 1987 + -Time t uˆ

(60)

Normality test exercise

• A linear regression

produced residuals given by .

The variance of the residuals is equal to 8/3. Compute

the Jarque-Bera statistic (W) and perform the

normality test.

• Solution:

E(u

3

) = 1.2; E(u

4

) = 4; b1 = 0.275568; b2 = 0.5625; W

= 1.301074; Crit. Value (5%) = 5.991  Do not reject

H

 There is evidence that the residuals are normal.

1 2 2 3 3

t t t

y



 



x





x



u

' (2 1 1 1 1)

(61)

Parameter Stability Tests

• So far, we have estimated regressions such as

• We have implicitly assumed that the parameters (₁, ₂ and ₃) are constant for the entire sample period.

• We can test this implicit assumption using parameter stability tests. The idea is essentially to split the data into sub-periods and then to estimate up to three models, for each of the sub-parts and for all the data and then to “compare” the RSS of the models.

• There are two types of test we can look at: - Chow test (analysis of variance test)

- Predictive failure tests

(62)

The Chow Test

• The steps involved are:

1. Split the data into two sub-periods. Estimate the regression over the whole period and then for the two sub-periods separately (3 regressions). Obtain the RSS for each regression.

2. The restricted regression is now the regression for the whole period while the “unrestricted regression” comes in two parts: for each of the sub-samples.

We can thus form an F-test which is the difference between the RSS’s. The statistic is RSS



RSS RSS



RSS RSS T k k   1 2   1 2 2

(63)

Structural breaks with dummy variables

• It’s possible to model structural breaks with dummy variables

• Consider the model and a sample with

200 observations

• Assume you suspect a structural break at observation t=120

• You can insert 2 dummies: D

0

for the intercept and D

1

for the

slope

• The model becomes:

• D

0

=0 from t=1 to t=119 and D

0

=1 from t=120 to t=200

• D

1

=0 from t=1 to t=119 and D

1

=1 from t=120 to t=200

0 1 t t t

y









x



u

0 0 1 1 t t t t

y







D





x



D x



u

(64)

Structural breaks with dummy variables

• Run the regression by OLS

• When D

0

= 0, the intercept = 

0

; when D

0

=1, the intercept =



0

+D

0

• When D

1

= 0, the slope = 

1

; when D

1

=1, the slope = 

1

+D

1

• If D

0

and D

1

are not significantly different from zero at %,

the evidence is that there’s no structural break

• This can be generalized to models with more regressors

• Reference: Clements, M.P. and D.F. Hendry (1996). "Intercept corrections and structural change”. Journal of Applied Econometrics, 11, 475–494.

(65)

How do we decide the sub-parts to use?

• As a rule of thumb, we could use all or some of the following:

- Plot the dependent variable over time and split the data accordingly to any obvious structural changes in the series, e.g.

- Split the data according to any known important

historical events (e.g. stock market crash, new government elected)

- Use all but the last few observations and do a predictive failure test on those.

0 200 400 600 800 1000 1200 1400 1 ₂7 ₅3 ₇9 1 0 5 1 3 1 1 5 7 1 8 3 2 0 9 2 3 5 2 6 1 2 8 7 3 1 3 3 3 9 3 6 5 3 9 1 4 1 7 4 4 3 Sample Period V a lu e o f S e ri e s ( yt )