Chapter 5
• Classical linear regression model assumptions and diagnostic tests
Violation of the Assumptions of the CLRM
• Recall that we assumed of the CLRM disturbance terms:
1. $E(u_t) = 0$
2. $Var(u_t) = \sigma^2 < \infty$
3. $Cov(u_i, u_j) = 0$ for $i \neq j$
4. The X matrix is non-stochastic or fixed in repeated samples
5. $u_t \sim N(0, \sigma^2)$
Investigating Violations of the
Assumptions of the CLRM
• We will now study these assumptions further, and in particular look at:
- How we test for violations
- Causes
- Consequences: in general we could encounter any combination of 3 problems:
  - the coefficient estimates are wrong
  - the associated standard errors are wrong
  - the distribution that we assumed for the test statistics will be inappropriate
- Solutions:
  - the assumptions are no longer violated
  - we work around the problem so that we use alternative techniques which are still valid
Statistical Distributions for Diagnostic Tests
• Often, an F- and a $\chi^2$-version of the test are available.
• The F-test version involves estimating a restricted and an unrestricted version of a test regression and comparing the RSS.
• The $\chi^2$-version is sometimes called an “LM” test, and only has one degree of freedom parameter: the number of restrictions being tested, m.
• Asymptotically, the 2 tests are equivalent since the $\chi^2$ is a special case of the F-distribution:
$$\frac{\chi^2(m)}{m} \rightarrow F(m, T-k) \quad \text{as } T-k \rightarrow \infty$$
Assumption 1: $E(u_t) = 0$
• Assumption that the mean of the disturbances is zero.
• For all diagnostic tests, we cannot observe the disturbances and so we perform the tests on the residuals.
• The mean of the residuals will always be zero provided that there is a constant term in the regression.
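The claim that the residuals average to zero whenever a constant is included can be checked numerically. The sketch below uses made-up data (the `x` and `y` values are assumptions for illustration only) and compares a regression with a constant against a regression through the origin.

```python
# Simple OLS of y on a constant and one regressor x, in pure Python.
def ols_simple(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Regression with a constant term.
a, b = ols_simple(x, y)
u_hat = residuals(x, y, a, b)
mean_with_constant = sum(u_hat) / len(u_hat)

# Regression through the origin (no constant): slope = sum(x*y) / sum(x^2).
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
u_hat_no_const = residuals(x, y, 0.0, b0)
mean_without_constant = sum(u_hat_no_const) / len(u_hat_no_const)

print(mean_with_constant)     # zero up to floating-point rounding
print(mean_without_constant)  # generally not zero
```

Without the constant, nothing forces the residuals to sum to zero, which is why the result is stated only for regressions that include one.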
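Assumption 1: $E(u_t) = 0$
Proof in matrix form:
$$X'\hat{u} = X'(y - X\hat{\beta}) = X'y - X'X(X'X)^{-1}X'y = X'y - X'y = 0$$
$$X'\hat{u} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \cdots & x_{kT} \end{pmatrix} \begin{pmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
The first element of the vector $X'\hat{u}$ will be:
$$\hat{u}_1 + \hat{u}_2 + \cdots + \hat{u}_T = \sum_{t=1}^{T} \hat{u}_t = 0 \;\Rightarrow\; E(\hat{u}_t) = 0$$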
The Variance of $\hat{\beta}$
$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u$$
$$var(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] = E[((X'X)^{-1}X'u)((X'X)^{-1}X'u)'] = E[(X'X)^{-1}X'uu'X(X'X)^{-1}]$$
If assumption 4 holds (X is non-stochastic), then
$$var(\hat{\beta}) = (X'X)^{-1}X'E(uu')X(X'X)^{-1}$$
If we had a regression with 2 observations, we could write:
$$E(uu') = \begin{pmatrix} E(u_1^2) & E(u_1 u_2) \\ E(u_2 u_1) & E(u_2^2) \end{pmatrix}$$
If assumption 1 holds ($E(u) = 0$), then
$$E(uu') = \begin{pmatrix} \sigma_1^2 & cov(u_1, u_2) \\ cov(u_2, u_1) & \sigma_2^2 \end{pmatrix}$$
Why?
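The Variance of $\hat{\beta}$
Again, if we had T = 2, then:
$$E(uu') = \begin{pmatrix} \sigma_1^2 & cov(u_1, u_2) \\ cov(u_2, u_1) & \sigma_2^2 \end{pmatrix}$$
If assumptions 1 and 4 hold, there are 4 possibilities:
1) Assumptions 2 ($\sigma_u^2$ = const) and 3 ($cov(u_i, u_j) = 0$) hold, then
$$E(uu') = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix} = \sigma^2 I$$
2) Assumption 2 holds but assumption 3 doesn't hold, then
$$E(uu') = \begin{pmatrix} \sigma^2 & cov(u_1, u_2) \\ cov(u_2, u_1) & \sigma^2 \end{pmatrix} = \sigma^2 \begin{pmatrix} 1 & cov(u_1, u_2)/\sigma^2 \\ cov(u_2, u_1)/\sigma^2 & 1 \end{pmatrix}$$
3) Assumption 3 holds but assumption 2 doesn't hold, then
$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$
4) Assumptions 2 and 3 don't hold, then
$$E(uu') = \begin{pmatrix} \sigma_1^2 & cov(u_1, u_2) \\ cov(u_2, u_1) & \sigma_2^2 \end{pmatrix}$$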
The Variance of $\hat{\beta}$
• Let us analyze the 4 cases:
1. $E(uu') = \sigma^2 I$: the Gauss-Markov theorem proves that this is the minimum variance
2. not minimum variance
3. not minimum variance
4. not minimum variance
• Conclusion: only when assumptions 1 to 4 hold will the variance-covariance matrix of the OLS estimator be the minimum (OLS is BLUE).
Assumption 2: $Var(u_t) = \sigma^2 < \infty$
• We have so far assumed that the variance of the errors is constant, $\sigma^2$. This is known as homoscedasticity. If the errors do not have a constant variance, we say that they are heteroscedastic, e.g. say we estimate a regression and calculate the residuals, $\hat{u}_t$.
[Figure: plot of the residuals $\hat{u}_t$]
Detection of Heteroscedasticity
• Graphical methods
• Formal tests:
One of the best is White’s general test for heteroscedasticity. The test is carried out as follows:
1. Assume that the regression we carried out is as follows
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
and we want to test $Var(u_t) = \sigma^2$. We estimate the model, obtaining the residuals, $\hat{u}_t$.
2. Then run the auxiliary regression
$$\hat{u}_t^2 = \alpha_1 + \alpha_2 x_{2t} + \alpha_3 x_{3t} + \alpha_4 x_{2t}^2 + \alpha_5 x_{3t}^2 + \alpha_6 x_{2t} x_{3t} + v_t$$
Performing White’s Test for Heteroscedasticity
3. Obtain $R^2$ from the auxiliary regression and multiply it by the number of observations, T. It can be shown that
$$T R^2 \sim \chi^2(m)$$
where m is the number of regressors in the auxiliary regression excluding the constant term.
4. If the $\chi^2$ test statistic from step 3 is greater than the corresponding value from the statistical table then reject the null hypothesis that the disturbances are homoscedastic.
Performing White’s Test for Heteroscedasticity
• Example 4.1, page 135: T = 120
– Model: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
– $H_0$: $Var(u_t) = \sigma^2$
– $R^2$ of auxiliary regression = 0.234
– Test statistic = 0.234 x 120 = 28.08
– Critical value of $\chi^2(5)$, 5% = 11.07
– Reject $H_0$
• See Eviews example in page 137
White’s test exercise
• Given the model
$$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$$
a regression with 40 observations was performed. Considering that the auxiliary regression for White’s test had $R^2 = 0.585$, is there evidence of heteroscedasticity at 5%?
Solution: $TR^2$ = 0.585 x 40 = 23.4; m = 14; critical value = 23.685; do not reject $H_0$, there is no evidence of heteroscedasticity.
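The arithmetic of White's test in the two examples above can be sketched in a few lines. The function names are hypothetical, and the critical values are hard-coded from the text rather than computed. The count of auxiliary regressors assumes the full version of the test: levels, squares and all pairwise cross products.

```python
from math import comb


def white_num_regressors(k):
    """Number of regressors (excluding the constant) in White's auxiliary
    regression when the original model has k non-constant regressors:
    k levels + k squares + k(k-1)/2 cross products."""
    return k + k + comb(k, 2)


def white_statistic(T, r_squared):
    """T * R^2 from the auxiliary regression."""
    return T * r_squared


def reject_homoscedasticity(stat, critical_value):
    return stat > critical_value


# Example with two regressors (x2, x3), T = 120, R^2 = 0.234.
m1 = white_num_regressors(2)        # 5
stat1 = white_statistic(120, 0.234)
print(m1, stat1, reject_homoscedasticity(stat1, 11.07))

# Exercise with four regressors (x1..x4), T = 40, R^2 = 0.585.
m2 = white_num_regressors(4)        # 14
stat2 = white_statistic(40, 0.585)
print(m2, stat2, reject_homoscedasticity(stat2, 23.685))
```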
Consequences of Using OLS in the Presence of Heteroscedasticity
• From Chapter 3 (slide 14):
• Derivation of the OLS standard error estimator:
$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u$$
$$var(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] = E[((X'X)^{-1}X'u)((X'X)^{-1}X'u)'] = E[(X'X)^{-1}X'uu'X(X'X)^{-1}]$$
Since by assumption X is non-stochastic, then
$$var(\hat{\beta}) = (X'X)^{-1}X'E(uu')X(X'X)^{-1}$$
But $\widehat{var}(\hat{u}) = E(\hat{u}\hat{u}') = s^2 I$, so
$$\widehat{Var}(\hat{\beta}) = (X'X)^{-1}X' s^2 I X(X'X)^{-1} = s^2 (X'X)^{-1}$$
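Consequences of Using OLS in the Presence of Heteroscedasticity
$$Var(u) = E(uu') = \sigma^2 I = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} \quad \text{and} \quad \widehat{Var}(\hat{u}) = E(\hat{u}\hat{u}') = s^2 I = \begin{pmatrix} s^2 & 0 & \cdots & 0 \\ 0 & s^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s^2 \end{pmatrix}$$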
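Consequences of Using OLS in the Presence of Heteroscedasticity
• However, if heteroscedasticity is present:
$$Var(u) = E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_T^2 \end{pmatrix} \neq \sigma^2 I$$
and
$$\widehat{Var}(\hat{u}) = E(\hat{u}\hat{u}') = \begin{pmatrix} s_1^2 & 0 & \cdots & 0 \\ 0 & s_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_T^2 \end{pmatrix} \neq s^2 I$$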
Consequences of Using OLS in the Presence of Heteroscedasticity
• Hence, if heteroscedasticity exists
$$Var(\hat{\beta}) = (X'X)^{-1}X'E(uu')X(X'X)^{-1} \neq \sigma^2 (X'X)^{-1}$$
Consequences of Using OLS in the Presence of Heteroscedasticity
• The Gauss-Markov theorem shows that the OLS estimator is efficient, which means that under homoscedasticity (and no serial correlation) $var(\hat{\beta})$ is the minimum variance (See Heij et al, page 98).
• So, under heteroscedasticity $var(\hat{\beta})$ is not the minimum variance, i.e. $var(\hat{\beta})$ is biased upwards (and so is $SE(\hat{\beta})$).
• Since $SE(\hat{\beta})$ is in the denominator of the t-statistic, this will be biased downwards.
• Therefore, it will become more difficult to reject $H_0$, i.e. the probability of committing a type II error (not rejecting a false $H_0$) will increase.
OLS standard error estimator with
heteroscedasticity
• If there is heteroscedasticity, $E(u_1^2) \neq E(u_2^2) \neq \cdots \neq E(u_T^2)$, which means that $s_1^2 \neq s_2^2 \neq \cdots \neq s_T^2$
Other Approaches to Dealing
with Heteroscedasticity
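• White (1980) has shown that
$$\hat{D} = \begin{pmatrix} \hat{u}_1^2 & 0 & \cdots & 0 \\ 0 & \hat{u}_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{u}_T^2 \end{pmatrix}$$
where $\hat{u}_t$, t = 1, 2, ..., T, are the OLS residuals, is a consistent estimator of
$$\begin{pmatrix} s_1^2 & 0 & \cdots & 0 \\ 0 & s_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_T^2 \end{pmatrix}$$
• White, H. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica, vol. 48, no. 4, 1980, pp. 817–838. http://staff.utia.cas.cz/barunik/files/AE/Seminar8/White1980.pdf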
Other Approaches to Dealing
with Heteroscedasticity
• Hence,
$$\widehat{var}(\hat{\beta}) = (X'X)^{-1} X'\hat{D}X (X'X)^{-1}$$
is a consistent estimator of $var(\hat{\beta})$
• This is called White’s Heteroscedasticity-Consistent (HEC) estimator
• The SE’s obtained are called robust standard errors
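To make the sandwich formula concrete, the sketch below specialises it to a regression with a constant and a single regressor. In that case the slope element of $(X'X)^{-1}X'\hat{D}X(X'X)^{-1}$ reduces to $\sum (x_t - \bar{x})^2 \hat{u}_t^2 / [\sum (x_t - \bar{x})^2]^2$. The data are invented for illustration and the function names are hypothetical. No degrees of freedom adjustment is applied.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def conventional_slope_variance(x, resid):
    """Slope element of s^2 (X'X)^{-1}, with s^2 = RSS / (T - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s2 = sum(u * u for u in resid) / (n - 2)
    return s2 / sxx


def white_slope_variance(x, resid):
    """Slope element of (X'X)^{-1} X'DX (X'X)^{-1}, D = diag(u_hat_t^2)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    meat = sum((xi - x_bar) ** 2 * u * u for xi, u in zip(x, resid))
    return meat / sxx ** 2


# Hypothetical data in which the spread of y grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.3, 3.6, 5.9, 5.0, 8.4, 6.5]

_, slope, u_hat = ols_simple(x, y)
se_conventional = conventional_slope_variance(x, u_hat) ** 0.5
se_robust = white_slope_variance(x, u_hat) ** 0.5
print(slope, se_conventional, se_robust)
```

The coefficient estimate is the same in both cases. Only the standard error changes, which is also what the EViews output pair in the HEC example illustrates.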
The HEC var-cov matrix with DF adjustment
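• In small samples, a degrees of freedom adjustment is necessary, as follows:
$$\hat{D} = \frac{T}{T-k} \begin{pmatrix} \hat{u}_1^2 & 0 & \cdots & 0 \\ 0 & \hat{u}_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{u}_T^2 \end{pmatrix}$$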
HEC Matrix example
Dependent Variable: RIBOV
Method: Least Squares
Date: 03/27/18 Time: 17:52
Sample (adjusted): 2000M03 2016M12
Included observations: 202 after adjustments
White heteroskedasticity-consistent standard errors & covariance
Variable Coefficient Std. Error t-Statistic Prob.
C 0.007293 0.004396 1.659109 0.0987
VP_PU -0.000123 5.59E-05 -2.206356 0.0285
D(EMBI) -0.000321 5.42E-05 -5.917603 0.0000
R-squared 0.305333 Mean dependent var 0.006084
Adjusted R-squared 0.298351 S.D. dependent var 0.072922
S.E. of regression 0.061083 Akaike info criterion -2.738421
Sum squared resid 0.742499 Schwarz criterion -2.689289
Log likelihood 279.5806 Hannan-Quinn criter. -2.718542
F-statistic 43.73402 Durbin-Watson stat 1.871069
Prob(F-statistic) 0.000000 Wald F-statistic 24.33457
Prob(Wald F-statistic) 0.000000

Dependent Variable: RIBOV
Method: Least Squares
Date: 04/04/18 Time: 19:17
Sample (adjusted): 2000M03 2016M12
Included observations: 202 after adjustments
Variable Coefficient Std. Error t-Statistic Prob.
C 0.007293 0.004418 1.650704 0.1004
VP_PU -0.000123 7.07E-05 -1.744174 0.0827
D(EMBI) -0.000321 3.50E-05 -9.147480 0.0000
R-squared 0.305333 Mean dependent var 0.006084
Adjusted R-squared 0.298351 S.D. dependent var 0.072922
S.E. of regression 0.061083 Akaike info criterion -2.738421
Sum squared resid 0.742499 Schwarz criterion -2.689289
Log likelihood 279.5806 Hannan-Quinn criter. -2.718542
F-statistic 43.73402 Durbin-Watson stat 1.871069
Background – The Concept of a Lagged Value
t         y_t    y_{t-1}   Δy_t
1989M09   0.8    -         -
1989M10   1.3    0.8       1.3 - 0.8 = 0.5
1989M11   -0.9   1.3       -0.9 - 1.3 = -2.2
1989M12   0.2    -0.9      0.2 - (-0.9) = 1.1
1990M01   -1.7   0.2       -1.7 - 0.2 = -1.9
1990M02   2.3    -1.7      2.3 - (-1.7) = 4.0
1990M03   0.1    2.3       0.1 - 2.3 = -2.2
1990M04   0.0    0.1       0.0 - 0.1 = -0.1
.         .      .         .
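The lag and first-difference columns of the table can be reproduced mechanically. The sketch below uses the y values from the table; `None` marks the entries that are undefined for the first observation.

```python
# y_t for 1989M09 ... 1990M04, as given in the table.
y = [0.8, 1.3, -0.9, 0.2, -1.7, 2.3, 0.1, 0.0]

# Lagged value y_{t-1}: shift the series down by one period.
y_lag = [None] + y[:-1]

# First difference: delta y_t = y_t - y_{t-1}, rounded to one decimal
# to match the precision of the table.
dy = [None] + [round(y[t] - y[t - 1], 1) for t in range(1, len(y))]

for row in zip(y, y_lag, dy):
    print(row)
```

Each lag costs one observation at the start of the sample, which is why regressions that use lagged values report an "adjusted" sample.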
Autocorrelation
• We assumed of the CLRM’s errors that $Cov(u_i, u_j) = 0$ for $i \neq j$. This is the same as saying there is no pattern in the errors.
• Obviously we never have the actual u’s, so we use their sample counterpart, the residuals (the $\hat{u}_t$’s).
• If there are patterns in the residuals from a model, we say that they are autocorrelated.
• Some stereotypical patterns we may find in the residuals are given on the next 3 slides.
Positive Autocorrelation
Positive Autocorrelation is indicated by a cyclical residual plot over time.
[Figures: $\hat{u}_t$ plotted against $\hat{u}_{t-1}$; $\hat{u}_t$ plotted over time]
Negative Autocorrelation
Negative autocorrelation is indicated by an alternating pattern in the residuals.
[Figures: $\hat{u}_t$ plotted against $\hat{u}_{t-1}$; $\hat{u}_t$ plotted over time]
No pattern in residuals –
No autocorrelation
No pattern in residuals at all: this is what we would like to see
[Figures: $\hat{u}_t$ plotted against $\hat{u}_{t-1}$; $\hat{u}_t$ plotted over time]
Detecting Autocorrelation:
The Durbin-Watson Test
The Durbin-Watson (DW) test is a test for first order autocorrelation, i.e. it assumes that the relationship is between an error and the previous one:
$$u_t = \rho u_{t-1} + v_t \qquad (1)$$
where $v_t \sim N(0, \sigma_v^2)$.
• The DW test statistic actually tests $H_0: \rho = 0$ and $H_1: \rho \neq 0$
• The test statistic is calculated by
$$DW = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T} \hat{u}_t^2}$$
Detecting Autocorrelation:
The Durbin-Watson Test
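$$\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2 = \sum_{t=2}^{T} \hat{u}_t^2 + \sum_{t=2}^{T} \hat{u}_{t-1}^2 - 2\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}$$
$$\sum_{t=2}^{T} \hat{u}_t^2 = \hat{u}_2^2 + \hat{u}_3^2 + \cdots + \hat{u}_T^2; \qquad \sum_{t=2}^{T} \hat{u}_{t-1}^2 = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_{T-1}^2$$
If T is large, $\sum_{t=2}^{T} \hat{u}_t^2 \approx \sum_{t=2}^{T} \hat{u}_{t-1}^2$, so
$$\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2 \approx 2\sum_{t=2}^{T} \hat{u}_t^2 - 2\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}$$
$$DW \approx \frac{2\sum_{t=2}^{T} \hat{u}_t^2 - 2\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=2}^{T} \hat{u}_t^2} = 2\left(1 - \frac{\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=2}^{T} \hat{u}_t^2}\right)$$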
Detecting Autocorrelation:
The Durbin-Watson Test
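$$cov(u_t, u_{t-1}) = E[(u_t - E(u_t))(u_{t-1} - E(u_{t-1}))] = E(u_t u_{t-1})$$
Sample estimate of $cov(u_t, u_{t-1})$:
$$\widehat{cov}(u_t, u_{t-1}) = \frac{1}{T-1} \sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}$$
Sample estimate of $var(u_t)$:
$$\widehat{var}(u_t) = \frac{1}{T-1} \sum_{t=2}^{T} \hat{u}_t^2$$
$$DW \approx 2\left(1 - \frac{(T-1)\,\widehat{cov}(u_t, u_{t-1})}{(T-1)\,\widehat{var}(u_t)}\right) = 2(1 - \hat{\rho})$$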
The Durbin-Watson Test:
Critical Values
• We can also write
$$DW \approx 2(1 - \hat{\rho}) \qquad (2)$$
where $\hat{\rho}$ is the estimated correlation coefficient. Since $\hat{\rho}$ is a correlation, it implies that $-1 \leq \hat{\rho} \leq 1$.
• Rearranging for DW from (2) would give $0 \leq DW \leq 4$.
• If $\hat{\rho} = 0$, DW = 2. So, do not reject the null hypothesis if DW is near 2, because there is little evidence of autocorrelation.
• DW has 2 critical values, an upper critical value ($d_U$) and a lower critical value ($d_L$), and there is also an intermediate region where we can neither reject nor not reject $H_0$.
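A minimal sketch of the DW calculation and the decision rule built on the two critical values follows. The denominator is summed from t = 2 here, which is an assumption about the convention (some references sum it from t = 1). The residual series is invented, and the function names are hypothetical.

```python
def durbin_watson(resid):
    """DW = sum_{t=2}^T (u_t - u_{t-1})^2 / sum_{t=2}^T u_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(u * u for u in resid[1:])
    return num / den


def rho_hat(resid):
    """Estimated first-order correlation used in DW ~ 2(1 - rho_hat)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(u * u for u in resid[1:])
    return num / den


def dw_decision(dw, d_l, d_u):
    """Map a DW value to one of the five regions between 0 and 4."""
    if dw < d_l:
        return "positive autocorrelation"
    if dw <= d_u:
        return "inconclusive"
    if dw < 4 - d_u:
        return "do not reject H0"
    if dw <= 4 - d_l:
        return "inconclusive"
    return "negative autocorrelation"


# Hypothetical residuals that alternate in sign every period.
u_hat = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(u_hat)
print(dw, 2 * (1 - rho_hat(u_hat)))

# Critical values are taken from tables; these are the ones quoted
# in the exercise that follows (d_L = 0.467, d_U = 1.896).
print(dw_decision(dw, 0.467, 1.896))
```

A perfectly alternating series sits at the upper end of the range, so the statistic lands in the negative-autocorrelation region.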
The Durbin-Watson Test: Interpreting the Results
Conditions which Must be Fulfilled for DW to be a Valid Test
1. Constant term in regression
Durbin-Watson test exercise
• An OLS regression
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
produced regression residuals given by: $\hat{u}' = (\,2.2 \;\; 0.5 \;\; 0.5 \;\; 1.5 \;\; 0.7\,)$
• Calculate the DW statistic and perform the Durbin-Watson test. For a significance level of 5%, T = 5 and 2 parameters excluding the constant term use $d_L$ = 0.467 and $d_U$ = 1.896. What is the test result at the 5% level?
• Solution: DW = 3.991; $d_L$ = 0.467; $d_U$ = 1.896; $4 - d_L$ = 3.533; $4 - d_U$ = 2.104.
• There is evidence of negative autocorrelation.
Another Test for Autocorrelation: The Breusch-Godfrey Test*
• It is a more general test for r-th order autocorrelation, where:
$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_r u_{t-r} + v_t, \qquad v_t \sim N(0, \sigma_v^2)$$
• The null and alternative hypotheses are:
$H_0$: $\rho_1 = 0$ and $\rho_2 = 0$ and ... and $\rho_r = 0$
$H_1$: $\rho_1 \neq 0$ or $\rho_2 \neq 0$ or ... or $\rho_r \neq 0$
• The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals, $\hat{u}_t$.
2. Regress the residuals on all of the regressors from stage 1 (the x’s) plus lagged residuals up to $\hat{u}_{t-r}$, i.e. $\hat{u}_{t-1}, \hat{u}_{t-2}, \ldots, \hat{u}_{t-r}$, and obtain $R^2$ from this regression. The number of lags of $\hat{u}_t$ can be decided using information criteria.
3. It can be shown that $(T-r)R^2 \sim \chi^2(r)$.
Breusch-Godfrey exercise
• Let a regression be
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t \qquad (1)$$
with T = 40, and we suspect that the residuals have a serial correlation of the 3rd order.
• Carry out a B-G test at 5%.
Breusch-Godfrey exercise
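• Solution:
• Run the regression (1) and get the residuals.
• Specify the auxiliary regression as
$$\hat{u}_t = \gamma_1 + \gamma_2 x_{2t} + \gamma_3 x_{3t} + \rho_1 \hat{u}_{t-1} + \rho_2 \hat{u}_{t-2} + \rho_3 \hat{u}_{t-3} + v_t \qquad (2)$$
• Run regression (2). Assume that its $R^2$ = 0.25.
• The test statistic = (40 – 3)*0.25 = 9.25
• Critical value (5%, r = 3) = 7.815
• Since 9.25 > 7.815, reject the null hypothesis of no autocorrelation.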
Breusch-Godfrey exercise
• Assume that when we were looking up the $\chi^2$ table we chose 0.005 instead of 0.05 and then we performed the test.
• What happens then?
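The exercise can be replayed with both significance levels. The sketch below hard-codes the upper-tail critical values of the $\chi^2$ distribution with 3 degrees of freedom (7.815 at 5% and 12.838 at 0.5%, taken from standard tables). The function name is hypothetical.

```python
def breusch_godfrey_statistic(T, r, r_squared):
    """(T - r) * R^2 from the auxiliary regression with r lagged residuals."""
    return (T - r) * r_squared


# Upper-tail critical values of chi-squared(3), from standard tables.
CHI2_3_CRITICAL = {0.05: 7.815, 0.005: 12.838}

stat = breusch_godfrey_statistic(T=40, r=3, r_squared=0.25)

decisions = {}
for alpha, critical_value in CHI2_3_CRITICAL.items():
    decisions[alpha] = "reject H0" if stat > critical_value else "do not reject H0"
    print(alpha, critical_value, stat, decisions[alpha])
```

With the stricter 0.5% level the critical value exceeds the test statistic, so the null of no autocorrelation would no longer be rejected: the conclusion depends on the significance level chosen.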
Breusch-Godfrey exercise
• If we are not confident about the order of the serial correlation, we can run (2) with different lags and choose the one with the lowest AIC or SBIC.
Consequences of Ignoring Autocorrelation if it is Present
• The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even in large sample sizes.
• Thus, if the standard error estimates are inappropriate, there exists the possibility that we could make the wrong inferences (Type II error = false negative)
• $R^2$ is likely to be inflated relative to its “correct” value for positively autocorrelated residuals.
Consequences of Ignoring Autocorrelation
if it is Present
• When autocorrelation exists, the off-diagonal elements of the matrix $E(uu')$ are no longer zero
• So that: $E(uu') \neq \sigma^2 I$
“Remedies” for Autocorrelation
• If the form of the autocorrelation is known, we could use a GLS procedure – i.e. an approach that allows for autocorrelated residuals e.g., Cochrane-Orcutt. See Brooks Ch. 5.
The Cochrane-Orcutt approach
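• Let a regression be:
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t, \qquad u_t = \rho u_{t-1} + v_t$$
• Applying one lag:
$$y_{t-1} = \beta_1 + \beta_2 x_{2t-1} + \beta_3 x_{3t-1} + u_{t-1}$$
• Multiplying by $\rho$:
$$\rho y_{t-1} = \rho\beta_1 + \rho\beta_2 x_{2t-1} + \rho\beta_3 x_{3t-1} + \rho u_{t-1}$$
The Cochrane-Orcutt approach
• Rearranging terms (subtracting the last equation from the first) and noticing that $v_t = u_t - \rho u_{t-1}$:
$$y_t - \rho y_{t-1} = \beta_1(1 - \rho) + \beta_2(x_{2t} - \rho x_{2t-1}) + \beta_3(x_{3t} - \rho x_{3t-1}) + v_t$$
• Making $y_t^* = y_t - \rho y_{t-1}$, $\beta_1^* = \beta_1(1 - \rho)$, $x_{2t}^* = x_{2t} - \rho x_{2t-1}$ and $x_{3t}^* = x_{3t} - \rho x_{3t-1}$
• We get:
$$y_t^* = \beta_1^* + \beta_2 x_{2t}^* + \beta_3 x_{3t}^* + v_t$$
• This is a GLS equation and it can be estimated by OLS since $v_t$ is free from serial correlation.
• But we need an estimate of $\rho$, i.e. $\hat{\rho}$ must be obtained first.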
The Cochrane-Orcutt approach: summary
• See: Cochrane, D. and Orcutt, G. H. (1949) Application of Least Squares
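The transformation at the heart of the approach is a quasi-difference, $z_t^* = z_t - \rho z_{t-1}$, applied to the dependent variable and to each regressor. The sketch below shows that step together with one common way of estimating $\rho$, a regression of $\hat{u}_t$ on $\hat{u}_{t-1}$ without a constant. That choice is an assumption rather than something stated in these slides, and all names and numbers are hypothetical.

```python
def estimate_rho(resid):
    """Slope from regressing u_t on u_{t-1} without a constant (assumed method)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


def quasi_difference(series, rho):
    """z*_t = z_t - rho * z_{t-1}; the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Hypothetical residuals from a first-stage OLS regression.
u_hat = [1.0, 0.5, 0.25]
rho = estimate_rho(u_hat)

# Hypothetical dependent variable and regressor, transformed with rho.
y = [1.0, 2.0, 3.0]
x2 = [4.0, 6.0, 8.0]
y_star = quasi_difference(y, rho)
x2_star = quasi_difference(x2, rho)
print(rho, y_star, x2_star)
```

The transformed series would then be used in an OLS regression. In the iterative version of the procedure the residuals, $\hat{\rho}$ and the transformed regression are re-estimated until $\hat{\rho}$ stops changing.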
Solution to serially correlated (and heteroscedastic) residuals in regressions
• As we have seen previously, untreated residual serial correlation leads to biased coefficient standard errors, which can lead to inference errors (Type I or II) and to a reduction in the power of the t-test.
• A solution was proposed by Newey and West (1987): “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix”, Econometrica, 55/3, available at:
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxvdGF2aW9tZWRlaXJvczIwMDl8Z3g6MmY0ZDliMDdkNWYwYjlmNQ
Multicollinearity
• This problem occurs when the explanatory variables are very highly correlated with each other.
• Perfect multicollinearity
Cannot estimate all the coefficients, e.g. suppose $x_3 = 2x_2$ and the model is $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
• Problems if Near Multicollinearity is Present but Ignored
- $R^2$ will be high but the individual coefficients will have high standard errors.
- The regression becomes very sensitive to small changes in the specification.
- Thus confidence intervals for the parameters will be very wide.
Measuring Multicollinearity
• The easiest way to measure the extent of multicollinearity is simply to look at the matrix of correlations between the individual variables, e.g.

Corr   x2    x3    x4
x2     -     0.2   0.8
x3     0.2   -     0.3
x4     0.8   0.3   -

• But another problem: if 3 or more variables are linearly related, e.g. $x_{2t} + x_{3t} = x_{4t}$
• Note that high correlation between y and one of the x’s is not multicollinearity.
Measuring Multicollinearity
• Another criterion to check for multicollinearity:
– VIF – Variance inflation factor
– Compute all VIF’s.
– If one or more VIF > 10, there are multicollinearity problems
$$VIF(X_j) = \frac{1}{1 - R_j^2}$$
where $R_j^2$ = $R^2$ of the regression of independent variable $X_j$ on all other regressors.
Measuring Multicollinearity
• How do we interpret the variance inflation factors for a regression model?
• It is a measure of how much the variance of the estimated regression coefficient $b_k$ is "inflated" by the existence of correlation among the predictor variables in the model.
• A VIF of 1 means that there is no correlation between the k-th predictor and the remaining predictor variables, and hence the variance of $b_k$ is not inflated at all.
• The general rule of thumb is that VIFs exceeding 4 warrant further investigation, while VIFs exceeding 10 are signs of serious multicollinearity requiring correction.
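The VIF formula and the rule of thumb above translate directly into code. In the sketch below the $R_j^2$ values are hypothetical inputs (in practice each comes from regressing $X_j$ on the other regressors), and the function names are illustrative.

```python
def vif(r_squared_j):
    """Variance inflation factor 1 / (1 - R_j^2) for regressor j."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


def vif_flag(value):
    """Apply the rule of thumb: > 10 serious, > 4 investigate."""
    if value > 10:
        return "serious multicollinearity"
    if value > 4:
        return "warrants further investigation"
    return "no concern by this rule of thumb"


# Hypothetical auxiliary R^2 values for three regressors.
aux_r_squared = {"x2": 0.0, "x3": 0.8, "x4": 0.95}

for name, r2 in aux_r_squared.items():
    v = vif(r2)
    print(name, round(v, 2), vif_flag(v))
```

Because the VIF grows without bound as $R_j^2$ approaches 1, the perfect multicollinearity case ($R_j^2 = 1$) is excluded explicitly.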
Micronumerosity
• Micronumerosity: A term introduced by Goldberger
(1964) to describe properties of econometric estimators
with small sample sizes.
• Exact micronumerosity: fewer observations than
parameters to be estimated: estimation is not possible.
• Near micronumerosity: the number of observations barely exceeds the number of parameters to be estimated.
Micronumerosity
• What size should the sample have?
– Minimum $T - k \geq 30$
– Ideal $T - k \geq 100$
• What can be done
– Increase sample size;
– Re-analyse theory and model and reduce number
of coefficients if possible.
Testing the Normality Assumption
• Why did we need to assume normality for hypothesis testing?
Testing for Departures from Normality
• The Bera Jarque normality test
• A normal distribution is not skewed and is defined to have a coefficient of kurtosis of 3.
• The kurtosis of the normal distribution is 3 so its excess kurtosis (b2-3) is zero.
• Skewness and kurtosis are the (standardised) third and fourth moments of a distribution.
Normal versus Skewed Distributions
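[Figure: two density plots of f(x) against x. Left panel: a normal distribution. Right panel: a skewed distribution]
Leptokurtic versus Normal Distribution
[Figure: density of a leptokurtic distribution compared with a normal distribution]
Testing for Normality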
• Bera and Jarque formalise this by testing the residuals for normality, i.e. by testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.
• It can be proved that the coefficients of skewness and kurtosis can be expressed as:
$$b_1 = \frac{E[u^3]}{(\sigma^2)^{3/2}} \quad \text{and} \quad b_2 = \frac{E[u^4]}{(\sigma^2)^2}$$
• The Bera Jarque test statistic is given by
$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \sim \chi^2(2)$$
• We estimate $b_1$ and $b_2$ using the residuals from the OLS regression, $\hat{u}$.
• See page 181.
What do we do if we find evidence of Non-Normality?
• It is not obvious what we should do!
• Could use a method which does not assume normality, but difficult and what are its properties?
• Often the case that one or two very extreme residuals causes us to reject the normality assumption.
• An alternative is to use dummy variables.
e.g. say we estimate a monthly model of asset returns from 1980-1990, and we plot the residuals, and find a particularly large outlier for October 1987.
What do we do if we find evidence
of Non-Normality? (cont’d)
• Create a new variable:
D87M10t = 1 during October 1987 and zero otherwise.
This effectively knocks out that observation. But we need a theoretical reason for adding dummy variables. See page 186.
[Figure: residuals $\hat{u}_t$ plotted over time, with the October 1987 outlier marked]
Normality test exercise
• A linear regression
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
produced residuals given by $\hat{u}' = (\,2 \;\; 1 \;\; 1 \;\; 1 \;\; 1\,)$.
The variance of the residuals is equal to 8/3. Compute the Jarque-Bera statistic (W) and perform the normality test.
• Solution:
$E(u^3)$ = 1.2; $E(u^4)$ = 4; $b_1$ = 0.275568; $b_2$ = 0.5625; W = 1.301074; Crit. Value (5%) = 5.991. Do not reject $H_0$. There is no evidence against the normality of the residuals.
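The numbers in the solution can be reproduced with a short script. The signs of the residuals used below, (2, -1, -1, -1, 1), are an assumption: they are chosen so that the residuals sum to zero and the third moment equals the 1.2 quoted in the solution. Their ordering does not affect the statistic. The variance 8/3 is taken as given in the exercise.

```python
def jarque_bera(resid, variance):
    """Return (b1, b2, W) with b1 = E[u^3]/var^1.5, b2 = E[u^4]/var^2 and
    W = T * (b1^2 / 6 + (b2 - 3)^2 / 24)."""
    T = len(resid)
    m3 = sum(u ** 3 for u in resid) / T   # sample third moment
    m4 = sum(u ** 4 for u in resid) / T   # sample fourth moment
    b1 = m3 / variance ** 1.5
    b2 = m4 / variance ** 2
    W = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    return b1, b2, W


# Residual signs are assumed (see the note above); magnitudes are from the exercise.
u_hat = [2.0, -1.0, -1.0, -1.0, 1.0]
b1, b2, W = jarque_bera(u_hat, variance=8 / 3)

CRITICAL_CHI2_2_5PCT = 5.991  # chi-squared(2), 5%, as quoted in the solution
print(b1, b2, W, W > CRITICAL_CHI2_2_5PCT)
```

Since W is well below the critical value, the null hypothesis of normality is not rejected.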
Parameter Stability Tests
• So far, we have estimated regressions such as $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
• We have implicitly assumed that the parameters ($\beta_1$, $\beta_2$ and $\beta_3$) are constant for the entire sample period.
• We can test this implicit assumption using parameter stability tests. The idea is essentially to split the data into sub-periods and then to estimate up to three models, for each of the sub-parts and for all the data, and then to “compare” the RSS of the models.
• There are two types of test we can look at:
- Chow test (analysis of variance test)
- Predictive failure tests
The Chow Test
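• The steps involved are:
1. Split the data into two sub-periods. Estimate the regression over the whole period and then for the two sub-periods separately (3 regressions). Obtain the RSS for each regression.
2. The restricted regression is now the regression for the whole period while the “unrestricted regression” comes in two parts: one for each of the sub-samples.
We can thus form an F-test which is based on the difference between the RSS’s. The statistic is
$$\text{test statistic} = \frac{RSS - (RSS_1 + RSS_2)}{RSS_1 + RSS_2} \times \frac{T - 2k}{k}$$
where RSS is the residual sum of squares for the whole sample, $RSS_1$ and $RSS_2$ are those of the two sub-samples, T is the number of observations and k is the number of regressors in each regression.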
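The Chow statistic is a simple function of the three residual sums of squares. The sketch below evaluates it for hypothetical RSS values. The resulting number would be compared with a critical value from the F(k, T - 2k) distribution, which is not computed here.

```python
def chow_statistic(rss_whole, rss_1, rss_2, T, k):
    """[(RSS - (RSS1 + RSS2)) / (RSS1 + RSS2)] * [(T - 2k) / k].

    rss_whole: RSS of the regression over the whole sample (restricted)
    rss_1, rss_2: RSS of the two sub-sample regressions (unrestricted)
    T: total number of observations; k: regressors in each regression
    """
    rss_unrestricted = rss_1 + rss_2
    return ((rss_whole - rss_unrestricted) / rss_unrestricted) * ((T - 2 * k) / k)


# Hypothetical values, for illustration only.
stat = chow_statistic(rss_whole=120.0, rss_1=50.0, rss_2=50.0, T=104, k=2)
print(stat)
```

A large value means that allowing the parameters to differ across the two sub-periods reduces the RSS substantially, which is evidence against parameter stability.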
Structural breaks with dummy variables
• It’s possible to model structural breaks with dummy variables
• Consider the model $y_t = \beta_0 + \beta_1 x_t + u_t$ and a sample with 200 observations
• Assume you suspect a structural break at observation t=120
• You can insert 2 dummies: $D_0$ for the intercept and $D_1$ for the slope
• The model becomes:
$$y_t = \beta_0 + \beta_{D_0} D_0 + \beta_1 x_t + \beta_{D_1} D_1 x_t + u_t$$
• $D_0$=0 from t=1 to t=119 and $D_0$=1 from t=120 to t=200
• $D_1$=0 from t=1 to t=119 and $D_1$=1 from t=120 to t=200
Structural breaks with dummy variables
• Run the regression by OLS
• When $D_0$ = 0, the intercept = $\beta_0$; when $D_0$ = 1, the intercept = $\beta_0 + \beta_{D_0}$
• When $D_1$ = 0, the slope = $\beta_1$; when $D_1$ = 1, the slope = $\beta_1 + \beta_{D_1}$
• If $\beta_{D_0}$ and $\beta_{D_1}$ are not significantly different from zero at the $\alpha$% level, the evidence is that there’s no structural break
• This can be generalized to models with more regressors
• Reference: Clements, M.P. and D.F. Hendry (1996). "Intercept corrections and structural change”. Journal of Applied Econometrics, 11, 475–494.
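The dummy-variable setup for the break at t = 120 can be sketched as follows. The regressor values are invented and the function name is hypothetical. Only the construction of the regressors is shown, not the estimation.

```python
def build_break_regressors(x, break_at):
    """Return rows (constant, D0_t, x_t, D1_t * x_t) for t = 1..T, where both
    dummies are 0 before `break_at` and 1 from `break_at` onwards."""
    rows = []
    for t, x_t in enumerate(x, start=1):
        d = 0 if t < break_at else 1
        rows.append((1.0, d, x_t, d * x_t))
    return rows


T = 200
x = [float(t) for t in range(1, T + 1)]  # hypothetical regressor values
rows = build_break_regressors(x, break_at=120)

print(rows[118])  # t = 119: both dummy terms are zero
print(rows[119])  # t = 120: intercept dummy is 1, slope dummy term equals x_t
```

Regressing y on these four columns gives the intercept shift and the slope shift as the coefficients on the second and fourth columns, and their significance can then be tested.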
How do we decide the sub-parts to use?
• As a rule of thumb, we could use all or some of the following:
- Plot the dependent variable over time and split the data according to any obvious structural changes in the series, e.g.
[Figure: value of series ($y_t$) plotted against the sample period]
- Split the data according to any known important historical events (e.g. stock market crash, new government elected)
- Use all but the last few observations and do a predictive failure test on those.