Forecasting with Autoregressive Models in the Presence of Data Revisions


Michael P. Clements Department of Economics

Ana Beatriz Galvão Department of Economics Queen Mary, University of London

June 10, 2008

Abstract

We investigate the effects of data revisions on forecasting using autoregressive models, when the data set consists of end-of-sample data, and when the data set is constructed in such a way that it comprises the first-release value at each point in time. We derive analytical expressions for the effects of noise, news and non-zero mean revisions on the estimates of the AR models’ parameters, and how these depend on the way the data set is constructed. Our calculations indicate that differences in the construction of real-time data sets have only small effects on estimating and evaluating forecasts from AR models when we consider the statistical process of data revisions on output growth and inflation in the US. An empirical exercise confirms the implications of our analytical results.

1 Introduction

There has been much interest in the recent literature on the effects of different data vintages on model specification and forecast evaluation, and in the use of ‘real-time data’ in assessing predictability, as opposed to using ‘final-revised’ data, based on concerns that the use of final-revised data may exaggerate the predictive power of explanatory variables relative to what could actually have been achieved at the time using the then available data (see, e.g., Robertson and Tallman (1998), Orphanides (2001), Croushore and Stark (2001, 2003), Faust, Rogers and Wright (2003) and Orphanides and van Norden (2005)). However, Koenig, Dolmas and Piger (2003) go further and suggest that the way in which real-time data is conventionally used in real-time forecasting exercises is suboptimal, as it invariably uses end-of-sample vintage data, rather than their preferred strategy: ‘at every date within a sample, right-side variables ought to be the most up-to-date estimates available at that time’ (Koenig et al, p.618, described as Strategy 1 on p.619).


Recently, Clements and Galvão (2007) have provided an empirical investigation of these issues in the context of short-term forecasting of US output growth using (coincident) indicators, while Clark and McCracken (2007) consider the impact of data revisions on tests of equal predictive accuracy.

Our interest is in the effects of data revisions on forecasting using autoregressive models, given the widespread use of autoregressive models in forecasting. We undertake an analysis of forecasting with autoregressions when the revisions process is either ‘news’ or ‘noise’, in the Mankiw and Shapiro (1986) sense, or some combination of the two, and we extend the analytical results with a Monte Carlo study and an empirical illustration based on forecasting US output growth and inflation.

As shown by Jacobs and van Norden (2006), amongst others, the problem of forecasting when there are multiple vintage estimates of the underlying ‘true process’ can be cast as a state space (SS) model, and the optimal predictor can be derived. We regard this as an attractive way of formalising the statistical data generating process, but our interest is in the properties of simple AR models when data are subject to revision, and specifically whether it is likely to matter whether we use ‘end-of-sample’ vintage, or what we will call ‘real-time’ vintage. The SS model allows us to derive the properties of the AR models when the data generating process contains a fully-specified model of the revisions process, so that we are able to gauge the impact of features of interest more easily, for example, the impact on forecast performance of the nature of revisions (news, noise, or some combination of the two), the relative magnitude of first revisions versus later revisions, and the number of times the data are revised. We leave for future research the question of whether an empirically estimated SS model can be used to produce more accurate forecasts.

There are alternative approaches to using data subject to revision, such as that of Kundan Kishor and Koenig (2005), which builds upon earlier contributions by Howrey (1978, 1984) and Sargent (1989). In essence, the use of end-of-sample vintage data means that the forecasting model will be estimated on heavily revised data (for the most part), while the data on which the forecast is conditioned is often first-release (unrevised). The Kalman filter is used to obtain estimates of the ‘current truth’ from the first-release data, which is then used in the forecasting model to generate forecasts of (heavily-revised) data.

Our contribution is to show that the use of end-of-sample versus real-time vintage data can in principle matter greatly for forecasting using autoregressions. However, for the types of data revisions experienced by output growth and inflation in the US in the post WWII period, our analytical calculations indicate that any differences will be small, and this is borne out empirically.

The plan of the rest of the article is as follows. The next section describes the statistical framework for generating data revisions, following Jacobs and van Norden (2006). Section 3 derives the population values of the AR model that minimize mean squared forecast error (MSFE) for the data generation process outlined in section 2, and compares the population values of the AR model’s parameters with the optimal values when the model is estimated using real-time-vintage data and end-of-sample vintage data. Under pure news, the estimator of the AR parameter is consistent for the optimal parameter vector independently of the way in which real-time data is employed in estimation. When there is noise, estimation with real-time-vintage data remains consistent, provided the forecast is also conditioned on the same type of real-time data set, but estimation with end-of-sample vintage data is now inconsistent howsoever the forecasts are constructed. These results are extended in section 4 to allow for the data revisions to have non-zero means. Section 5 illustrates with AR forecasting models for post WWII US data on output growth and inflation, backed up with some Monte Carlo simulations.

2 Statistical framework

The basic statistical framework we use to analyse the effect of data revisions on the forecast performance of autoregressive models is the following. We assume that the period $t+s$ vintage estimate of the value of $y$ in period $t$, denoted $y_t^{t+s}$, where $s = 1, \ldots, l$, consists of the true value $\tilde{y}_t$, as well as (in the general case) news and noise components, $v_t^{t+s}$ and $\varepsilon_t^{t+s}$, thus:

$$y_t^{t+s} = \tilde{y}_t + v_t^{t+s} + \varepsilon_t^{t+s}.$$

Data revisions are news when initially released data are optimal forecasts of later data, so news revisions are not correlated with the newly released data, that is,

$$\mathrm{Cov}\left(v_t^{t+s}, y_t^{t+s}\right) = 0.$$

Data revisions are noise when they reduce the noise of the original release, so each new release of the data is equal to the true value $\tilde{y}_t$ plus noise, so that noise revisions are not correlated with the truth:

$$\mathrm{Cov}\left(\varepsilon_t^{t+s}, \tilde{y}_t\right) = 0.$$

We assume that the underlying statistical process for $\tilde{y}_t$ is a combination of an AR($p$) process with iid disturbances $\eta_{1t}$, and a sum of iid disturbances $\eta_{2t}$ for each vintage. The process is:

$$\tilde{y}_t = \sum_{i=1}^{p} \phi_i \tilde{y}_{t-i} + R_1 \eta_{1t} + \sum_{i=1}^{l} \sigma_{v_i} \eta_{2t,i}, \qquad (1)$$

where $\eta_t = [\eta_{1t}, \eta_{2t}', \eta_{3t}']'$ is iid, $E(\eta_t) = 0$, with $E(\eta_t \eta_t') = I$. Thus $\sigma_{v_1}, \ldots, \sigma_{v_l}$ are the standard deviations attached to the $\eta_{2t,1}, \ldots, \eta_{2t,l}$ processes, and $R_1 = \sigma_{\eta_1}$ is the standard deviation of the disturbances of the underlying AR($p$) process for the true values.

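If we let $\mathbf{y}_t = \left(y_t^{t+1}, \ldots, y_t^{t+l}\right)'$, and similarly the vector of noise revisions $\varepsilon_t = \left(\varepsilon_t^{t+1}, \ldots, \varepsilon_t^{t+l}\right)'$ and the vector of news revisions $v_t = \left(v_t^{t+1}, \ldots, v_t^{t+l}\right)'$, then we specify $v_t$ and $\varepsilon_t$ as:

$$\mathbf{y}_t = \begin{bmatrix} y_t^{t+1} \\ y_t^{t+2} \\ \vdots \\ y_t^{t+l} \end{bmatrix} = \mathbf{i}\,\tilde{y}_t \underbrace{{} - \begin{bmatrix} \sigma_{v_1} & \sigma_{v_2} & \cdots & \sigma_{v_l} \\ 0 & \sigma_{v_2} & \cdots & \sigma_{v_l} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \sigma_{v_l} \end{bmatrix} \begin{bmatrix} \eta_{2t,1} \\ \eta_{2t,2} \\ \vdots \\ \eta_{2t,l} \end{bmatrix}}_{v_t} + \underbrace{\begin{bmatrix} \sigma_{\varepsilon_1}\eta_{3t,1} \\ \sigma_{\varepsilon_2}\eta_{3t,2} \\ \vdots \\ \sigma_{\varepsilon_l}\eta_{3t,l} \end{bmatrix}}_{\varepsilon_t}, \qquad (2)$$

where the assumptions on $\eta_t$ are as given above and $\sigma_{\varepsilon_1}, \ldots, \sigma_{\varepsilon_l}$ are the standard deviations attached to the $\eta_{3t,1}, \ldots, \eta_{3t,l}$ processes. Therefore, the first estimate of $y_t$, $y_t^{t+1}$, estimates $\tilde{y}_t$ with noise ($\sigma_{\varepsilon_1}\eta_{3t,1}$) and a news term consisting of $l$ separate components ($-\sum_{i=1}^{l} \sigma_{v_i}\eta_{2t,i}$). Later estimates are also characterised by noise, but fewer news components, and therefore provide more accurate estimates of $\tilde{y}_t$. If $\sigma_{v_l} = 0$ and $\sigma_{\varepsilon_l} = 0$ the $l$-vintage value is the true value, $y_t^{t+l} = \tilde{y}_t$.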

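Combining equation (2) with the statistical process for $\tilde{y}_t$, equation (1), the specification implies that the set of revisions, $\mathbf{y}_t - \mathbf{i}\tilde{y}_t = v_t + \varepsilon_t$ ($\mathbf{i}$ an $l \times 1$ vector of ones), is uncorrelated with $\tilde{y}_t$ when there is no news ($v_t = 0$), i.e., $E(\varepsilon_t \tilde{y}_t) = 0$ by the assumption that $E(\eta_t \eta_t') = I$. As a consequence, when revisions are pure noise:

$$\mathbf{y}_t - \mathbf{i}\tilde{y}_t = \varepsilon_t, \quad \text{so that } E(\varepsilon_t \tilde{y}_t) = 0.$$

Combining equations (2) and (1) also implies that the revisions are uncorrelated with $\mathbf{y}_t$ when there is no noise ($\varepsilon_t = 0$), because then:

$$y_t^{t+s} = \tilde{y}_t + v_t^{t+s} = \sum_{i=1}^{p} \phi_i \tilde{y}_{t-i} + R_1\eta_{1t} + \sum_{i=1}^{l} \sigma_{v_i}\eta_{2t,i} - \sum_{i=s}^{l} \sigma_{v_i}\eta_{2t,i} = \sum_{i=1}^{p} \phi_i \tilde{y}_{t-i} + R_1\eta_{1t} + \sum_{i=1}^{s-1} \sigma_{v_i}\eta_{2t,i}.$$

Hence $E\left(v_t^{t+s} y_t^{t+s}\right) = 0$ since $v_t^{t+s} = -\sum_{i=s}^{l} \sigma_{v_i}\eta_{2t,i}$, for $s = 1, \ldots, l$. As a consequence, when revisions are pure news:

$$y_t^{t+s} = \sum_{i=1}^{p} \phi_i \tilde{y}_{t-i} + R_1\eta_{1t} + \sum_{i=1}^{s-1} \sigma_{v_i}\eta_{2t,i}, \quad \text{so that } E\left(v_t^{t+s} y_t^{t+s}\right) = 0.$$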
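The two orthogonality conditions can be checked numerically. The following is a minimal sketch rather than the authors' code: it simulates equations (1) and (2) for an AR(1) truth and the first vintage only, and all parameter values (AR coefficient 0.5, three vintages, the standard deviations) are illustrative assumptions, as are the function names.

```python
import random

def simulate_first_release(T, phi, r1, sig_v, sig_e1, seed=0):
    """Simulate the truth and its first release (illustrative parameters only)."""
    rng = random.Random(seed)
    truth, release, news, noise = [], [], [], []
    prev = 0.0
    for _ in range(T):
        eta2 = [rng.gauss(0.0, 1.0) for _ in sig_v]
        # Equation (1) with p = 1: AR(1) truth plus all l news disturbances.
        y_true = phi * prev + r1 * rng.gauss(0.0, 1.0) + sum(
            s * e for s, e in zip(sig_v, eta2))
        # Equation (2), first vintage: every news component is still missing.
        v1 = -sum(s * e for s, e in zip(sig_v, eta2))
        e1 = sig_e1 * rng.gauss(0.0, 1.0)
        truth.append(y_true)
        news.append(v1)
        noise.append(e1)
        release.append(y_true + v1 + e1)
        prev = y_true
    return truth, release, news, noise

def cov(a, b):
    """Sample covariance of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

# Pure news: the revision is uncorrelated with the release, not with the truth.
truth, y1, v1, _ = simulate_first_release(50000, 0.5, 1.0, (0.6, 0.4, 0.0), 0.0)
news_cov_release = cov(v1, y1)     # population value 0
news_cov_truth = cov(v1, truth)    # population value -(0.36 + 0.16) = -0.52

# Pure noise: the revision is uncorrelated with the truth, not with the release.
truth, y1, _, e1 = simulate_first_release(50000, 0.5, 1.0, (0.0, 0.0, 0.0), 0.5, seed=1)
noise_cov_truth = cov(e1, truth)   # population value 0
noise_cov_release = cov(e1, y1)    # population value 0.25
```

Under news the first release omits the news disturbances entirely, which is why the release is uncorrelated with the revision while the truth is not; under noise the roles are reversed.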

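The model can be cast in state-space form (SSF) (cf. Jacobs and van Norden (2006)) where the transition equations are described by:

$$\alpha_{t+1} = T\alpha_t + R\eta_{t+1},$$

$$\alpha_{t+1} = \begin{bmatrix} \tilde{\mathbf{y}}_{t+1} \\ v_{t+1} \\ \varepsilon_{t+1} \end{bmatrix}, \quad T = \begin{bmatrix} \phi' & 0 & 0 & 0 \\ I_p & 0 & 0 & 0 \\ 0 & 0 & T_3 & 0 \\ 0 & 0 & 0 & T_4 \end{bmatrix}, \quad R = \begin{bmatrix} R_1 & R_3 & 0 \\ 0 & 0 & 0 \\ 0 & -U_1 \cdot \mathrm{diag}(R_3) & 0 \\ 0 & 0 & R_4 \end{bmatrix}, \quad \eta_t = \begin{bmatrix} \eta_{1t} \\ \eta_{2t} \\ \eta_{3t} \end{bmatrix},$$

where $\tilde{\mathbf{y}}_t = (\tilde{y}_t, \ldots, \tilde{y}_{t-p})'$ and $\phi' = (\phi_1, \ldots, \phi_p)$. Here, $R_3 = [\sigma_{v_1} \ldots \sigma_{v_l}]$, $U_1$ is an upper-triangular matrix of ones, and $R_4 = \mathrm{diag}(\sigma_{\varepsilon_1}, \ldots, \sigma_{\varepsilon_l})$. The measurement equation is $\mathbf{y}_t = \mathbf{i}\tilde{y}_t + I_l v_t + I_l \varepsilon_t$.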


This SSF gives the setup we have described when $T_3 = 0$ and $T_4 = 0$, but otherwise allows for what Jacobs and van Norden (2006) refer to as ‘spillover effects’. A spillover occurs when the noise or news that affects the $t+j$ estimate of $y_t$ will also affect the $t+j$ estimates of $y_{t+1}, y_{t+2}, \ldots$, etc.

The state-space model, with both news and noise components, can be estimated using maximum likelihood and the Kalman filter. However, the identification of news components requires that $\sigma_{v_l} = 0$, so $l-1$ news components are identified. The consequence of this assumption is that the final data $y_t^{t+l}$ is equal to the truth $\tilde{y}_t$ plus noise $\varepsilon_t^{t+l}$.

3 Forecasting with AR models

In this section, we consider two ways of using real-time data when forecasting with autoregressive models. When forecasters use real-time data to estimate and to evaluate forecasts from autoregressive models, they normally use at each forecasting origin the latest vintage of data available at that time. We call this use of real-time data ‘end-of-sample’ vintage data (EOS). Another way of using real-time data is always to use first-released data. This use of vintage data is called ‘real-time vintage’ data (RTV).

3.1 Optimal forecast function for an AR(p)

The optimal forecasts of $y_t$ can be obtained using the estimated version of the statistical model. In practice, practitioners often use simple AR models. We consider an AR($p$) model, where the forecast function is given by either $\beta'\mathbf{y}_T$, with $\beta' = (\beta_1, \ldots, \beta_p)$ and $\mathbf{y}_T' = \left(y_T^{T+1}, \ldots, y_{T-p+1}^{T-p+2}\right)$, so that the forecasts are conditioned on RTV data, or by $\beta'\mathbf{y}^{T+1}$, where $\mathbf{y}^{T+1\prime} = \left(y_T^{T+1}, \ldots, y_{T-p+1}^{T+1}\right)$, so that the forecasts are conditioned on EOS data - the vintage available at period $T+1$. Note that we assume that the first release of $y_T$ is in the next period, $T+1$. We use $\beta$ to denote the parameter vector in both cases, but note that in general the optimal values (denoted by $\beta^*$) will not be the same. However, when $p = 1$, the two conditioning sets coincide, that is, $\mathbf{y}_T = \left\{y_T^{T+1}\right\} = \mathbf{y}^{T+1}$.

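The optimal (population) value of the AR forecasting model parameter will be denoted by $\beta^*$, and under the assumption of squared-error loss is defined by:

$$\beta^* = \arg\min_{\beta} E\left(y_{T+1}^{T+f} - \beta'\mathbf{y}_T^{c}\right)^2, \qquad (3)$$

where the conditioning vector $\mathbf{y}_T^{c}$ is either the RTV vector $\mathbf{y}_T$ or the EOS vector $\mathbf{y}^{T+1}$. In Appendix A we prove the following proposition.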

Proposition 1. The optimal value of the parameter in the autoregressive model when the data vintages are characterised by news and noise, as specified in the statistical model given by (2) and (1), is given by:

$$\beta^* = \left(\Sigma_{\tilde{y}} - \Sigma_v + \Sigma_\varepsilon\right)^{-1}\left(\Sigma_{\tilde{y}} - \Sigma_v\right)\phi,$$

where the $\Sigma$’s are second moment matrices of the measurement errors (detailed in the Appendix) that depend upon the conditioning vector, that is, whether the forecasts are conditioned on RTV or EOS data.

For the pure news case ($\Sigma_\varepsilon = 0$):

$$\beta^*_{news} = \phi$$

irrespective of the conditioning vector. For the pure noise case ($\Sigma_v = 0$):

$$\beta^*_{noise} = \left(\Sigma_{\tilde{y}} + \Sigma_\varepsilon\right)^{-1}\Sigma_{\tilde{y}}\,\phi,$$

where $\Sigma_\varepsilon = \sigma^2_{\varepsilon_1} I_p$ for RTV data and $\Sigma_\varepsilon = R_4 = \mathrm{diag}(\sigma_{\varepsilon_1}, \ldots, \sigma_{\varepsilon_l})$ for EOS data. Hence, for pure news the optimal value of $\beta$ is the underlying parameter vector, $\phi$. For noise the optimal value of $\beta$ depends on whether EOS or RTV data are used in the forecast function. These results hold for $f = 2, 3, \ldots$, i.e., irrespective of whether the goal is to forecast the first-released value or the latest available estimate.

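For the special case of an AR(1) forecasting model:

$$\beta^* = \frac{\sigma^2_{\tilde{y}} - \sigma^2_v}{\sigma^2_{\tilde{y}} - \sigma^2_v + \sigma^2_{\varepsilon_1}}\,\phi,$$

where $\sigma^2_v = \sum_{i=1}^{l} \sigma^2_{v_i}$, so for the pure news case ($\sigma^2_{\varepsilon_1} = 0$):

$$\beta^*_{news} = \phi,$$

and for pure noise ($\sigma^2_v = 0$):

$$\beta^*_{noise} = \frac{\sigma^2_{\tilde{y}}}{\sigma^2_{\tilde{y}} + \sigma^2_{\varepsilon_1}}\,\phi = \phi - \frac{\sigma^2_{\varepsilon_1}}{\sigma^2_{\tilde{y}} + \sigma^2_{\varepsilon_1}}\,\phi. \qquad (4)$$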
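To make the AR(1) special case concrete, the sketch below evaluates the optimal slope for hypothetical variances. The function name and the numbers are illustrative assumptions; only the formula is taken from the text.

```python
def optimal_ar1_slope(phi, var_true, var_news=0.0, var_noise_first=0.0):
    """Optimal AR(1) slope when forecasts are conditioned on the first release.

    var_true        : variance of the true series
    var_news        : sum of the news variances over all l components
    var_noise_first : noise variance of the first release
    """
    signal = var_true - var_news
    return phi * signal / (signal + var_noise_first)

# Hypothetical values: underlying AR coefficient 0.8, true variance 0.4.
slope_news = optimal_ar1_slope(0.8, 0.4, var_news=0.1)           # pure news
slope_noise = optimal_ar1_slope(0.8, 0.4, var_noise_first=0.1)   # pure noise, as in (4)
```

With pure news the slope equals the underlying coefficient whatever the news variance, whereas with pure noise it is shrunk towards zero by the usual errors-in-variables factor.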

For the general AR($p$) forecasting model the expected squared error $E\left(y_{T+1}^{T+f} - \beta^{*\prime}\mathbf{y}_T^{c}\right)^2$ is smaller when the conditioning vector $\mathbf{y}_T^{c}$ is the EOS vector $\mathbf{y}^{T+1}$ than when it is the RTV vector $\mathbf{y}_T$ if revisions are pure news, but the ranking depends on the covariance structure of the measurement errors when revisions are noise.

The next sections calculate the estimators of $\beta$ when the AR forecasting models are estimated using EOS and RTV data.


3.2 Estimating AR forecasting models using EOS data

The use of EOS data corresponds to estimating the AR forecasting model:

$$Y^{T+1} = Y_{-1}\beta + \text{error},$$

where $Y_{-1} = \left[Y_{-1}^{T+1}, \ldots, Y_{-p}^{T+1}\right]$, and the vectors of observations on the LHS and RHS are given by:

$$Y^{T+1} = \begin{bmatrix} y_{p+1}^{p+1+l} \\ \vdots \\ y_{T-l}^{T} \\ y_{T-l+1}^{T+1} \\ \vdots \\ y_{T-1}^{T+1} \\ y_{T}^{T+1} \end{bmatrix}, \quad Y_{-i}^{T+1} = \begin{bmatrix} y_{p+1-i}^{p+1+l} \\ \vdots \\ y_{T-l-i}^{T} \\ y_{T-l+1-i}^{T+1} \\ \vdots \\ y_{T-i-1}^{T+1} \\ y_{T-i}^{T+1} \end{bmatrix}$$

for $i = 1, \ldots, p$. We assume that there are up to $l$ vintages of each $y$, and thereafter there are no further changes. The more recent $y$’s will therefore have been revised fewer than $l$ times. The population value of $\beta$ in the forecasting model using EOS is given by:

$$\mathrm{plim}\,\hat{\beta} = \left(\mathrm{plim}\,\frac{1}{T} Y_{-1}'Y_{-1}\right)^{-1} \mathrm{plim}\,\frac{1}{T} Y_{-1}'Y^{T+1}.$$

In Appendix B, we derive the results summarised in Proposition 2.

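Proposition 2. The population value of the least-squares estimator of the parameter vector in the autoregressive model using end-of-sample vintage data is given by:

$$\mathrm{plim}\,\hat{\beta} = \left(\Sigma_{\tilde{y}} - \Sigma_v + \Sigma_\varepsilon\right)^{-1}\left(\Sigma_{\tilde{y}} - \Sigma_v\right)\phi,$$

where $\Sigma_v$ and $\Sigma_\varepsilon$ are second moment matrices of the news and noise components defined in the appendix.

It follows immediately that $\mathrm{plim}\,\hat{\beta} = \beta^* = \phi$ under pure news ($\Sigma_\varepsilon = 0$), but under noise $\mathrm{plim}\,\hat{\beta} \neq \beta^*$. If we consider the special case of an AR(1) under noise:

$$\mathrm{plim}\,\hat{\beta} = \frac{\sigma^2_{\tilde{y}}}{\sigma^2_{\tilde{y}} + \sigma^2_{\varepsilon_{-1}}}\,\phi \qquad (5)$$

$$\phantom{\mathrm{plim}\,\hat{\beta}} = \phi - \left(\sigma^2_{\tilde{y}} + \sigma^2_{\varepsilon_{-1}}\right)^{-1}\sigma^2_{\varepsilon_{-1}}\,\phi,$$

where:

$$\sigma^2_{\varepsilon_{-1}} = \frac{1}{T}\left[\sum_{i=2}^{l} \sigma^2_{\varepsilon_i} + (T - l + 1)\,\sigma^2_{\varepsilon_l}\right].$$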


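An immediate implication is that $\mathrm{plim}\,\hat{\beta} > \beta^*$ if earlier revisions are larger than later revisions, since then $\sigma^2_{\varepsilon_{-1}} < \sigma^2_{\varepsilon_1}$ (compare (5) to (4), taking $\phi > 0$). Note that if $\sigma^2_{\varepsilon_l} = 0$, so that the truth is eventually revealed when there is noise, then $\sigma^2_{\varepsilon_{-1}}$ is $O\left(T^{-1}\right)$ and $\mathrm{plim}\,\hat{\beta} \to \phi$ for a large estimation sample.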

3.3 Estimating AR forecasting models using RTV data

When RTV data is used, we denote the vectors of observations on the LHS and RHS by $Y^t$ and $Y_{-1}^{t-1}, \ldots, Y_{-p}^{t-p}$, respectively, where, e.g.:

$$Y^{t} = \begin{bmatrix} \vdots \\ y_{T-1}^{T} \\ y_{T}^{T+1} \end{bmatrix}, \quad Y_{-i}^{t-i} = \begin{bmatrix} \vdots \\ y_{T-i-1}^{T-i} \\ y_{T-i}^{T-i+1} \end{bmatrix}$$

for $i = 1, \ldots, p$. Letting $Y_{-1} = \left[Y_{-1}^{t-1}, \ldots, Y_{-p}^{t-p}\right]$, the population value of $\beta$ in the forecasting model using RTV:

$$Y^{t} = Y_{-1}\beta + \text{error}$$

is given by:

$$\mathrm{plim}\,\hat{\beta} = \left(\mathrm{plim}\,\frac{1}{T} Y_{-1}'Y_{-1}\right)^{-1} \mathrm{plim}\,\frac{1}{T} Y_{-1}'Y^{t}.$$

In Appendix C, we derive the following result.

Proposition 3. The population value of the least-squares estimator of the parameter vector in the autoregressive model using real-time vintage data is given by:

$$\mathrm{plim}\,\hat{\beta} = \left(\Sigma_{\tilde{y}} - \Sigma_v + \Sigma_\varepsilon\right)^{-1}\left(\Sigma_{\tilde{y}} - \Sigma_v\right)\phi.$$

The estimator is consistent for news, and is consistent for noise when the forecast is conditioned on RTV, but not when the forecast is conditioned on EOS.

Our results indicate that, when data revisions are pure news, the estimator of the AR parameter is consistent for $\beta^*$ whether EOS or RTV data is employed in estimation. When there is noise, estimation with RTV data remains consistent, provided the forecasts are also conditioned on RTV data, but estimation with EOS data is now inconsistent whether the forecast is conditioned on EOS or RTV data. Consequently, we would expect the use of RTV data to yield dividends when there is noise, but not in the case of pure news.
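The contrast between the two estimators under noise can be illustrated by simulation. The sketch below is not the Monte Carlo design used later in the paper: it assumes an AR(1) truth, three vintages with hypothetical noise standard deviations (0.8, 0.4, 0), zero-mean revisions and no news, and estimates the slope by least squares without an intercept.

```python
import random

def ols_slope(x, y):
    """Least-squares slope without intercept (all series are zero mean here)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def simulate_slopes(T=20000, phi=0.5, sig_eta=1.0, sig_eps=(0.8, 0.4, 0.0), seed=0):
    """Return (RTV slope, EOS slope) for one simulated sample under pure noise."""
    rng = random.Random(seed)
    l = len(sig_eps)
    truth, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + sig_eta * rng.gauss(0.0, 1.0)
        truth.append(prev)
    # RTV: every observation enters as its first release.
    rtv = [y + sig_eps[0] * rng.gauss(0.0, 1.0) for y in truth]
    # EOS: a single vintage dated T+1; older observations have been revised more often.
    eos = []
    for t, y in enumerate(truth):
        releases = min(T - t, l)      # t is 0-based, so the last observation has 1 release
        if releases == 1:
            eos.append(rtv[t])        # the latest observation is still its first release
        else:
            eos.append(y + sig_eps[releases - 1] * rng.gauss(0.0, 1.0))
    return ols_slope(rtv[:-1], rtv[1:]), ols_slope(eos[:-1], eos[1:])

rtv_slope, eos_slope = simulate_slopes()

# Population benchmark from equation (4) for these illustrative parameters.
var_true = 1.0 / (1.0 - 0.5 ** 2)
optimal = 0.5 * var_true / (var_true + 0.8 ** 2)
```

In this design the final vintage is noise free, so the EOS slope is close to the underlying coefficient of 0.5, while the RTV slope is close to the optimal value of about 0.34 for a forecast conditioned on a noisy first release.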


4 Non-zero mean revisions

An extension to the analysis in section 3 is to allow for non-zero mean measurement errors: $E(\eta_{2t}) = \mu_2 \neq 0$ and $E(\eta_{3t}) = \mu_3 \neq 0$. We maintain the assumption that the innovation to $\tilde{y}_t$ is zero mean. With zero-mean measurement errors, and $\tilde{y}_t$ zero mean, as hitherto, it was without loss of generality to consider AR forecasting models without intercepts. As both $\tilde{y}_t$ and $y_t$ are now allowed to have non-zero means, an intercept is included in the forecasting model, which becomes $\alpha + \beta'\mathbf{y}_T$.¹ We maintain the implicit assumption that the AR order of the forecasting model is the same as the order of the process for $\tilde{y}_t$.

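The optimal values of $(\alpha, \beta)$ in terms of minimizing squared-error loss are given by:

$$(\alpha^*, \beta^*) = \arg\min_{\alpha, \beta} E\left(y_{T+1}^{T+f} - \alpha - \beta'\mathbf{y}_T\right)^2$$

where:

$$E\left(y_{T+1}^{T+f} - \alpha - \beta'\mathbf{y}_T\right)^2 = E\left(\phi'\tilde{\mathbf{y}}_T + R_1\eta_{1,T+1} + \hat{v}_{T+1}^{T+f} + \varepsilon_{T+1}^{T+f} - \alpha - \beta'\tilde{\mathbf{y}}_T - \beta'v_T - \beta'\varepsilon_T\right)^2$$

with $\mathbf{y}_T$, $\tilde{\mathbf{y}}_T'$, $v_T'$, $\varepsilon_T'$ and $\hat{v}_{T+1}^{T+f}$ defined as in the proof of Proposition 1 in Appendix A.

In Appendix D, we prove the following proposition.

Proposition 4. Non-zero mean measurement errors do not affect the optimal AR slope parameters; the optimal values of the slope parameters are unchanged under both noise and news. The optimal value of the intercept will depend on whether early release or later vintage actuals are used for forecast evaluation.

Under news, $\alpha^* = \mu_{\hat{v}^f} - \mu_{v_1}'\phi$, and under pure noise $\alpha^* = \mu_{\varepsilon_f} - \mu_{\varepsilon_1}'\left(\Sigma_{\tilde{y}} + \sigma^2_{\varepsilon_1} I_p\right)^{-1}\Sigma_{\tilde{y}}\,\phi$, where the terms are defined in Appendix D.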

When the forecasting model is estimated with RTV data, the results are given by Proposition 5, which is based on the results of appendix E.

Proposition 5. When there are non-zero mean measurement errors, the use of RTV delivers the optimal population values of the AR(1) forecasting model, when first-release actual values are used to evaluate forecast performance, whether we consider the pure news or pure noise case.

Specifically, the estimator of the slope parameter is optimal whatever vintage is used as actuals to evaluate forecasting performance, but the intercept is inconsistent unless the first-release values are used. For the AR(1), the bias in the intercept in the case of noise is $\mu_{\varepsilon_1} - \mu_{\varepsilon_f}$ (bias measured as the plim of the estimated intercept minus $\alpha^*$), which is also the bias in the forecasts conditional on $y_T$. Hence the forecasts will be upward/downward biased depending on whether $\mu_{\varepsilon_1} \gtrless \mu_{\varepsilon_f}$. They will be unbiased when $\mu_{\varepsilon_1} = \mu_{\varepsilon_f}$, which requires that first-release figures are used to evaluate the forecasts, or that the mean of the revisions to later vintages is equal to the mean of the first revision.

1 The analysis assumes RTV data is used in the forecasting function.

For pure news, the bias is given by $\mu_{\hat{v}^f} = \sum_{i=1}^{l-1} \sigma_{v_i} E(\eta_{2t,i})$ when last release actuals are used ($f = l$), so that forecasts will be upward biased when $\sum_{i=1}^{l-1} \sigma_{v_i} E(\eta_{2t,i}) > 0$, and downward biased when $\sum_{i=1}^{l-1} \sigma_{v_i} E(\eta_{2t,i}) < 0$. When the model is estimated with EOS data, the results are collected in Proposition 6 (see also Appendix F).

Proposition 6. When there are non-zero mean measurement errors, the use of EOS results in inconsistent estimates of the AR(1) forecasting model’s parameters whatever vintage is used for the actuals, for both pure-news and pure-noise revisions.

4.1 Summary

Data revisions are news and their means are zero. For a general AR($p$) model, the forecasts should be conditioned on EOS data and the model estimated on EOS data. Specifically for the AR(1) model, both RTV and EOS data are consistent for the optimal value of the slope of the AR(1) model.

Data revisions are noise and their means are zero. The estimator with EOS data is not consistent for the optimal parameter vector when the forecasts are conditioned on EOS data, whereas the estimator with RTV data is consistent for the optimal parameter vector when the forecasts are conditioned on RTV data (but not when they are conditioned on EOS). For the special case of an AR(1), the estimation with RTV data is preferred as it is consistent for the optimal parameter whereas estimation with EOS data is inconsistent.

The means of data revisions are not zero. The estimator with EOS data is inconsistent for both noise and news, whereas estimation with RTV data is consistent when first-release actuals are used to calculate forecast errors.

However, at issue is whether in any specific instance the differences implied by the use of EOS and RTV data are likely to be important. From the analytical formulae, it is apparent, for example, that the performance of EOS data when there is noise will depend crucially on the magnitude of $\sigma^2_{\varepsilon_1}$ relative to the variance of later revisions. The next section explores this and related issues for US output growth and inflation.

5 An empirical study using RTV and EOS data

We consider forecasting US real GDP growth and inflation (measured by the GDP deflator). Both variables are defined as (one hundred times) the quarterly difference of the log of the level. Quarterly vintages of data on both these variables are available from the real-time datasets for macroeconomists (RTDSM) of Croushore and Stark (2001) from 1965:Q4 onwards. We chose to analyse inflation and real growth as they are the two key macro variables and have been used extensively in macro modelling. The aim of this section is to check the degree of concordance between the theory predictions and the findings of a real-time forecasting analysis using autoregressive models for these two variables.

5.1 Revisions: descriptive statistics and news/noise/non-zero mean tests

In the previous sections we have established that the properties of the estimators of AR models will typically depend on whether the data revisions correspond to noise or news, and whether they are zero-mean. As a first step, we describe the main characteristics of the revisions of quarterly output growth and GDP deflator inflation. As well as analysing the whole sample period using data vintages from 1965:Q4 to 2005:Q4, we also split the sample into 10-year subsamples. This allows us to investigate whether certain features of the revision process have changed over the last 40 years, possibly in response to changes in monetary policy (see, for example, Cogley and Sargent (2005), Primiceri (2005), and Sims and Zha (2006)), as well as methodological changes in the measurement of the data. One important change was the introduction of chain-weighted indexes to compute real GDP and the GDP deflator as of the 1996:Q1 data vintage.

Table 1 presents statistics for first-released data $y_t^{t+1}$, final-revised data $y_t^{t+l}$, and revisions $y_t^{t+l} - y_t^{t+1}$. Recall that the first release of data is available with a lag of one quarter, so that $y_t^{t+1}$ denotes the value of $y$ in period $t$ in the data release of $t+1$, and is the first recorded value of $y_t$. We define $y_t^{t+l}$ in two ways. First, we follow Aruoba (2006) and set $l = 12$. Excluding ‘benchmark revisions’, data published by the Bureau of Economic Analysis (BEA) are normally revised up to three years after they are first released.² When forecasting, it is also common to use last vintage data to compute forecast errors, so we also take the last vintage to be 2006:Q4.

The table shows means, standard deviations and first-order autocorrelations of the first-released data, the ‘final’ data and revisions, as well as $p$-values of tests for whether revisions are noise, or add news, and whether revisions are zero-mean. Recall that revisions are defined as noise if the initial estimate is an observation on the final series but measured with error, so that the revisions are uncorrelated with the final value, but are correlated with data available when the initial estimate was made. Hence noisy revisions are predictable. Alternatively, revisions are news if the initial estimate is an efficient forecast of the final value, such that the revision is unpredictable from information available at the time the initial estimate was made.³ We test for news and noise revisions using, respectively, the following auxiliary regressions:

$$y_t^{t+l} - y_t^{t+1} = \alpha + \beta y_t^{t+1} + \omega_t$$

$$y_t^{t+l} - y_t^{t+1} = \alpha + \beta y_t^{t+l} + \omega_t$$

where the null hypothesis is that $\alpha = \beta = 0$ in both cases. The table also records $p$-values for the test that the mean of $y_t^{t+l} - y_t^{t+1}$ is zero.

2 In July of each year there are revisions to the National Accounts data for the first quarter of the current year, as well as all the quarters of the previous three years.

3 See Mankiw and Shapiro (1986) for an early contribution: they found that revisions to real output added news.
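The two auxiliary regressions can be sketched as follows. This is an illustration on simulated pure-noise data, not the procedure behind Table 1: the Wald statistic below assumes homoskedastic, serially uncorrelated errors (the text does not say which covariance estimator was used), and the function name and simulated parameters are hypothetical.

```python
import random

def wald_zero_coefficients(revision, regressor):
    """OLS of revision on a constant and regressor; Wald statistic for alpha = beta = 0."""
    n = len(revision)
    mx, mr = sum(regressor) / n, sum(revision) / n
    sxx = sum((x - mx) ** 2 for x in regressor)
    sxr = sum((x - mx) * (r - mr) for x, r in zip(regressor, revision))
    beta = sxr / sxx
    alpha = mr - beta * mx
    fitted = [alpha + beta * x for x in regressor]
    ess = sum(f * f for f in fitted)                       # equals b'(X'X)b
    rss = sum((r - f) ** 2 for r, f in zip(revision, fitted))
    return alpha, beta, ess / (rss / (n - 2))              # approx. chi-square(2) under the null

# Simulated pure-noise revisions: first release = final value + measurement error.
rng = random.Random(0)
final = [rng.gauss(0.0, 1.0) for _ in range(2000)]
first = [y + 0.5 * rng.gauss(0.0, 1.0) for y in final]
revision = [f - r for f, r in zip(final, first)]

# News regression: revision on the first release (news null should be rejected here).
_, beta_first, wald_first = wald_zero_coefficients(revision, first)
# Noise regression: revision on the final value (noise null should not be rejected here).
_, beta_final, wald_final = wald_zero_coefficients(revision, final)
```

Each statistic would be compared with a chi-square(2) critical value (5.99 at the 5% level). In this simulated example the revision is predictable from the first release, so the news null is rejected, while the noise null is not.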

Data revisions to output growth and inflation have different characteristics. Firstly, output growth appears to be characterised by news (as found by Mankiw and Shapiro (1986) on their earlier sample period) while inflation revisions are noise after 1985 (especially using 2006:Q4 as final data). We reject the null of zero-mean for data revisions to inflation in the earlier periods, and for output for the periods 65-75 and 85-95 when using the 2006:Q4 data to compute the revision. Over the period as a whole the revisions to output amount to an increase of approximately half a percentage point per annum (using the 2006:Q4 data). Equally striking is that the first-order autocorrelation in the final series (2006:Q4 vintage) is only just over a half that in the first-release series (these correlations are the empirical estimates of $\mathrm{Corr}\left(y_t^{2006:Q4}, y_{t-1}^{2006:Q4}\right)$ and $\mathrm{Corr}\left(y_t^{t+1}, y_{t-1}^{t}\right)$, respectively), although the difference is less marked when instead the final data is given by $l = 12$. For inflation, the estimates of the first-order autocorrelation are less dependent on the data vintage. By comparing sub-samples, it is evident that the variability of both series has clearly decreased over time (popularly referred to as the ‘Great Moderation’) - a feature of both the first-release and final data. Consequently, we would expect to find that root mean squared forecast errors (RMSFE) decrease over time for both series, whether first or final-release data is used to calculate forecast errors.

Table 2 presents the standard deviations of successive rounds of data revisions. For both variables, the standard deviations of the revisions approximately halve from the first revision to the last, but the decrease is not monotonic in $l$. For a given $l$, the standard deviations of the revisions do not decrease over time to the same extent as the standard deviations of the data.

5.2 Revisions: estimates of the SS model and expected differences in forecasting accuracy

In order to anticipate the empirical results, we estimated the underlying statistical model described in section 2 with inflation data over the whole period, allowing only for noise revisions⁴, and setting $p = 1$. We chose to model inflation as a process with noise revisions given the test outcomes reported in Table 1. We obtained the following parameter values:

$$\tilde{y}_t = 0.2 + 0.8\,\tilde{y}_{t-1} + 0.38\,\eta_{1t}$$

with $R_4^2 = \mathrm{diag}(0.0258, 0.0153, 0.0131, 0.0108, 0.0077, 0.0047, 0.0033, 0.0033, 0.0055, 0.0079, 0.0087, 0.0121)$.

4 This implies that $R_3 = 0$ in the SS specification described in section 2.

For these parameter values, equations (4) and (5) indicate an optimal AR slope of around 0.75, whereas $\mathrm{plim}\,\hat{\beta}_{EOS}$ is only a little upward biased at 0.78. This bias is of a similar magnitude to the (small-sample) estimation bias (see, e.g., Kendall (1954), who approximates the bias by $-T^{-1}(1 + 3\phi) + O\left(T^{-3/2}\right)$) for moderate sample sizes, and is offsetting in sign. Hence the analytical formulae indicate only small differences in parameter estimates and forecast accuracy despite the noise-induced bias from the use of EOS data.
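These figures can be reproduced from the reported estimates. The sketch below plugs the estimated AR(1) parameters and the diagonal of $R_4^2$ (read here as the noise variances by vintage, which is an interpretive assumption) into equations (4) and (5), and evaluates the leading term of Kendall's approximation; the variable and function names are illustrative.

```python
phi, sigma_eta = 0.8, 0.38
# Diagonal of R_4^2 as reported: noise variances for vintages 1..12.
noise_vars = [0.0258, 0.0153, 0.0131, 0.0108, 0.0077, 0.0047,
              0.0033, 0.0033, 0.0055, 0.0079, 0.0087, 0.0121]

# Variance of the true AR(1) process (the intercept does not affect the slope).
var_true = sigma_eta ** 2 / (1 - phi ** 2)

# Equation (4): optimal slope when conditioning on first-release data.
optimal_slope = phi * var_true / (var_true + noise_vars[0])

def eos_slope(T):
    """Equation (5): population slope with EOS data for estimation sample size T."""
    l = len(noise_vars)
    avg_noise = (sum(noise_vars[1:]) + (T - l + 1) * noise_vars[-1]) / T
    return phi * var_true / (var_true + avg_noise)

def kendall_bias(T):
    """Leading term of the small-sample bias of the estimated AR(1) coefficient."""
    return -(1 + 3 * phi) / T

for T in (100, 50, 25):
    print(T, round(optimal_slope, 3), round(eos_slope(T), 3), round(kendall_bias(T), 3))
```

For $T = 100$ the upward shift of the EOS slope relative to the optimal slope is of roughly the same size as the downward small-sample bias, which is the offsetting effect described in the text.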

We then used these results as the parameter values for a data generating process for a Monte Carlo exercise. Results are reported for three sample sizes ($T = 100, 50, 25$) in Table 3. As expected, the estimated slope using EOS data is a little larger than under RTV data, but the upward bias to EOS largely offsets the downward estimation bias, and there is hardly any difference between EOS and RTV in terms of forecast accuracy (MSFE, or mean bias).⁵

We also estimated the underlying statistical model with output growth data, assuming non-zero news revisions⁶ (as indicated by Table 1), setting $p = 1$, and found:

$$\tilde{y}_t = 0.46 + 0.26\,\tilde{y}_{t-1} + 0.79\,\eta_{1t}$$

with $R_3^2 = \mathrm{diag}(0.0382, 0.0122, 0.0197, 0.0208, 0.0121, 0.0116, 0.0153, 0.0098, 0.0112, 0.0073, 0.0078, 0)$ and a vector of news means given by $\mu_{v_i} = (0, 0.027, 0.018, 0.010, 0, 0, 0.005, 0.009, 0, 0)$ whereby, for example:

$$v_t^{t+2} = 0.027 - \sigma_{v_2}\eta_{2t,2} - \sigma_{v_3}\eta_{2t,3} - \ldots - \sigma_{v_{11}}\eta_{2t,11}.$$

For these values, we obtain $E\left(y_t^{t+1} - y_t^{t+12}\right) = 0.072$, which is consistent with Table 1. Table 3 shows very small differences between EOS and RTV data for these parameter values - forecasts using RTV data are slightly more accurate than using EOS data, and are slightly more accurate when first-release data are used to evaluate accuracy, both as suggested by the theory - but the differences are negligible. We conclude that although forecast accuracy will in principle depend on whether estimation uses EOS or RTV data, we are unlikely to observe large differences for inflation and output growth.

5.3 Forecast accuracy comparisons of RTV and EOS in a real-time forecasting exercise

The results of the real-time forecasting exercise using EOS and RTV are recorded in Table 4(a) for AR(2) models, and in Table 4(b) for AR(1) forecasting models. The use of an AR(2) improves forecast performance for inflation, but not for output growth.

5 Some experimentation (not reported) bore out the implication of the analytical formulae that RTV is markedly more accurate if we increase $\sigma_{\varepsilon_1}$.

6 Using the SS specification described in section 2, we set $R_4 = 0$. We also set $\sigma_{v_l} = 0$ to guarantee identification of $l-1$ news processes. This implies that $y_t^{t+l} = \tilde{y}_t$.

Three of the implications from sections 3 and 4 are as we might expect.

(A) When the means of the revisions are not zero, forecasts computed with RTV data are more accurate than using EOS data when first-released data are used to compute forecast errors. We indicate the sub-samples in which we expect to observe this effect with red colouring in Table 4 (based on the results of the tests recorded in Table 1).

(B) When data revisions are noise, forecasts obtained with RTV data are more accurate than those using EOS data. Yellow colouring shows sub-periods for which revisions appear to be noise as suggested by the results of Table 1.

(C) When data revisions are news and their means are zero, forecasts from AR($p$) computed with EOS data should perform better than using RTV data. Green colouring indicates the periods when forecasting with an AR(2) in which we expect to observe this effect based on the results of Table 1.

However, the Monte Carlo results of Table 3 caution against the finding of large differences. That said, the results of the Monte Carlo may understate the differences between EOS and RTV data for two reasons. First, the state-space model of data revisions may not be an especially good model of the data revisions process over the whole period, to the extent that revisions may not follow the regular patterns implied by the model, and there may be parameter non-constancy (witness the differences across sub-periods in Table 1). Second, the empirical forecast exercise is based on a recursive forecasting exercise where the successive estimation samples will be highly correlated, as opposed to the independent replications of the Monte Carlo.

The comparison of RMSFE ratios and mean bias differences in Table 4 indicates that the predictions of the theory are in general confirmed by the data. When the mean of the revisions is positive, as in the 85-95 period for output growth, the use of RTV data to forecast first-released data is superior to EOS data: in Table 4(b) the RMSFE ratio of RTV to EOS is 1.033 for final data, and 0.973 for first release, confirming implication (A). (Using the AR(2), Table 4(a), RTV is a little less accurate than EOS on RMSFE using first-release data, but relatively better than when final data are used.) When revisions are noise, the use of RTV data delivers better forecasts of the revised data: see the two later sub-periods for inflation, which confirm implication (B). When revisions are news and their means are zero, the use of EOS data delivers more accurate forecasts with the AR(2) model, when accuracy is evaluated using final data, for output in the 75-85 and 95-05 periods, which corroborates implication (C).
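The two summary statistics discussed above can be sketched as follows. This is a minimal illustration: the function names and the numbers are hypothetical, and defining the mean bias difference as the difference between the two mean forecast errors is an assumption made here, since the exact definition used in Table 4 is not restated in this section.

```python
import numpy as np

def rmsfe(forecasts, actuals):
    """Root mean squared forecast error."""
    errors = np.asarray(actuals, dtype=float) - np.asarray(forecasts, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

def compare_rtv_eos(f_rtv, f_eos, actuals):
    """RMSFE ratio (RTV over EOS) and difference in mean forecast errors (assumed definition)."""
    actuals = np.asarray(actuals, dtype=float)
    ratio = rmsfe(f_rtv, actuals) / rmsfe(f_eos, actuals)
    bias_diff = float(np.mean(actuals - f_rtv) - np.mean(actuals - f_eos))
    return ratio, bias_diff

# Hypothetical forecasts and outcomes, for illustration only.
actuals = [1.0, 2.0, 3.0]
f_rtv = np.array([1.0, 2.0, 2.0])
f_eos = np.array([1.0, 2.0, 1.0])
ratio, bias_diff = compare_rtv_eos(f_rtv, f_eos, actuals)
```

A ratio below one favours RTV. Running the comparison twice, once with first-release outcomes and once with final outcomes as `actuals`, mirrors the two evaluation schemes reported in Table 4.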

The differences between EOS and RTV data are larger than would be expected based on the Monte Carlo results, but one might argue that they are of secondary importance from a forecasting perspective, especially relative to other factors that deliver marked improvements in forecast accuracy (e.g., the use of timely information on monthly indicators, as in Clements and Galvão (2007) and Giannone, Reichlin and Small (2005)).

Finally, although our results are for forecasting $y_{T+1}$ based on information up to $T$, it is also of interest whether qualitatively similar empirical results hold for one-year-ahead forecasts of $y_{T+4}$ based on $T$. Table 5 replicates Table 4 but for one-year-ahead forecasts. Forecasts were obtained from AR(1) models by 'direct forecasting' (see, e.g., Clements and Hendry (1996), Bhansali (2002), and Marcellino, Stock and Watson (2006)), based on:

$$y_{t+4} = \beta_{0,4} + \beta_{1,4}\, y_t + w_{t+4}.$$

The longer lag between the LHS and RHS variables allows more scope for differences in the estimates of the coefficients using EOS and RTV data because of data vintage effects. However, because only one lag is used, forecasts are computed conditional on the same information for both EOS and RTV data, and we can only verify implications (A) and (B). The forecasting results in Table 5 are qualitatively similar to those in Table 4(b). Similar results hold if instead the forecasts are calculated iteratively (for output growth, results not shown), whereas for inflation direct forecasting is clearly superior to iterating (gains in terms of RMSFEs as large as 20% depending on the sample). This suggests that the use of real-time data may also matter when forecasting at longer horizons.
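The direct forecasting regression described above can be sketched as a single OLS fit of the observation dated `t + 4` on the observation dated `t`. The function name and the toy series are illustrative assumptions; in an EOS versus RTV comparison the same function would simply be applied to each of the two estimation samples.

```python
import numpy as np

def direct_ar1_forecast(y, h=4):
    """OLS fit of y[t+h] = b0 + b1 * y[t] + w[t+h]; returns (b0, b1) and the forecast of y[T+h]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - h), y[:-h]])  # constant and regressor dated t
    target = y[h:]                                      # regressand dated t + h
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    forecast = coef[0] + coef[1] * y[-1]                # condition on the last observation
    return coef, float(forecast)

# Toy series built to satisfy y[t+4] = 1 + 0.5 * y[t] exactly (illustrative numbers only).
y = [0.0, 1.0, 2.0, 3.0]
for t in range(8):
    y.append(1.0 + 0.5 * y[t])

coef, forecast = direct_ar1_forecast(y, h=4)
```

Because the regressand is dated four periods after the regressor, no iteration of a one-step model is needed: the fitted coefficients are applied once to the last available observation.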

6 Conclusions

In recent times there has been a growing appreciation of the effects of data revisions on various aspects of macro-modelling (such as the calculation of output gaps and conduct of monetary policy: e.g., Orphanides (2001) and Orphanides and van Norden (2005)), as well as the relevance of data revisions for forecasting (as reviewed by Croushore (2006)). We have tackled one aspect of forecasting when there are data revisions: namely, does the way in which the data set is constructed matter when forecasts are based on autoregressive models? We have considered two ways of constructing the estimation sample: the conventional approach of using end-of-sample data, and the use of real-time-vintage data, as suggested by Koenig et al. (2003). Our analytical results imply that the way real-time data sets are constructed matters, and we relate the properties of the estimators of the AR model parameters to the standard classifications of data revisions as news or noise. When combining our analytical results with estimates of the underlying statistical model with data revisions calibrated using output growth and inflation in the US in the post WWII period, we find that there are only likely to be small differences in forecasting accuracy between the two ways of using real-time data. This is borne out by a Monte Carlo exercise.

An empirical forecast accuracy comparison between the two methods of organizing real-time data is also undertaken, based on a recursive forecasting scheme, and using both first-released and final data to measure forecast accuracy. As expected, the differences between the two methods are in general relatively small.

References

Aruoba, S. B. (2006). Data revisions are not well-behaved. Journal of Money, Credit and Banking. Forthcoming.

Bhansali, R. J. (2002). Multi-step forecasting. In Clements, M. P., and Hendry, D. F. (eds.), A Companion to Economic Forecasting, pp. 206–221: Oxford: Blackwells.

Clark, T. E., and McCracken, M. W. (2007). Tests of equal predictive ability with real-time data. Discussion paper, Economic Research Dept., Federal Reserve Bank of Kansas City.

Clements, M. P., and Galvão, A. B. (2007). Macroeconomic forecasting with mixed-frequency data: Forecasting output growth in the United States. Journal of Business and Economic Statistics. Forthcoming.

Clements, M. P., and Hendry, D. F. (1996). Multi-step estimation for forecasting. Oxford Bulletin of Economics and Statistics, 58, 657–684.

Cogley, T., and Sargent, T. J. (2005). Drifts and volatilities: Monetary policies and outcomes in the post World War II US. Review of Economic Dynamics, 8, 262–302.

Croushore, D. (2006). Forecasting with real-time macroeconomic data. In Elliott, G., Granger, C., and Timmermann, A. (eds.), Handbook of Economic Forecasting, Volume 1. Handbook of Economics 24, pp. 961–982: Elsevier, North-Holland.

Croushore, D., and Stark, T. (2001). A real-time data set for macroeconomists. Journal of Econometrics, 105, 111–130.

Croushore, D., and Stark, T. (2003). A real-time data set for macroeconomists: Does the data vintage matter? The Review of Economics and Statistics, 85, 605–617.

Faust, J., Rogers, J. H., and Wright, J. H. (2003). Exchange rate forecasting: The errors we've really made. Journal of International Economics, 60, 35–59.

Giannone, D., Reichlin, L., and Small, D. (2005). Nowcasting GDP and inflation: The real-time informational content of macroeconomic data releases. Finance and Economics Discussion Series, 2005-42, The Federal Reserve Board.

Howrey, E. P. (1978). The use of preliminary data in economic forecasting. The Review of Economics and Statistics, 60, 193–201.

Howrey, E. P. (1984). Data revisions, reconstruction and prediction: an application to inventory investment. The Review of Economics and Statistics, 66, 386–393.

Jacobs, J. P. A. M., and van Norden, S. (2006). Modeling data revisions: Measurement error and dynamics of 'true' values. CCSO Working Paper 2006/07, CCSO Centre for Economic Research, University of Groningen.

Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation. Biometrika, 41, 403–404.

Koenig, E. F., Dolmas, S., and Piger, J. (2003). The use and abuse of real-time data in economic forecasting. The Review of Economics and Statistics, 85(3), 618–628.

Kundan Kishor, N., and Koenig, E. F. (2005). VAR estimation and forecasting when data are subject to revision. Research Department Working Paper 0501, Federal Reserve Bank of Dallas.

Mankiw, N. G., and Shapiro, M. D. (1986). News or noise: An analysis of GNP revisions. Survey of Current Business (May 1986), US Department of Commerce, Bureau of Economic Analysis, 20–25.

Marcellino, M., Stock, J. H., and Watson, M. W. (2006). A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. Journal of Econometrics, 135, 499–526.

Orphanides, A. (2001). Monetary policy rules based on real-time data. American Economic Review, 91(4), 964–985.

Orphanides, A., and van Norden, S. (2005). The reliability of inflation forecasts based on output gaps in real time. Journal of Money, Credit and Banking, 37, 583–601.

Primiceri, G. (2005). Time varying structural vector autoregressions and monetary policy. Review of Economic Studies, 72, 821–852.

Robertson, J. C., and Tallman, E. W. (1998). Data vintages and measuring forecast model performance. Federal Reserve Bank of Atlanta Economic Review, Fourth Quarter, 4–20.

Sargent, T. J. (1989). Two models of measurements and the investment accelerator. Journal of Political Economy, 97, 251–287.

Sims, C. A., and Zha, T. (2006). Were there regime switches in U.S. monetary policy? American Economic Review, 96, 1193–1224.

A Proof of Proposition 1.

A.1 RTV in the forecast function.

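Consider the case of RTV, where the forecast function uses $\mathbf{y}_T = (y_T^{T+1}, \ldots, y_{T-p+1}^{T-p+2})'$. We expand the criterion function in (3) as:

$$
\begin{aligned}
E\left[\left(y_{T+1}^{T+f} - \beta'\mathbf{y}_T\right)^2\right]
&= E\left[\left(\phi'\tilde{\mathbf{y}}_T + R_1\eta_{1,T+1} + \hat{v}_{T+1}^{T+f} + \varepsilon_{T+1}^{T+f} - \beta'\tilde{\mathbf{y}}_T - \beta'\mathbf{v}_T - \beta'\boldsymbol{\varepsilon}_T\right)^2\right] \\
&= E\left[\left((\phi' - \beta')\tilde{\mathbf{y}}_T + R_1\eta_{1,T+1} + \hat{v}_{T+1}^{T+f} + \varepsilon_{T+1}^{T+f} - \beta'\mathbf{v}_T - \beta'\boldsymbol{\varepsilon}_T\right)^2\right] \\
&= (\phi' - \beta')\Sigma_{\tilde{y}}(\phi - \beta) + R_1^2 + E\left[\left(\hat{v}_{T+1}^{T+f}\right)^2\right] + E\left[\left(\varepsilon_{T+1}^{T+f}\right)^2\right] \\
&\quad + \beta'\Sigma_v\beta + \beta'\Sigma_\varepsilon\beta - 2(\phi' - \beta')\Sigma_{\tilde{y}v}\beta
\end{aligned}
$$

where $\hat{v}_{T+1}^{T+f} = -v_{T+1}^{T+2} + v_{T+1}^{T+f}$, $\mathbf{y}_T = \tilde{\mathbf{y}}_T + \mathbf{v}_T + \boldsymbol{\varepsilon}_T$, $\tilde{\mathbf{y}}_T' = (\tilde{y}_T, \ldots, \tilde{y}_{T-p+1})$, $\mathbf{v}_T' = (v_T^{T+1}, \ldots, v_{T-p+1}^{T-p+2})$ and $\boldsymbol{\varepsilon}_T' = (\varepsilon_T^{T+1}, \ldots, \varepsilon_{T-p+1}^{T-p+2})$. We let $\Sigma_{\tilde{y}} = E(\tilde{\mathbf{y}}_T\tilde{\mathbf{y}}_T')$, $\Sigma_v = E(\mathbf{v}_T\mathbf{v}_T') = \left(\sum_{i=1}^{l}\sigma_{v_i}^2\right)I_p$, $\Sigma_\varepsilon = E(\boldsymbol{\varepsilon}_T\boldsymbol{\varepsilon}_T') = \sigma_{\varepsilon_1}^2 I_p$, $\Sigma_{\tilde{y}v} = E(\tilde{\mathbf{y}}_T\mathbf{v}_T')$, and note that all cross-product terms other than $\Sigma_{\tilde{y}v}$ are zero. The first-order conditions are:

$$2\Sigma_{\tilde{y}}\beta - 2\Sigma_{\tilde{y}}\phi + 2\Sigma_v\beta + 2\Sigma_\varepsilon\beta + 4\Sigma_{\tilde{y}v}\beta - 2\Sigma_{\tilde{y}v}\phi = 0.$$

Noting that $\Sigma_{\tilde{y}v} = -\Sigma_v$ (see below), we obtain $\beta = \phi$ for news and, for pure noise, $\beta = \left(\Sigma_{\tilde{y}} + \sigma_{\varepsilon_1}^2 I_p\right)^{-1}\Sigma_{\tilde{y}}\,\phi$.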

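A.1.1 Proof of $\Sigma_{\tilde{y}v} = -\Sigma_v$.

Recall that $\Sigma_v = E(\mathbf{v}_T\mathbf{v}_T')$ and $\Sigma_{\tilde{y}v} = E(\tilde{\mathbf{y}}_T\mathbf{v}_T')$, where $\tilde{\mathbf{y}}_T' = (\tilde{y}_T, \ldots, \tilde{y}_{T-p+1})$ and $\mathbf{v}_T' = (v_T^{T+1}, \ldots, v_{T-p+1}^{T-p+2})$. Note $v_T^{T+1} = -\sum_{i=1}^{l}\sigma_{v_i}\eta_{2,T,i}$, $v_{T-1}^{T} = -\sum_{i=1}^{l}\sigma_{v_i}\eta_{2,T-1,i}$, etc. Writing:

$$
\begin{bmatrix} \tilde{y}_T \\ \vdots \\ \tilde{y}_{T-p+1} \end{bmatrix}
=
\begin{bmatrix} \phi'\tilde{\mathbf{y}}_{T-1} \\ \vdots \\ \phi'\tilde{\mathbf{y}}_{T-p} \end{bmatrix}
+ R_1
\begin{bmatrix} \eta_{1,T} \\ \vdots \\ \eta_{1,T-p+1} \end{bmatrix}
- \mathbf{v}_T
$$

the result follows because the first two terms on the RHS are uncorrelated with $\mathbf{v}_T$.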
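As a numerical check on the two special cases, the sketch below solves the first-order conditions of A.1 after substituting $\Sigma_{\tilde{y}v} = -\Sigma_v$, which gives $(\Sigma_{\tilde{y}} - \Sigma_v + \Sigma_\varepsilon)\beta = (\Sigma_{\tilde{y}} - \Sigma_v)\phi$. The function name and the parameter values are illustrative assumptions and are not calibrated to the paper's estimates.

```python
import numpy as np

def optimal_beta(Sigma_y, Sigma_v, Sigma_eps, phi):
    """Solve (Sigma_y - Sigma_v + Sigma_eps) beta = (Sigma_y - Sigma_v) phi for beta."""
    lhs = Sigma_y - Sigma_v + Sigma_eps
    rhs = (Sigma_y - Sigma_v) @ phi
    return np.linalg.solve(lhs, rhs)

# Scalar (p = 1) illustration with assumed values.
Sigma_y = np.array([[2.0]])   # variance of the true process
phi = np.array([0.5])         # AR coefficient of the true process
zero = np.zeros((1, 1))

beta_news = optimal_beta(Sigma_y, np.array([[0.3]]), zero, phi)   # pure news
beta_noise = optimal_beta(Sigma_y, zero, np.array([[0.5]]), phi)  # pure noise
```

With pure news the solution coincides with `phi`, while with pure noise the coefficient is shrunk towards zero by the factor `Sigma_y / (Sigma_y + Sigma_eps)`, the familiar errors-in-variables attenuation.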

A.2 EOS data in the forecast function

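Consider the case of EOS, where the forecast function uses $\mathbf{y}_T^{T+1} = (y_T^{T+1}, \ldots, y_{T-p+1}^{T+1})'$.

The expected squared forecast error is given by:

$$
E\left[\left(y_{T+1}^{T+f} - \beta'\mathbf{y}_T^{T+1}\right)^2\right]
= E\left[\left(\phi'\tilde{\mathbf{y}}_T + R_1\eta_{1,T+1} + \hat{v}_{T+1}^{T+f} + \varepsilon_{T+1}^{T+f} - \beta'\tilde{\mathbf{y}}_T - \beta'\mathbf{v}_T^{T+1} - \beta'\boldsymbol{\varepsilon}_T^{T+1}\right)^2\right]
$$

where now $\mathbf{y}_T^{T+1} = \tilde{\mathbf{y}}_T + \mathbf{v}_T^{T+1} + \boldsymbol{\varepsilon}_T^{T+1}$,

$$
\mathbf{v}_T^{T+1\prime} = \left(v_T^{T+1}, \ldots, v_{T-p+1}^{T+1}\right)
= \left[-\sum_{i=1}^{l}\sigma_{v_i}\eta_{2,T,i}, \; \ldots, \; -\sum_{i=p}^{l}\sigma_{v_i}\eta_{2,T-p+1,i}\right],
$$

and $\boldsymbol{\varepsilon}_T^{T+1\prime} = (\varepsilon_T^{T+1}, \ldots, \varepsilon_{T-p+1}^{T+1})$. Multiplying out the expression for the squared forecast error as before, let $\Sigma_v^{EOS} = E(\mathbf{v}_T^{T+1}\mathbf{v}_T^{T+1\prime})$, $\Sigma_\varepsilon^{EOS} = E(\boldsymbol{\varepsilon}_T^{T+1}\boldsymbol{\varepsilon}_T^{T+1\prime})$ and $\Sigma_{\tilde{y}v}^{EOS} = E(\tilde{\mathbf{y}}_T\mathbf{v}_T^{T+1\prime})$, where the superscript distinguishes these from the RTV matrices of A.1. Note that $\Sigma_{\tilde{y}v}^{EOS} = -E(\mathbf{v}_T\mathbf{v}_T^{T+1\prime}) = -\mathrm{diag}\left(\sum_{i=1}^{l}\sigma_{v_i}^2, \ldots, \sum_{i=p}^{l}\sigma_{v_i}^2\right)$ for $p < l$, and so $\Sigma_{\tilde{y}v}^{EOS} = -E(\mathbf{v}_T^{T+1}\mathbf{v}_T^{T+1\prime}) = -\Sigma_v^{EOS}$. Straightforward calculations show that $\beta = \phi$ for news and, for pure noise, $\beta = \left(\Sigma_{\tilde{y}} + \Sigma_\varepsilon^{EOS}\right)^{-1}\Sigma_{\tilde{y}}\,\phi$, where $\Sigma_\varepsilon^{EOS} = \mathrm{diag}\left(\sigma_{\varepsilon_1}^2, \sigma_{\varepsilon_2}^2, \ldots, \sigma_{\varepsilon_p}^2\right)$, with $\sigma_{\varepsilon_s}^2 = \sigma_{\varepsilon_l}^2$ for $s > l$.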

A.3 Comparison of EOS and RTV in the forecast function

A.3.1 Pure News

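The optimal AR parameter is the same ($\beta = \phi$) irrespective of whether the forecast function uses EOS or RTV.

The minimum MSFE using EOS in the forecast function is:

$$E\left[\left(y_{T+1}^{T+f} - \beta'\mathbf{y}_T^{T+1}\right)^2\right] = R_1^2 + E\left[\left(\hat{v}_{T+1}^{T+f}\right)^2\right] + \beta'\Sigma_v^{EOS}\beta$$

compared to the minimum value of $E\left[\left(y_{T+1}^{T+f} - \beta'\mathbf{y}_T\right)^2\right]$ using RTV in the forecast function:

$$E\left[\left(y_{T+1}^{T+f} - \beta'\mathbf{y}_T\right)^2\right] = R_1^2 + E\left[\left(\hat{v}_{T+1}^{T+f}\right)^2\right] + \beta'\Sigma_v\beta.$$

Here $\Sigma_v^{EOS}$ is the news covariance matrix under EOS (A.2) and $\Sigma_v$ is its RTV counterpart (A.1). The two differ by $\beta'\left(\Sigma_v^{EOS} - \Sigma_v\right)\beta$ where:

$$
\Sigma_v^{EOS} - \Sigma_v =
\begin{bmatrix}
\sum_{i=1}^{l}\sigma_{v_i}^2 - \sum_{i=1}^{l}\sigma_{v_i}^2 & & & 0 \\
& \sum_{i=2}^{l}\sigma_{v_i}^2 - \sum_{i=1}^{l}\sigma_{v_i}^2 & & \\
& & \ddots & \\
0 & & & \sum_{i=p}^{l}\sigma_{v_i}^2 - \sum_{i=1}^{l}\sigma_{v_i}^2
\end{bmatrix}
=
\begin{bmatrix}
0 & & & 0 \\
& -\sigma_{v_1}^2 & & \\
& & \ddots & \\
0 & & & -\sum_{i=1}^{p-1}\sigma_{v_i}^2
\end{bmatrix}.
$$

Because $\Sigma_v^{EOS} - \Sigma_v$ is negative semi-definite, the expected squared error using RTV exceeds that from using EOS when $p > 1$, and the two are equal when $p = 1$.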
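The negative semi-definiteness can be verified numerically. The sketch below builds the EOS and RTV news covariance matrices from a vector of news standard deviations and evaluates the implied MSFE gap; the function name, the values of the standard deviations and the coefficient vector are illustrative assumptions, and the sketch covers only the case where `p` does not exceed the number of news components `l`.

```python
import numpy as np

def news_covariances(sigma_v, p):
    """Return the (EOS, RTV) news covariance matrices for an AR(p), assuming p <= l."""
    s2 = np.asarray(sigma_v, dtype=float) ** 2
    if p > len(s2):
        raise ValueError("this sketch assumes p <= l")
    # EOS: the k-th lag has been revised k times, so only components k+1..l remain.
    eos = np.diag([s2[k:].sum() for k in range(p)])
    # RTV: every lag is a first release, so all l components remain.
    rtv = s2.sum() * np.eye(p)
    return eos, rtv

sigma_v = [0.5, 0.3, 0.2]      # assumed news standard deviations (l = 3)
beta = np.array([0.4, 0.2])    # assumed AR(2) coefficients

eos, rtv = news_covariances(sigma_v, p=2)
diff = eos - rtv                       # diagonal, with non-positive entries
msfe_gap = float(beta @ diff @ beta)   # MSFE(EOS) minus MSFE(RTV)
```

With these values the only non-zero entry of `diff` is the second diagonal element, equal to minus the variance of the first news component, so the gap is negative whenever the second AR coefficient is non-zero. Setting `p=1` gives a zero matrix, in line with the equality of the two MSFEs in that case.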