AUTOREGRESSIVE AND MOVING AVERAGE MODELS UNDER STATIONARITY AND NON-STATIONARITY

A starting point in dynamic regression models is often provided by considering dynamic structures in the outcomes. Autoregressive process models describe data-driven dependence in an outcome over successive time points. For continuous data $y_t$, observed at times $t = 1, \ldots, T$, the simplest autoregressive dependence in the outcomes is of order 1, meaning that the value of $y$ at time $t$ depends upon its immediate predecessor. Thus an AR(1) model typically has the form

$$y_t = \mu + \rho y_{t-1} + u_t, \qquad t = 2, \ldots, T \tag{5.1}$$

where $\mu$ governs the level of the outcome, and $\rho$ models the autocorrelation between successive observations. After accounting for such observation-driven serial dependence, the errors may (at least initially) be taken as exchangeable white noise following, for example, a Normal density, $u_t \sim N(0, \sigma^2)$, with constant variance $\sigma^2$ and precision $\tau = 1/\sigma^2$ across all time points $t$, and $\operatorname{cov}(u_s, u_t) = 0$ for $s \neq t$.
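A short simulation can make (5.1) concrete. The sketch below uses Python with NumPy; the function name `simulate_ar1` and the parameter values are illustrative assumptions, not part of the original text. With $|\rho| < 1$, a long simulated series should have a sample mean close to $\mu/(1-\rho)$ and a sample variance close to $\sigma^2/(1-\rho^2)$, the stationary moments implied by (5.1) as written.

```python
import numpy as np

def simulate_ar1(mu, rho, sigma, n, burn_in=1000, seed=0):
    """Simulate y_t = mu + rho * y_{t-1} + u_t with u_t ~ N(0, sigma^2), as in (5.1)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma, size=n + burn_in)
    y = np.empty(n + burn_in)
    y[0] = mu / (1.0 - rho)              # start at the stationary mean
    for t in range(1, n + burn_in):
        y[t] = mu + rho * y[t - 1] + u[t]
    return y[burn_in:]                   # discard the burn-in period

y = simulate_ar1(mu=1.0, rho=0.6, sigma=1.0, n=200_000)
print(round(y.mean(), 2), round(y.var(), 2))  # should be close to 2.5 and 1.5625
```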

If the data are centred, then a simpler model may be estimated

$$y_t = \rho y_{t-1} + u_t \tag{5.2}$$

Additional dependence on lagged observations $y_{t-2}, y_{t-3}, \ldots, y_{t-p}$ leads to AR(2), AR(3), ..., AR(p) processes. Note that (5.1), if taken to apply to $t = 1$ also, implies reference to an unobserved or latent value $y_0$. If a prior on $y_0$ is included in the model specification, the result is known as a full likelihood model. For an AR(p) model with times $t = 1, \ldots, p$ included, there are $p$ implicit latent values $y_0, y_{-1}, \ldots, y_{1-p}$ in the full likelihood model.

Classical estimation and forecasting with the AR(p) model rest on stationarity, which essentially means that the process generating the series is the same whenever observation starts: the vectors $(y_1, \ldots, y_k)$ and $(y_{t+1}, \ldots, y_{t+k})$ have the same distribution for all $t$ and $k$.

Specifically, the expectations $E(y_t)$ and covariances $C(y_t, y_{t+k})$ are independent of $t$. For the stationary AR(p) model to be applicable, an observed data series may require initial transformation and differencing to eliminate trend. Non-stationary time series can often be transformed to stationarity by differencing of order $d$, typically first or second differencing at most. This may be combined with a scale transformation, e.g.

$$Y_t = \log(y_t).$$

Using the backward shift operator $B$, defined by $B y_t = y_{t-1}$, a first difference ($d = 1$) in $y_t$,

$$z_t = y_t - y_{t-1},$$

may equivalently be written

$$z_t = y_t - B y_t = (1 - B) y_t.$$
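As a minimal sketch (Python/NumPy, with the helper name `first_difference` chosen for illustration), the log transform and the first difference $(1-B)$ can be applied to a hypothetical exponentially growing series: the logged series has a linear trend, which a single difference reduces to a constant.

```python
import numpy as np

def first_difference(y):
    """z_t = (1 - B) y_t = y_t - y_{t-1}; the result has one fewer value than y."""
    y = np.asarray(y, dtype=float)
    return y[1:] - y[:-1]

# Hypothetical series growing exponentially: log(y_t) = log(5) + 0.03 t
t = np.arange(50)
y = 5.0 * np.exp(0.03 * t)
z = first_difference(np.log(y))   # every element equals 0.03 up to rounding
```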

Then, with $z_t$ now the outcome, the AR(1) model (5.2) becomes

172 MODELS FOR TIME SERIES

$$z_t - \rho z_{t-1} = u_t$$

or

$$(1 - \rho B) z_t = u_t.$$

An AR(p) process in $z_t$ leads to a $p$th order polynomial in $B$, so that

$$(1 - \rho_1 B - \rho_2 B^2 - \cdots - \rho_p B^p) z_t = u_t,$$

for which an alternative notation is

$$\rho(B) z_t = u_t.$$

The process is stationary if the roots of $\rho(B)$ lie outside the unit circle. For instance, if $p = 1$, the series is stationary if $|\rho_1| < 1$.
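The root condition can be checked numerically. This is a sketch, assuming NumPy and a hypothetical helper `ar_is_stationary`: the polynomial $\rho(B) = 1 - \rho_1 B - \cdots - \rho_p B^p$ is passed to `numpy.roots`, which expects coefficients ordered from the highest power down.

```python
import numpy as np

def ar_is_stationary(rho):
    """True if every root of 1 - rho_1 B - ... - rho_p B^p lies outside the unit circle."""
    rho = np.asarray(rho, dtype=float)
    # Coefficients from the highest power down: [-rho_p, ..., -rho_1, 1]
    coeffs = np.concatenate([-rho[::-1], [1.0]])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.5]), ar_is_stationary([1.2]))  # True False
```

If the moving average polynomial is written with the same sign convention, $1 - \theta_1 B - \cdots - \theta_q B^q$, passing the MA coefficients to the same function checks invertibility.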

In the AR(p) model, an outcome depends upon its past values and a random error or innovation term $u_t$. If the impact of $u_t$ is not in fact fully absorbed in period $t$, there may be moving average dependence in the error term also. Thus, for centred data, the model

$$z_t - \rho_1 z_{t-1} = u_t - \theta_1 u_{t-1} \tag{5.3}$$

defines a first-order moving average, MA(1), process in $u_t$ combined with AR(1) dependence in the data themselves. In BUGS, moving average effects in the $u_t$ for metric (e.g. Normal or Student $t$) data may require an additional measurement error term to be introduced, because the centred density adopted in BUGS for Normal and Student $t$ data assumes unstructured errors. In general, an ARIMA($p,d,q$) model is defined by dependence up to lag $p$ in the observations, by $q$ lags in the error moving average, and by differencing the original observations $y_t$ $d$ times. An ARMA($p,q$) model in $y_t$, therefore, retains the original data without differencing. In the general ARMA($p,q$) representation,

$$\rho(B) y_t = \theta(B) u_t, \qquad \theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q, \tag{5.4}$$

the process is stationary if the roots of $\rho(B)$ lie outside the unit circle, and invertible if the roots of $\theta(B)$ lie outside the unit circle. For instance, if $p = q = 1$, the series is stationary and invertible if $|\rho_1| < 1$ and $|\theta_1| < 1$.
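To illustrate the ARMA(1,1) case, the sketch below (Python/NumPy; function names hypothetical, parameter values illustrative) simulates $y_t - \rho_1 y_{t-1} = u_t - \theta_1 u_{t-1}$ and compares the sample lag-1 autocorrelation with the standard theoretical value for a stationary, invertible ARMA(1,1) under this sign convention, $(\rho_1-\theta_1)(1-\rho_1\theta_1)/(1-2\rho_1\theta_1+\theta_1^2)$.

```python
import numpy as np

def simulate_arma11(rho, theta, sigma, n, seed=1):
    """Simulate y_t - rho*y_{t-1} = u_t - theta*u_{t-1} with u_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma, size=n + 1)
    y = np.zeros(n + 1)
    for t in range(1, n + 1):
        y[t] = rho * y[t - 1] + u[t] - theta * u[t - 1]
    return y[1:]

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation of a series."""
    y = y - y.mean()
    return float(np.dot(y[1:], y[:-1]) / np.dot(y, y))

def arma11_theoretical_r1(rho, theta):
    """Lag-1 autocorrelation implied by the ARMA(1,1) model above."""
    return (rho - theta) * (1 - rho * theta) / (1 - 2 * rho * theta + theta ** 2)

y = simulate_arma11(rho=0.7, theta=0.4, sigma=1.0, n=200_000)
r1_sample = lag1_autocorr(y)
r1_theory = arma11_theoretical_r1(0.7, 0.4)   # (0.3)(0.72)/0.6 = 0.36
```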

In a distributed lag regression, predictors $x_t$ and their lagged values $x_{t-1}, x_{t-2}, \ldots$ are introduced as regressors. A distributed lag model for centred data has the form

$$y_t = \sum_{m} \beta_m x_{t-m} + u_t \tag{5.5}$$

while a model with lags in both $y$ and $x$ may be called an Autoregressive Distributed Lag (ADL or ARDL) model (see Bauwens et al., 1999; Greene, 2000):

$$\rho(B) y_t = \beta(B) x_t + u_t$$

The latter form leads into recent model developments in terms of error correction models.
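A sketch of an ADL model with one lag in each of $y$ and $x$, in Python/NumPy. The coefficients are illustrative assumptions, and ordinary least squares (conditioning on the first observation) is used here only as a quick classical stand-in for the Bayesian estimation discussed later in the chapter.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
y = np.zeros(n)
# Hypothetical ADL(1,1) for centred data: y_t = 0.5 y_{t-1} + 1.0 x_t + 0.3 x_{t-1} + u_t
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + 0.3 * x[t - 1] + rng.normal(scale=0.5)

# Regress y_t on (y_{t-1}, x_t, x_{t-1}) for t = 2..n
X = np.column_stack([y[:-1], x[1:], x[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)   # approx [0.5, 1.0, 0.3]
```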

Dependent Errors

In the specifications above, the errors $u_t$ are assumed temporally uncorrelated, with a diagonal covariance matrix, and autocorrelation is confined to the observations themselves. However, if correlation exists between the errors, then their covariance matrix is no longer diagonal. Let $e_t$ be correlated errors with

$$y_t = \alpha + \beta x_t + e_t \tag{5.6a}$$

and suppose that an AR(p) transformation of the $e_t$,

$$\gamma(B) e_t = u_t,$$

is required in order that $u_t$ is unstructured with constant variance, where $\gamma(B) = 1 - \gamma_1 B - \gamma_2 B^2 - \cdots - \gamma_p B^p$. A frequently occurring model is one with AR(1) errors $e_t$, such as

$$y_t = \alpha + \beta x_t + e_t, \qquad e_t = \gamma e_{t-1} + u_t$$

More generally, regression models may be defined with ARMA(p,q) errors

$$e_t - \gamma_1 e_{t-1} - \gamma_2 e_{t-2} - \cdots - \gamma_p e_{t-p} = u_t - \theta_1 u_{t-1} - \theta_2 u_{t-2} - \cdots - \theta_q u_{t-q} \tag{5.6b}$$

To facilitate estimation, the AR(1) error model may be re-expressed in non-linear autoregressive form for observations $t > 1$ subsequent to the first, with homogeneous errors $u_t$:

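$$\begin{aligned} y_t &= \gamma y_{t-1} + \alpha - \alpha\gamma + \beta x_t - \gamma\beta x_{t-1} + u_t \\ &= \gamma(y_{t-1} - \beta x_{t-1}) + \alpha(1 - \gamma) + \beta x_t + u_t \end{aligned} \tag{5.7}$$

So the intercept in the original model is obtained by dividing the intercept in the transformed data model by $1 - \gamma$.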
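The algebra behind (5.7) can be checked by simulation. In this sketch (Python/NumPy; parameter values are illustrative assumptions), data are generated from (5.6a) with AR(1) errors, and least squares is applied to the linear form of (5.7) with regressors $1, y_{t-1}, x_t, x_{t-1}$. This unrestricted fit does not impose the non-linear constraint that the coefficient on $x_{t-1}$ equals $-\gamma\beta$. It only shows that the estimates approximately satisfy that constraint and that dividing the intercept by $1-\hat\gamma$ recovers $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, beta, gamma, sigma = 20_000, 2.0, 1.5, 0.6, 0.5
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = gamma * e[t - 1] + rng.normal(scale=sigma)   # AR(1) errors
y = alpha + beta * x + e                                  # model (5.6a)

# Linear form of (5.7) for t > 1:
# y_t = alpha(1 - gamma) + gamma*y_{t-1} + beta*x_t - gamma*beta*x_{t-1} + u_t
X = np.column_stack([np.ones(n - 1), y[:-1], x[1:], x[:-1]])
c, g_hat, b_hat, gb_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]

alpha_hat = c / (1.0 - g_hat)   # intercept of the original model (5.6a)
```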

Multivariate series

The above range of models may be extended to modelling multivariate dependence through time, with each series depending both on its own past and the past values of the other series. One advantage of simultaneously modelling several series is the possibility of pooling information to improve precision and out-of-sample forecasts. Vector autoregressive models have been used especially in economic forecasts for related units of observation, for example of employment in industry sectors or across regions, and of jointly dependent series (unemployment and production).

For example, a time series of $K$ centred metric variables $Y_t = (y_{1t}, y_{2t}, \ldots, y_{Kt})'$ is a multivariate Normal autoregression of order $p$, denoted VAR($p$), if it follows the relations

$$Y_t = \Phi_{p1} Y_{t-1} + \cdots + \Phi_{pp} Y_{t-p} + U_t, \qquad U_t \sim N_K(0, V) \tag{5.8}$$

where the matrices $\Phi_{p1}, \ldots, \Phi_{pp}$ are each $K \times K$, and $V$ is the covariance matrix of the exchangeable errors $U_t = (u_{1t}, u_{2t}, \ldots, u_{Kt})'$. Then if $K = 2$, $\Phi_{p1}$ would consist of own-lag coefficients relating $y_{1t}$ and $y_{2t}$ to the lagged values $y_{1,t-1}$ and $y_{2,t-1}$ respectively, and cross-lag coefficients relating $y_{1t}$ to $y_{2,t-1}$ and $y_{2t}$ to $y_{1,t-1}$.
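A minimal VAR(1) sketch for $K = 2$ in Python/NumPy. The matrix $\Phi_{p1}$ (called `Phi` here), the error covariance `V`, and the use of least squares are illustrative assumptions: the diagonal entries of `Phi` are own-lag coefficients and the off-diagonal entries are cross-lag coefficients.

```python
import numpy as np

rng = np.random.default_rng(4)
K, n = 2, 10_000
Phi = np.array([[0.5, 0.2],     # y_1t: own lag 0.5, cross lag on y_{2,t-1} 0.2
                [0.1, 0.4]])    # y_2t: cross lag on y_{1,t-1} 0.1, own lag 0.4
V = np.array([[1.0, 0.3],
              [0.3, 1.0]])      # covariance of (u_1t, u_2t)
U = rng.multivariate_normal(np.zeros(K), V, size=n)
Y = np.zeros((n, K))
for t in range(1, n):
    Y[t] = Phi @ Y[t - 1] + U[t]

# Least squares: Y[1:] is approx Y[:-1] @ B, so Phi_hat = B transposed
B, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
Phi_hat = B.T
```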

5.2.1 Specifying priors

Bayesian time series applications with autoregressive and moving average components have included all the above modelling approaches, and have also been applied in models combining state-space and classical time series concepts (Huerta and West, 1999).

Among the questions involved in specifying priors for ARMA-type model parameters are whether stationarity and invertibility constraints are imposed, whether a full or conditional likelihood approach is used, and what assumptions are made about the innovation errors. As discussed in Chapter 1, prior elicitation consists of incorporating relevant background knowledge into the formulation of priors on parameters¹. Often, relevant knowledge is limited, and diffuse or `just proper' priors are called on. This raises questions of sensitivity to prior specifications, for instance on variances (Daniels, 1999) or on time series assumptions (e.g. on stationarity or otherwise, or on initial conditions), and the reader is encouraged to experiment with alternative priors in the worked examples of the chapter.

Autoregressive and ARMA models without stationarity

Consider first the autoregressive AR(p) model in the endogenous variable,

$$\rho(B) y_t = u_t.$$

Unlike classical approaches, a Bayesian analysis of the AR(p) model is not confined to stationary processes. As emphasized by Zellner (1971), a prior assumption of stationarity in the AR(p) process may be regarded as a modelling assumption to be assessed, rather than a necessary restriction. Hence, in an iterative sampling framework, an autoregressive model may be applied to observations $y_t$ without pre-differencing to eliminate trend, and the probability of stationarity assessed by the proportion of iterations $s = 1, \ldots, S$ at which the sampled coefficients $\rho^{(s)}$ actually satisfied stationarity, that is, the roots of $\rho^{(s)}(B)$ lay outside the unit circle. A significant probability of non-stationarity would then imply the need for differencing, different error assumptions, or model elaboration, for example to a higher order AR model (Naylor and Marriott, 1996).
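The posterior probability of stationarity described above is the proportion of sampled coefficient vectors that pass the root check. Here is a sketch in Python/NumPy; the array `draws` is a hypothetical stand-in for sampler output (one row per iteration $s$), and the helper names are illustrative.

```python
import numpy as np

def ar_is_stationary(rho):
    """True if all roots of 1 - rho_1 B - ... - rho_p B^p lie outside the unit circle."""
    rho = np.asarray(rho, dtype=float)
    roots = np.roots(np.concatenate([-rho[::-1], [1.0]]))
    return bool(np.all(np.abs(roots) > 1.0))

def prob_stationary(draws):
    """Share of draws rho^(s), s = 1..S (rows of `draws`), satisfying stationarity."""
    draws = np.atleast_2d(np.asarray(draws, dtype=float))
    return float(np.mean([ar_is_stationary(r) for r in draws]))

# Hypothetical stand-in for sampler output: S = 4 draws of (rho_1, rho_2)
draws = np.array([[0.5, 0.3], [0.5, 0.6], [0.2, 0.1], [1.1, -0.05]])
print(prob_stationary(draws))  # 0.5
```

For these four illustrative draws, the second and fourth violate the root condition, so the estimated probability of stationarity is 0.5.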

One approach of Zellner (1971) without a stationarity constraint is to use a non-informative reference prior, such as Jeffreys' prior, with

$$p(\rho_1, \ldots, \rho_p, \tau) \propto \tau^{-1}$$

where $\tau = 1/\sigma^2$. A BUGS implementation of this approach would require direct sampling from the full conditionals. As with any non-informative prior, potential problems of identifiability may be increased, whereas identifiability generally improves as just proper or informative proper priors are adopted.
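Under the reference prior $p(\rho, \tau) \propto \tau^{-1}$ and the likelihood conditioned on the first $p$ observations, the posterior can be sampled directly by composition: $\tau \mid y$ is Gamma with shape $(n - 2p)/2$ and rate equal to half the residual sum of squares from the least-squares fit, and $\rho \mid \tau, y$ is Normal with mean $\hat\rho$ and covariance $(\tau X'X)^{-1}$. The Python/NumPy sketch below assumes centred data as in (5.2); `ar_design` and `sample_ar_jeffreys` are hypothetical helper names. No stationarity constraint is imposed, so some draws may fall outside the stationary region.

```python
import numpy as np

def ar_design(y, p):
    """Responses y_{p+1..n} and lag matrix with columns y_{t-1}, ..., y_{t-p}."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([y[p - j - 1 : n - j - 1] for j in range(p)])
    return y[p:], X

def sample_ar_jeffreys(y, p, S, seed=0):
    """Direct posterior draws under p(rho, tau) proportional to 1/tau, conditional likelihood."""
    rng = np.random.default_rng(seed)
    yy, X = ar_design(y, p)
    m = len(yy)                                      # n - p modelled observations
    XtX = X.T @ X
    rho_hat = np.linalg.solve(XtX, X.T @ yy)         # least-squares estimate
    ssr = float(np.sum((yy - X @ rho_hat) ** 2))     # residual sum of squares
    # tau | y ~ Gamma(shape=(m - p)/2, rate=ssr/2); NumPy uses scale = 1/rate
    tau = rng.gamma((m - p) / 2.0, 2.0 / ssr, size=S)
    # rho | tau, y ~ N(rho_hat, (tau X'X)^{-1})
    L = np.linalg.cholesky(np.linalg.inv(XtX))
    z = rng.standard_normal((S, p))
    rho = rho_hat + (z @ L.T) / np.sqrt(tau)[:, None]
    return rho, tau

# Illustrative centred AR(2) data with rho = (0.5, 0.3) and sigma = 1
rng = np.random.default_rng(5)
y = np.zeros(2_000)
for t in range(2, len(y)):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()
rho_draws, tau_draws = sample_ar_jeffreys(y, p=2, S=4_000)
```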

The reference prior approach may be generalised to include AR(p) processes with normal-gamma conjugate priors (Broemeling and Cook, 1993). For example, with a gamma $G(a, b)$ prior for $\tau$, $\rho \mid \tau$ is taken to be multivariate Normal with mean $\bar{\rho}$, the prior mean of the lag coefficients, and precision matrix $\tau\Omega_0$, where $\Omega_0$ is a $p \times p$ positive definite matrix. A straightforward analysis is defined by conditioning the likelihood on the first $p$ observations $Y_1 = \{y_1, y_2, \ldots, y_p\}$, so avoiding the specification of a prior on the latent pre-series values. The likelihood then relates only to the observations $Y_2 = \{y_{p+1}, y_{p+2}, \ldots, y_n\}$.

The conditional likelihood is then

$$f(Y_2 \mid Y_1, \rho, \tau) \propto \tau^{0.5(n-p)} \exp\left[-0.5\,\tau \sum_{t=p+1}^{n} \{\rho(B) y_t\}^2\right] \tag{5.9}$$

The posterior is proportional to the product of the two priors and the conditional likelihood.

¹ In Bayesian econometrics, a distinction is sometimes made between two types of formal elicitation procedures, structural or predictive: structural methods involve assessment of the quantiles of the prior distribution of parameters, drawing on theoretical models or past experience (Bauwens et al., 1999; Kadane, 1980).
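For the normal-gamma prior, the posterior is again normal-gamma. The sketch below (Python/NumPy; `normal_gamma_update` is a hypothetical name) assumes $\rho \mid \tau$ is Normal with mean $\bar\rho$ and precision $\tau\Omega_0$, and $\tau \sim G(a, b)$ with shape $a$ and rate $b$. It uses the conditional likelihood (5.9) written as a regression of $y_t$ on its $p$ lags. The updates are the standard conjugate ones: $\Omega_n = \Omega_0 + X'X$, $\rho_n = \Omega_n^{-1}(\Omega_0\bar\rho + X'y)$, $a_n = a + (n-p)/2$, and $b_n = b + \tfrac12(y'y + \bar\rho'\Omega_0\bar\rho - \rho_n'\Omega_n\rho_n)$.

```python
import numpy as np

def normal_gamma_update(yy, X, rho_bar, Omega0, a, b):
    """Conjugate update for rho | tau ~ N(rho_bar, precision tau*Omega0), tau ~ Gamma(a, rate b),
    with yy = X rho + u, u ~ N(0, I/tau) (the conditional likelihood of an AR(p))."""
    Omega_n = Omega0 + X.T @ X
    rho_n = np.linalg.solve(Omega_n, Omega0 @ rho_bar + X.T @ yy)
    a_n = a + 0.5 * len(yy)
    b_n = b + 0.5 * (yy @ yy + rho_bar @ Omega0 @ rho_bar - rho_n @ Omega_n @ rho_n)
    return rho_n, Omega_n, a_n, float(b_n)

# Illustrative AR(1) data, conditioning on y_1 (p = 1)
rng = np.random.default_rng(6)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + rng.normal()
yy, X = y[1:], y[:-1, None]
rho_n, Omega_n, a_n, b_n = normal_gamma_update(
    yy, X, rho_bar=np.zeros(1), Omega0=0.01 * np.eye(1), a=1.0, b=1.0)
# Posterior mean of tau is a_n / b_n; posterior mean of rho is rho_n
```

With a weak prior (small $\Omega_0$), $\rho_n$ is very close to the least-squares estimate, and the prior pulls it towards $\bar\rho$ as $\Omega_0$ grows.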

Naylor and Marriott (1996) discuss a full likelihood analysis of the ARMA model without stationarity constraints, using proper but relatively `weak' priors on the latent pre-series values $Y_0 = (y_0, y_{-1}, \ldots, y_{1-p})$ and $E_0 = (u_0, u_{-1}, \ldots, u_{1-q})$. For instance, if the observed series is assumed Normal with mean $\mu$ and conditional variance $\sigma^2$, Naylor and Marriott suggest that the pre-series values $Y_0$ be taken as Student $t$ with low degrees of freedom, having the same mean as the main series but a variance larger by a factor $k > 1$, namely $k\sigma^2$. (This is equivalent to dividing the precision $\tau$ by $k$.) If there are several pre-series values (when $p > 1$), a multivariate Student $t$ might be used. Note that Zellner (1971, p. 87) suggests a prior for $y_0$ that does not involve any of the parameters of the main model.

Priors on error terms

The assumption of white noise errors in the AR(p) model may need to be assessed with more general priors allowing for outlier measurements. However, outlying points in time series should be interpreted cautiously. Outliers at time $t$ may be clearly aberrant, and either excluded or replaced by interpolated values taking account of surrounding values (Diggle, 1990). On the other hand, especially in economic time series, they may reflect aspects of economic behaviour which should be included in the specification (Thomas, 1997).

Some fairly conventional approaches are for a Normal mixture distribution or Student $t$ errors to replace the usual Normal error assumption (Hoek et al., 1995; West, 1996). Thus, let $\Delta$ be the small probability of an outlier (e.g. $\Delta = 0.05$), and let the binary indicator

$$J_t \sim \text{Bernoulli}(\Delta)$$

govern whether observation $t$ is an outlier. Then one alternative (West, 1996) to the Normal errors AR(1) model in (5.1) is

$$y_t = \mu + \rho y_{t-1} + u_t$$

where $u_t \sim N(0, K_t\sigma^2)$, and where random or fixed effect parameters $K_t > 1$ inflate the variance when $J_t = 1$ (with $K_t = 1$ when $J_t = 0$). This is known as an innovation outlier model and may be written as

$$u_t \sim (1 - \Delta) N(0, \sigma^2) + \Delta N(0, K\sigma^2) \tag{5.10}$$

If the Student $t$ is used as an outlier model for the innovations, then the most appropriate option is the scale mixture form: this includes weights $w_t$ averaging 1 which scale the precision, so low weights (e.g. under 0.5) indicate possible outliers. In models with autocorrelated errors, such as

$$y_t = \alpha + \beta x_t + e_t, \qquad e_t - \gamma_1 e_{t-1} = u_t,$$

the innovation outlier model would apply to the $u_t$.
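The innovation outlier model (5.10) and its Student $t$ scale-mixture alternative can be simulated as follows. This is a sketch in Python/NumPy with illustrative values ($\Delta = 0.05$, a fixed $K = 10$, $\nu = 4$ degrees of freedom) that are assumptions, not recommendations from the text. The weights are drawn as $w_t \sim \text{Gamma}(\nu/2, \text{rate } \nu/2)$, which have mean 1 and make $u_t$ Student $t$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, mu, rho, sigma = 20_000, 0.0, 0.5, 1.0
Delta, K = 0.05, 10.0                            # outlier probability, variance inflation

J = rng.random(n) < Delta                        # J_t ~ Bernoulli(Delta)
sd = np.where(J, np.sqrt(K) * sigma, sigma)      # sd of u_t is sqrt(K)*sigma when J_t = 1
u = rng.normal(0.0, sd)                          # mixture (5.10)

y = np.zeros(n)
for t in range(1, n):
    y[t] = mu + rho * y[t - 1] + u[t]            # AR(1) with outlier-prone innovations

# Student t alternative as a scale mixture: weights w_t scale the precision
nu = 4.0
w = rng.gamma(nu / 2.0, 2.0 / nu, size=n)        # Gamma(nu/2, rate nu/2), mean 1
u_student = rng.normal(0.0, sigma / np.sqrt(w))  # u_t | w_t ~ N(0, sigma^2 / w_t)
flagged = w < 0.5                                # low weights mark possible outliers
```

The mixture innovations have variance $(1-\Delta)\sigma^2 + \Delta K\sigma^2$, here 1.45.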

One may also define additive outliers, corresponding to shifts in the observation series that may not occur at all time points. So, following Barnett et al. (1996), one might define a model

$$y_t = \alpha + \beta x_t + o_t + e_t, \qquad e_t - \gamma_1 e_{t-1} = u_t$$

with $o_t \sim N(0, K_{1t}\sigma^2)$, where $K_{1t}$ is either 0 or positive (corresponding to times when an additive outlier does not or does occur), and $u_t \sim N(0, K_{2t}\sigma^2)$, where $K_{2t}$ is either 1 or greater than 1. One might then model the two types of outlier jointly, for instance via a discrete set of possible values for $K_t = \{K_{1t}, K_{2t}\}$. Barnett et al. illustrate this with a prior for $K_t$ consisting of (0, 1), (3, 1), (10, 1), (0, 3), (0, 10), with selection among them based on a multinomial rather than a binary indicator $J_t$, but with prior probabilities possibly biased towards the null option (0, 1). For example, prior probabilities on the options just named might be (0.9, 0.025, 0.025, 0.025, 0.025).
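Here is a sketch of the Barnett et al. joint outlier prior in Python/NumPy. A categorical (multinomial) indicator $J_t$ selects one of the five $(K_{1t}, K_{2t})$ pairs with the prior probabilities quoted above, and the selected pair scales the additive outlier $o_t$ and the innovation $u_t$. The regression and AR coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
# (K1, K2) options: K1 scales the additive outlier o_t, K2 inflates the innovation u_t
options = np.array([(0, 1), (3, 1), (10, 1), (0, 3), (0, 10)], dtype=float)
probs = np.array([0.9, 0.025, 0.025, 0.025, 0.025])   # biased towards the null option (0, 1)

n, alpha, beta, g1, sigma = 1_000, 0.0, 1.0, 0.5, 1.0
J = rng.choice(len(options), size=n, p=probs)   # categorical indicator J_t
K1, K2 = options[J, 0], options[J, 1]
o = rng.normal(0.0, np.sqrt(K1) * sigma)        # o_t ~ N(0, K1t s^2); exactly 0 when K1t = 0
u = rng.normal(0.0, np.sqrt(K2) * sigma)        # u_t ~ N(0, K2t s^2)
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = g1 * e[t - 1] + u[t]                 # e_t - g1 e_{t-1} = u_t
y = alpha + beta * x + o + e
```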

Another approach to additive outliers is developed by McCulloch and Tsay (1994). They consider first a random level-shift autoregressive (RLAR) model

$$y_t = \mu_t + e_t, \qquad \mu_t = \mu_{t-1} + d_t Z_t, \qquad e_t = \gamma_1 e_{t-1} + \gamma_2 e_{t-2} + \cdots + u_t$$

where $d_t$ is Bernoulli with probability $\Delta$ and governs the chance of a level shift at time $t$, the terms $Z_t \sim N(0, \varphi^2)$ describe the shifts, the $e_t$ are autoregressive errors, and the white noise errors $u_t$ are $N(0, \sigma^2)$. The shift variance $\varphi^2$ is taken as a large multiple (e.g. 10 or 100) of the white noise variance $\sigma^2$. The probability of a shift, $\Delta$, is beta with parameters favouring low probabilities, for instance $\Delta \sim \text{Beta}(5, 95)$. The above model may be re-expressed as

$$y_t = \mu_t + \gamma_1(y_{t-1} - \mu_{t-1}) + \gamma_2(y_{t-2} - \mu_{t-2}) + \cdots + u_t, \qquad \mu_t = \mu_{t-1} + d_t Z_t$$
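The RLAR model can be simulated in a few lines. In this Python/NumPy sketch (an AR(1) error with illustrative $\gamma_1 = 0.5$, shift variance $\varphi^2 = 100\sigma^2$, and $\Delta$ drawn from the Beta(5, 95) prior), the level $\mu_t$ is a cumulative sum of occasional shifts. The last line applies the re-expressed form and recovers the white noise $u_t$.

```python
import numpy as np

rng = np.random.default_rng(9)
n, g1, sigma = 500, 0.5, 1.0
phi2 = 100 * sigma**2                 # shift variance: a large multiple of sigma^2
Delta = rng.beta(5, 95)               # draw of the shift probability, prior mean 0.05

d = rng.random(n) < Delta             # d_t ~ Bernoulli(Delta)
Z = rng.normal(0.0, np.sqrt(phi2), size=n)
mu = np.cumsum(d * Z)                 # mu_t = mu_{t-1} + d_t Z_t, starting from 0
u = rng.normal(0.0, sigma, size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = g1 * e[t - 1] + u[t]       # AR(1) errors
y = mu + e

# Re-expressed form: (y_t - mu_t) - g1 (y_{t-1} - mu_{t-1}) equals u_t
resid = (y[1:] - mu[1:]) - g1 * (y[:-1] - mu[:-1])
```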

McCulloch and Tsay also propose a specialised additive outlier model, namely

$$y_t = o_t + e_t, \qquad o_t = d_t Z_t, \qquad e_t = \gamma_1 e_{t-1} + \gamma_2 e_{t-2} + \cdots + u_t,$$

which is the same as the RLAR model except that the level $\mu_t$ (now $o_t$) no longer depends upon its previous value. This model can be re-expressed as

$$y_t = o_t + \gamma_1(y_{t-1} - o_{t-1}) + \gamma_2(y_{t-2} - o_{t-2}) + \cdots + u_t, \qquad o_t = d_t Z_t$$

Another version of this model (Martin and Yohai, 1986) has $y_t = (1 - d_t) e_t + d_t Z_t$, with $e_t$ again autoregressive.

Priors consistent with stationarity and invertibility

Prior assumptions regarding stationarity or non-stationarity (and invertibility or non-invertibility) may interrelate with other aspects of time series model specification. Thus, specifying priors for a full likelihood involving all observations may be more straightforward for a stationary model. Consider the AR(1) model

$$y_t = \mu + \rho y_{t-1} + u_t$$

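Then for a stationary process with $\rho \in (-1, 1)$, and with exchangeable errors $u_t$ with mean 0 and variance $\sigma^2$, the first observation $y_1$ has mean $\mu/(1-\rho)$ (which reduces to $\mu$ if the model is written in deviations from the mean, $y_t - \mu = \rho(y_{t-1} - \mu) + u_t$) and variance

$$\sigma^2/(1 - \rho^2) \tag{5.11}$$

rather than $\sigma^2$. This analytic form corresponds to assuming an infinite history for the process. For $p \geq 2$, a matrix generalisation of (5.11) is involved. For instance, for $p = 2$ with lag coefficients $\{\rho_1, \rho_2\}$, the equivalent of (5.11) is a bivariate Normal for $Y_1 = \{y_1, y_2\}$ with covariance matrix $\sigma^2\Sigma$, where

$$\Sigma = R\,\Sigma\,R' + K_1(2)\,K_1(2)'$$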
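For the AR(1) case, (5.11) gives the full (exact) likelihood: $y_1$ contributes a Normal term with the stationary variance $\sigma^2/(1-\rho^2)$, and later observations contribute the usual conditional terms. The Python/NumPy sketch below assumes model (5.1) as written, so the stationary mean of $y_1$ is $\mu/(1-\rho)$. It cross-checks the result against the joint Normal density with covariance $\sigma^2\rho^{|i-j|}/(1-\rho^2)$; both function names are hypothetical.

```python
import numpy as np

def ar1_full_loglik(y, mu, rho, sigma):
    """Full log-likelihood of a stationary AR(1), y_t = mu + rho*y_{t-1} + u_t.
    y_1 ~ N(mu/(1-rho), sigma^2/(1-rho^2)); y_t | y_{t-1} ~ N(mu + rho*y_{t-1}, sigma^2)."""
    y = np.asarray(y, dtype=float)
    m0 = mu / (1.0 - rho)
    v0 = sigma**2 / (1.0 - rho**2)                    # stationary variance (5.11)
    ll = -0.5 * (np.log(2 * np.pi * v0) + (y[0] - m0) ** 2 / v0)
    resid = y[1:] - mu - rho * y[:-1]
    ll += -0.5 * np.sum(np.log(2 * np.pi * sigma**2) + resid**2 / sigma**2)
    return float(ll)

def ar1_joint_normal_loglik(y, mu, rho, sigma):
    """Same quantity from the joint Normal with covariance sigma^2 rho^|i-j| / (1 - rho^2)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    idx = np.arange(n)
    C = sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)
    r = y - mu / (1.0 - rho)
    _, logdet = np.linalg.slogdet(C)
    return float(-0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(C, r)))

y = np.array([0.3, 1.2, -0.4, 0.8, 2.0])
full = ar1_full_loglik(y, mu=0.5, rho=0.6, sigma=1.3)
joint = ar1_joint_normal_loglik(y, mu=0.5, rho=0.6, sigma=1.3)   # equal to full
```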

In the document Applied Bayesian Modelling (pages 182-188)