A starting point in dynamic regression models is often provided by considering dynamic structures in the outcomes. Autoregressive process models describe data-driven dependence in an outcome over successive time points. For continuous data $y_t$, observed at times $t = 1, \ldots, T$, the simplest autoregressive dependence in the outcomes is of order 1, meaning that values of $y$ at time $t$ depend upon their immediate predecessor. Thus an AR(1) model typically has the form
$$y_t = \mu + \rho_1 y_{t-1} + u_t, \quad t = 2, \ldots, T \qquad (5.1)$$
where $\mu$ represents the level of the outcome, and $\rho_1$ (written simply as $\rho$ when there is a single lag) models the autocorrelation between successive observations. After accounting for such observation-driven serial dependence, the errors may (at least initially) be taken as exchangeable white noise and to follow, for example, a Normal density, $u_t \sim N(0, \sigma^2)$, with constant variance and precision $\tau = 1/\sigma^2$ across all time points $t$, and $\mathrm{cov}(u_s, u_t) = 0$ for $s \neq t$.
If the data are centred, then a simpler model may be estimated
$$y_t = \rho y_{t-1} + u_t \qquad (5.2)$$
Additional dependence on lagged observations $y_{t-2}, y_{t-3}, \ldots, y_{t-p}$ leads to AR(2), AR(3), ..., AR(p) processes. It may be noted that (5.1), if taken to apply to $t = 1$ also, implies reference to unobserved or latent data $y_0$. If a prior on $y_0$ is included in the model specification, this leads to what is known as a full likelihood model. For an AR(p) model with times $t = 1, \ldots, p$ included there are $p$ implicit latent values, $y_0, y_{-1}, \ldots, y_{1-p}$, in the full likelihood model.
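The recursion in (5.1) can be illustrated with a minimal simulation sketch. The function name `simulate_ar1`, the starting value `y1` and the parameter values are illustrative assumptions, not part of the text; conditioning on a supplied first value sidesteps the latent $y_0$ discussed above.

```python
import random

def simulate_ar1(mu, rho, sigma, y1, T, seed=0):
    """Simulate y_t = mu + rho * y_{t-1} + u_t for t = 2..T, given y_1."""
    rng = random.Random(seed)
    y = [y1]
    for _ in range(2, T + 1):
        u = rng.gauss(0.0, sigma)  # white-noise innovation u_t ~ N(0, sigma^2)
        y.append(mu + rho * y[-1] + u)
    return y

# Illustrative (assumed) parameter values
series = simulate_ar1(mu=1.0, rho=0.5, sigma=1.0, y1=0.0, T=200)
```

Setting `sigma=0.0` reduces the recursion to its deterministic part, which is a convenient way to check the code against hand calculation.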
Classical estimation and forecasting with the AR(p) model rest on stationarity, which essentially means that the process generating the series is the same whenever observation starts: so the vectors $(y_1, \ldots, y_k)$ and $(y_{t+1}, \ldots, y_{t+k})$ have the same distribution for all $t$ and $k$.
Specifically, the expectations $E(y_t)$ and covariances $C(y_t, y_{t+k})$ are independent of $t$. For the stationary AR(p) model to be applicable, an observed data series may require initial transformation and differencing to eliminate trend. Non-stationary time series can often be transformed to stationarity by differencing of order $d$, typically first or second differencing at most. This may be combined with a scale transformation, e.g. $Y_t = \log(y_t)$.
Using the $B$ operator to denote a backward movement in time, a first difference ($d = 1$) in $y_t$
$$z_t = y_t - y_{t-1}$$
may equivalently be written
$$z_t = y_t - B y_t = (1 - B) y_t$$
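As a small sketch of differencing of order $d$, optionally combined with a log transformation, the following applies $(1 - B)$ repeatedly to a list of values. The helper name `difference` and the toy series are assumptions made for illustration only.

```python
import math

def difference(y, d=1):
    """Apply the first-difference operator (1 - B) to y, d times."""
    z = list(y)
    for _ in range(d):
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    return z

# Toy series growing by 10% per period (made-up values)
y = [100.0, 110.0, 121.0, 133.1]
z1 = difference(y)                               # first differences of y
z_log = difference([math.log(v) for v in y])     # first differences of log(y)
```

Each differencing pass shortens the series by one observation, so a series of length $T$ differenced $d$ times has $T - d$ values.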
Then with $z_t$ now the outcome, the AR(1) model (5.2) becomes
172 MODELS FOR TIME SERIES
$$z_t - \rho z_{t-1} = u_t$$
or
$$(1 - \rho B) z_t = u_t$$
An AR(p) process in $z_t$ leads to a $p$th order polynomial in $B$, so that
$$(1 - \rho_1 B - \rho_2 B^2 - \ldots - \rho_p B^p) z_t = u_t$$
for which an alternative notation is
$$\rho(B) z_t = u_t$$
The process is stationary if the roots of $\rho(B)$ lie outside the unit circle. For instance, if $p = 1$, the series is stationary if $|\rho_1| < 1$.
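The root condition can be checked directly for low orders. The sketch below handles only $p = 1$ and $p = 2$, solving $1 - \rho_1 B - \rho_2 B^2 = 0$ with the quadratic formula; the function names are hypothetical, and higher orders would need a general polynomial root finder.

```python
import cmath

def ar_roots(rho1, rho2=0.0):
    """Roots in B of rho(B) = 1 - rho1*B - rho2*B^2 (AR(1) or AR(2) only)."""
    if rho2 != 0.0:
        # Equivalent quadratic: rho2*B^2 + rho1*B - 1 = 0
        disc = cmath.sqrt(rho1 * rho1 + 4.0 * rho2)
        return [(-rho1 + disc) / (2.0 * rho2), (-rho1 - disc) / (2.0 * rho2)]
    if rho1 != 0.0:
        return [complex(1.0 / rho1)]  # single root of 1 - rho1*B
    return []  # white noise: no roots to check

def is_stationary(rho1, rho2=0.0):
    """True if every root of rho(B) lies strictly outside the unit circle."""
    return all(abs(root) > 1.0 for root in ar_roots(rho1, rho2))

print(is_stationary(0.5, 0.3))
print(is_stationary(0.9, 0.2))
```

For $p = 1$ the single root is $1/\rho_1$, so the check reduces to $|\rho_1| < 1$ as stated above.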
In the AR(p) model, an outcome depends upon its past values and a random error or innovation term $u_t$. If the impact of $u_t$ is in fact not fully absorbed in period $t$, there may be moving average dependence in the error term also. Thus, for centred data, the model
$$z_t - \rho_1 z_{t-1} = u_t - \theta_1 u_{t-1} \qquad (5.3)$$
defines a first order moving average MA(1) process in $u_t$ combined with AR(1) dependence in the data themselves. In BUGS, moving average effects in the $u_t$ for metric (e.g. Normal or Student t) data may require an additional measurement error term to be introduced, because the centred density adopted in BUGS for Normal and Student t data assumes unstructured errors. In general an ARIMA(p, d, q) model is defined by dependence up to lag $p$ in the observations, by $q$ lags in the error moving average, and by differencing the original observation ($y_t$) $d$ times. An ARMA(p, q) model in $y_t$, therefore, retains the original data without differencing. In the general ARMA(p, q) representation,
$$\rho(B) y_t = \theta(B) u_t \qquad (5.4)$$
the process is stationary if the roots of $\rho(B)$ lie outside the unit circle, and invertible if the roots of $\theta(B)$ lie outside the unit circle. For instance, if $p = q = 1$, the series is stationary and invertible if $|\rho_1| < 1$ and $|\theta_1| < 1$.
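A minimal sketch of the ARMA(1,1) recursion in (5.3) follows. It simulates the series and then recovers the innovations by running the recursion backwards, which is what invertibility makes numerically stable. The function names are hypothetical, and both pre-series values $z_0$ and $u_0$ are set to zero as a simplifying assumption.

```python
import random

def simulate_arma11(rho1, theta1, sigma, T, seed=0):
    """Simulate z_t = rho1*z_{t-1} + u_t - theta1*u_{t-1}, with z_0 = u_0 = 0."""
    rng = random.Random(seed)
    z_prev, u_prev = 0.0, 0.0  # assumed pre-series values
    z, u = [], []
    for _ in range(T):
        u_t = rng.gauss(0.0, sigma)
        z_t = rho1 * z_prev + u_t - theta1 * u_prev
        z.append(z_t)
        u.append(u_t)
        z_prev, u_prev = z_t, u_t
    return z, u

def recover_innovations(z, rho1, theta1):
    """Invert (5.3): u_t = z_t - rho1*z_{t-1} + theta1*u_{t-1}."""
    u, z_prev, u_prev = [], 0.0, 0.0
    for z_t in z:
        u_t = z_t - rho1 * z_prev + theta1 * u_prev
        u.append(u_t)
        z_prev, u_prev = z_t, u_t
    return u

z, u_true = simulate_arma11(rho1=0.6, theta1=0.4, sigma=1.0, T=50)
u_hat = recover_innovations(z, rho1=0.6, theta1=0.4)
```

With $|\theta_1| < 1$, any error in the assumed starting value $u_0$ is damped geometrically in the recovered innovations.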
In a distributed lag regression, predictors $x_t$ and their lagged values are introduced in addition to the lagged observations $y_{t-1}, y_{t-2}$, etc. A distributed lag model for centred data has the form
$$y_t = \sum_{m \geq 0} \beta_m x_{t-m} + u_t \qquad (5.5)$$
while a model with lags in both $y$ and $x$ may be called an Autoregressive Distributed Lag (ADL or ARDL) model (see Bauwens et al., 1999; Greene, 2000):
$$\rho(B) y_t = \beta(B) x_t + u_t$$
The latter form leads into recent model developments in terms of error correction models.
Dependent Errors
In the specifications above, the errors $u_t$ are assumed temporally uncorrelated with diagonal covariance matrix, and autocorrelation is confined to the observations themselves. However, if correlation exists between the errors then the covariance matrix is no longer diagonal. Let $e_t$ be correlated errors with
$$y_t = \alpha + \beta x_t + e_t \qquad (5.6a)$$
and suppose that an AR(p) transformation of the $e_t$ is required
$$\gamma(B) e_t = u_t$$
in order that $u_t$ is unstructured with constant variance, where
$$\gamma(B) = 1 - \gamma_1 B - \gamma_2 B^2 - \ldots - \gamma_p B^p$$
A frequently occurring model is one with AR(1) errors $e_t$ such as
$$y_t = \alpha + \beta x_t + e_t$$
$$e_t = \gamma e_{t-1} + u_t$$
More generally, regression models may be defined with ARMA(p, q) errors
$$e_t - \gamma_1 e_{t-1} - \gamma_2 e_{t-2} - \ldots - \gamma_p e_{t-p} = u_t - \theta_1 u_{t-1} - \theta_2 u_{t-2} - \ldots - \theta_q u_{t-q} \qquad (5.6b)$$
To facilitate estimation, the AR(1) error model may be re-expressed in non-linear autoregressive form, for observations $t > 1$ subsequent to the first, and with homogeneous errors $u_t$,
$$y_t = \gamma y_{t-1} + \alpha - \alpha\gamma + \beta x_t - \gamma\beta x_{t-1} + u_t$$
$$\phantom{y_t} = \gamma (y_{t-1} - \beta x_{t-1}) + \alpha (1 - \gamma) + \beta x_t + u_t \qquad (5.7)$$
So the intercept in the original model is obtained by dividing the intercept in the transformed data model by $1 - \gamma$.
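The equivalence between the AR(1) error model and its re-expressed form (5.7) can be checked numerically. In the sketch below the innovations $u_t$ are computed two ways, once from the errors $e_t$ directly and once from the transformed mean in (5.7). The function names, data and parameter values are all made up for illustration.

```python
def innovations_direct(y, x, alpha, beta, gamma):
    """u_t = e_t - gamma*e_{t-1}, with e_t = y_t - alpha - beta*x_t."""
    e = [yt - alpha - beta * xt for yt, xt in zip(y, x)]
    return [e[t] - gamma * e[t - 1] for t in range(1, len(e))]

def innovations_transformed(y, x, alpha, beta, gamma):
    """u_t = y_t minus the conditional mean in (5.7), for t > 1."""
    out = []
    for t in range(1, len(y)):
        mean_t = (gamma * (y[t - 1] - beta * x[t - 1])
                  + alpha * (1.0 - gamma) + beta * x[t])
        out.append(y[t] - mean_t)
    return out

# Made-up data and parameter values
y = [1.0, 2.5, 2.0, 4.0]
x = [0.0, 1.0, 2.0, 3.0]
alpha, beta, gamma = 0.5, 1.2, 0.7

u_a = innovations_direct(y, x, alpha, beta, gamma)
u_b = innovations_transformed(y, x, alpha, beta, gamma)

# The transformed model's intercept is alpha*(1 - gamma); dividing by (1 - gamma) recovers alpha
alpha_back = (alpha * (1.0 - gamma)) / (1.0 - gamma)
```

Both routes give the same innovations for $t = 2, \ldots, T$; the first observation is lost because its predecessor is unobserved.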
Multivariate series
The above range of models may be extended to modelling multivariate dependence through time, with each series depending both on its own past and the past values of the other series. One advantage of simultaneously modelling several series is the possibility of pooling information to improve precision and out-of-sample forecasts. Vector autoregressive models have been used especially in economic forecasts for related units of observation, for example of employment in industry sectors or across regions, and of jointly dependent series (unemployment and production).
For example, a time series of $K$ centred metrical variables $Y_t = (y_{1t}, y_{2t}, \ldots, y_{Kt})'$ is a multivariate Normal autoregression of order $p$, denoted VAR(p), if it follows the relations
$$Y_t = \Phi_{p1} Y_{t-1} + \ldots + \Phi_{pp} Y_{t-p} + U_t$$
$$U_t \sim N_K(0, V) \qquad (5.8)$$
where the matrices $\Phi_{p1}, \ldots, \Phi_{pp}$ are each $K \times K$, and the covariance matrix $V$ is for exchangeable errors $u_{1t}, u_{2t}, \ldots, u_{Kt}$. Then if $K = 2$, $\Phi_{p1}$ would consist of own-lag coefficients relating $y_{1t}$ and $y_{2t}$ to the lagged values $y_{1,t-1}$ and $y_{2,t-1}$, and cross-lag coefficients relating $y_{1t}$ to $y_{2,t-1}$ and $y_{2t}$ to $y_{1,t-1}$.
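To make the own-lag and cross-lag structure concrete, here is a minimal VAR(1) sketch for $K = 2$. The coefficient matrix `Phi` and the error standard deviations are made-up values, and the errors are drawn independently across series, which corresponds to assuming a diagonal $V$ (a simplification, not a requirement of the model).

```python
import random

def var1_step(Phi, y_prev, u):
    """One step of Y_t = Phi * Y_{t-1} + U_t for a K x K matrix Phi."""
    K = len(y_prev)
    return [sum(Phi[i][j] * y_prev[j] for j in range(K)) + u[i] for i in range(K)]

def simulate_var1(Phi, sd, T, seed=0):
    """Simulate T time points starting from Y_1 = 0, with independent errors."""
    rng = random.Random(seed)
    y = [[0.0] * len(Phi)]
    for _ in range(T - 1):
        u = [rng.gauss(0.0, s) for s in sd]  # diagonal V assumed
        y.append(var1_step(Phi, y[-1], u))
    return y

# Diagonal entries are own-lag effects; off-diagonal entries are cross-lag effects
Phi = [[0.5, 0.1],
       [0.2, 0.4]]
path = simulate_var1(Phi, sd=[1.0, 1.0], T=100)
```

Here `Phi[0][1]` is the effect of $y_{2,t-1}$ on $y_{1t}$ and `Phi[1][0]` the effect of $y_{1,t-1}$ on $y_{2t}$.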
5.2.1 Specifying priors
Bayesian time series applications with autoregressive and moving average components have included all the above modelling approaches, and have also been applied in models
combining state-space and classical time series concepts (Huerta and West, 1999).
Among the questions involved in specifying priors for ARMA-type model parameters are whether stationarity and invertibility constraints are imposed, whether a full or conditional likelihood approach is used, and what assumptions are made about the innovation errors. As discussed in Chapter 1, prior elicitation consists in incorporating relevant background knowledge into the formulation of priors on parameters¹. Often, relevant knowledge is limited, and diffuse or 'just proper' priors are called on. This raises questions of sensitivity to prior specifications, for instance on variances (Daniels, 1999) or on time series assumptions (e.g. on stationarity or otherwise, or on initial conditions), and the reader is encouraged to experiment with alternative priors in the worked examples of the chapter.
Autoregressive and ARMA models without stationarity
Consider first the autoregressive AR(p) model in the endogenous variable,
$$\rho(B) y_t = u_t$$
Unlike classical approaches, a Bayesian analysis of the AR(p) model is not confined to stationary processes. As emphasized by Zellner (1971), a prior assumption of stationarity in the AR(p) process may be regarded as a modelling assumption to be assessed, rather than a necessary restriction. Hence in an iterative sampling framework, an autoregressive model may be applied to observations $y_t$ without pre-differencing to eliminate trend, and the probability of stationarity assessed by the proportion of iterations $s = 1, \ldots, S$ where stationarity in the coefficients $\rho^{(s)}$ at iteration $s$ actually held, in terms of roots located outside the unit circle. A significant probability of non-stationarity would then imply the need for differencing, different error assumptions, or model elaboration, for example to a higher order AR model (Naylor and Marriott, 1996).
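The proportion of sampled iterations satisfying stationarity can be sketched as below for an AR(2) model. The stationarity test uses the standard triangle conditions $\rho_1 + \rho_2 < 1$, $\rho_2 - \rho_1 < 1$, $|\rho_2| < 1$, which are equivalent to both roots of $\rho(B)$ lying outside the unit circle. The draws in `rho_draws` are made-up placeholders standing in for sampler output, and the function names are hypothetical.

```python
def ar2_is_stationary(rho1, rho2):
    """AR(2) triangle conditions, equivalent to roots outside the unit circle."""
    return (rho1 + rho2 < 1.0) and (rho2 - rho1 < 1.0) and (abs(rho2) < 1.0)

def prob_stationary(draws):
    """Proportion of sampled (rho1, rho2) pairs that satisfy stationarity."""
    flags = [ar2_is_stationary(r1, r2) for r1, r2 in draws]
    return sum(flags) / len(flags)

# Placeholder draws (NOT real posterior output); in practice these would be
# the S sampled values of (rho1, rho2) from an unconstrained prior analysis
rho_draws = [(0.5, 0.3), (0.9, 0.2), (1.2, -0.3), (0.2, 1.1)]
p_stat = prob_stationary(rho_draws)
```

One minus `p_stat` then estimates the posterior probability of non-stationarity under the unconstrained prior.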
One approach of Zellner (1971) without a stationarity constraint is to use a non-informative reference prior, such as Jeffreys' prior, with
$$p(\rho_1, \ldots, \rho_p, \tau) \propto \tau^{-1}$$
where $\tau = 1/\sigma^2$. In BUGS, implementation of this approach would require direct sampling from the full conditionals. As with any non-informative prior, potential problems of identifiability may be increased, whereas identifiability generally improves as just proper or informative proper priors are adopted.
The reference prior approach may be generalised to include AR(p) processes with normal-gamma conjugate priors (Broemeling and Cook, 1993). For example, with a gamma $G(a, b)$ prior for $\tau$, $\rho \mid \tau$ is taken to be multivariate Normal $N(\rho_0, \tau P)$, where $\rho_0$ is the prior mean on the lag coefficients, and $P$ is a $p \times p$ positive definite matrix. A straightforward analysis is defined by conditioning the likelihood on the first $p$ observations $Y_1 = \{y_1, y_2, \ldots, y_p\}$, so avoiding the specification of a prior on the latent pre-series values. The likelihood then only relates to observations $Y_2 = \{y_{p+1}, y_{p+2}, \ldots, y_n\}$.
The conditional likelihood is then
$$f(Y_2 \mid Y_1, \rho, \tau) \propto \tau^{0.5(n-p)} \exp\left( -0.5\,\tau \sum_{t=p+1}^{n} [\rho(B) y_t]^2 \right) \qquad (5.9)$$
The posterior is proportional to the product of the two priors and the conditional likelihood.

¹ In Bayesian econometrics, a distinction is sometimes made between two types of formal elicitation procedures, structural or predictive; structural methods involve assessment of the quantiles of the prior distribution of parameters, drawing on theoretical models or past experience (Bauwens et al., 1999; Kadane, 1980).
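A sketch of the log of the conditional likelihood kernel in (5.9) is given below, with $\rho(B) y_t = y_t - \rho_1 y_{t-1} - \ldots - \rho_p y_{t-p}$. The function name and the toy data are assumptions; the constant of proportionality is omitted, as in (5.9).

```python
import math

def conditional_loglik(y, rho, tau):
    """Log of the kernel in (5.9): 0.5*(n-p)*log(tau) - 0.5*tau*SSE."""
    p, n = len(rho), len(y)
    sse = 0.0
    for t in range(p, n):  # 0-based index t corresponds to observation t+1
        resid = y[t] - sum(rho[k] * y[t - 1 - k] for k in range(p))
        sse += resid * resid
    return 0.5 * (n - p) * math.log(tau) - 0.5 * tau * sse

# Toy series that halves each period, so an AR(1) with rho = 0.5 fits exactly
y = [1.0, 0.5, 0.25, 0.125]
ll_exact = conditional_loglik(y, rho=[0.5], tau=2.0)
ll_zero = conditional_loglik(y, rho=[0.0], tau=2.0)
```

Adding the log priors for $\rho$ and $\tau$ to this quantity gives the log posterior up to an additive constant.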
Naylor and Marriott (1996) discuss a full likelihood analysis of the ARMA model without stationarity constraints by using proper but relatively 'weak' priors on the latent pre-series values $Y_0 = (y_0, y_{-1}, \ldots, y_{1-p})$ and $E_0 = (u_0, u_{-1}, \ldots, u_{1-q})$. For instance, if the observed series is assumed Normal with mean $\mu$ and conditional variance $\sigma^2$, Naylor and Marriott suggest the pre-series values $Y_0$ be taken as Student t with low degrees of freedom, having the same mean as the main series but a variance larger by a factor $k \geq 1$, namely $k\sigma^2$. (This is equivalent to dividing the precision $\tau$ by $k$.) If there are several pre-series values (when $p > 1$), a multivariate Student t might be used. Note that Zellner (1971, p. 87) suggests a prior for $y_0$ that does not involve any of the parameters of the main model.
Priors on error terms
The assumption of white noise errors in the AR(p) model may need to be assessed with more general priors allowing for outlier measurements. However, interpretations of outlying points in time series are made cautiously. Outliers at time t may be clearly aberrant, and either excluded or replaced by interpolated values taking account of surrounding values (Diggle, 1990). On the other hand, especially in economic time series, they may reflect aspects of economic behaviour which should be included in the specification (Thomas, 1997).
Some fairly conventional approaches are for a Normal mixture distribution or Student t errors to replace the usual Normal error assumption (Hoek et al., 1995; West, 1996). Thus, let $\Delta$ be the small probability of an outlier (e.g. $\Delta = 0.05$), and let the binary indicator
$$J_t \sim \mathrm{Bernoulli}(\Delta)$$
govern whether the observation $t$ is an outlier. Then one alternative (West, 1996) to the Normal errors AR(1) model in (5.1) is
$$y_t = \mu + \rho y_{t-1} + u_t$$
where $u_t \sim N(0, K_t \sigma^2)$, and where random or fixed effect parameters $K_t > 1$ inflate the variance when $J_t = 1$. This is known as an innovation outlier model and may be written as
$$u_t \sim (1 - \Delta) N(0, \sigma^2) + \Delta N(0, K\sigma^2) \qquad (5.10)$$
If the Student t is used as an outlier model for the innovations then the most appropriate option is the scale mixture form; this includes weights $w_t$ averaging 1 which scale the precision, so low weights (e.g. under 0.5) indicate possible outliers. In models with autocorrelated errors, such as
$$y_t = \alpha + \beta x_t + e_t$$
$$e_t - \gamma_1 e_{t-1} = u_t$$
the innovation outlier model would apply to the $u_t$.
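The two-component mixture in (5.10) can be sampled by first drawing the indicator $J_t$ and then drawing the innovation with the matching variance. The sketch below assumes a single fixed inflation factor `K` and illustrative values for `sigma` and `delta`; the function name is hypothetical.

```python
import random

def draw_innovation(rng, sigma, delta, K):
    """Draw (u_t, J_t) from (1 - delta)*N(0, sigma^2) + delta*N(0, K*sigma^2)."""
    J = 1 if rng.random() < delta else 0        # outlier indicator J_t
    sd = sigma * (K ** 0.5) if J else sigma     # inflate variance when J_t = 1
    return rng.gauss(0.0, sd), J

rng = random.Random(1)
draws = [draw_innovation(rng, sigma=1.0, delta=0.05, K=10.0) for _ in range(10000)]
outlier_rate = sum(J for _, J in draws) / len(draws)
```

In a model fitted by sampling, the posterior mean of each $J_t$ would then serve as the estimated probability that time point $t$ is an innovation outlier.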
One may also define additive outliers corresponding to shifts in the observation series that may not occur for all time points. So, following Barnett et al. (1996), one might define a model
$$y_t = \alpha + \beta x_t + o_t + e_t$$
$$e_t - \gamma_1 e_{t-1} = u_t$$
with $o_t \sim N(0, K_{1t}\sigma^2)$, where $K_{1t}$ is either 0 or positive (corresponding to times when an additive outlier does not or does occur, respectively), and $u_t \sim N(0, K_{2t}\sigma^2)$, where $K_{2t}$ is either 1 or greater than 1. One might then model the two outliers jointly, for instance via a discrete set of possible values for $K_t = \{K_{1t}, K_{2t}\}$. Barnett et al. illustrate this with a prior for $K_t$ consisting of (0, 1), (3, 1), (10, 1), (0, 3), (0, 10), with selection among them based on a multinomial rather than binary indicator $J_t$, but with prior probabilities possibly biased towards the null option (0, 1). For example, prior probabilities on the just named options might be (0.9, 0.025, 0.025, 0.025, 0.025).
Another approach to additive outliers is developed by McCulloch and Tsay (1994). They consider first a random level-shift autoregressive (RLAR) model
$$y_t = \mu_t + e_t$$
$$\mu_t = \mu_{t-1} + \delta_t \eta_t$$
$$e_t = \gamma_1 e_{t-1} + \gamma_2 e_{t-2} + \ldots + u_t$$
where $\delta_t$ is Bernoulli with probability $\Delta$ and governs the chance of a level shift at time $t$, the terms $\eta_t \sim N(0, \xi^2)$ describe the shifts, the $e_t$ are autoregressive errors and the white noise errors $u_t$ are $N(0, \sigma^2)$. The shift variance $\xi^2$ is taken as a large multiple (e.g. 10 or 100) of the white noise variance $\sigma^2$. The probability of a shift $\Delta$ is beta with parameters favouring low probabilities, for instance $\Delta \sim \mathrm{Beta}(5, 95)$. The above model may be re-expressed as
$$y_t = \mu_t + \gamma_1 (y_{t-1} - \mu_{t-1}) + \gamma_2 (y_{t-2} - \mu_{t-2}) + \ldots + u_t$$
$$\mu_t = \mu_{t-1} + \delta_t \eta_t$$
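A minimal simulation sketch of the RLAR model with AR(1) errors follows. The shift probability is drawn once from a Beta(5, 95) prior and the shift standard deviation is set to $\sqrt{10}$ times $\sigma$, matching the illustrative multiples mentioned above. The function name, the starting values of zero and all numeric settings are assumptions.

```python
import random

def simulate_rlar(T, gamma1, sigma, shift_sd, shift_prob, seed=0):
    """Simulate y_t = mu_t + e_t with random level shifts and AR(1) errors."""
    rng = random.Random(seed)
    mu, e = 0.0, 0.0  # assumed starting level and error
    y, levels = [], []
    for _ in range(T):
        d = 1 if rng.random() < shift_prob else 0   # shift indicator delta_t
        mu = mu + d * rng.gauss(0.0, shift_sd)      # mu_t = mu_{t-1} + delta_t * eta_t
        e = gamma1 * e + rng.gauss(0.0, sigma)      # e_t = gamma1 * e_{t-1} + u_t
        y.append(mu + e)
        levels.append(mu)
    return y, levels

prior_rng = random.Random(42)
shift_prob = prior_rng.betavariate(5, 95)  # one draw of the shift probability
y, levels = simulate_rlar(T=200, gamma1=0.5, sigma=1.0,
                          shift_sd=10.0 ** 0.5, shift_prob=shift_prob)
```

Plotting `levels` against `y` shows the step-like level path around which the autoregressive errors fluctuate.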
McCulloch and Tsay also propose a specialised additive outlier model, namely
$$y_t = o_t + e_t$$
$$o_t = \delta_t \eta_t$$
$$e_t = \gamma_1 e_{t-1} + \gamma_2 e_{t-2} + \ldots + u_t$$
which is the same as the RLAR model except that the level $\mu_t$ (now $o_t$) no longer depends upon its previous value. This model can be re-expressed as
$$y_t = o_t + \gamma_1 (y_{t-1} - o_{t-1}) + \gamma_2 (y_{t-2} - o_{t-2}) + \ldots + u_t$$
$$o_t = \delta_t \eta_t$$
Another version of this model (Martin and Yohai, 1986) has
$$y_t = (1 - \delta_t) e_t + \delta_t \eta_t$$
with $e_t$ again autoregressive.
Priors consistent with stationarity and invertibility
Prior assumptions regarding stationarity or non-stationarity (and invertibility or non-invertibility) may interrelate with other aspects of time series model specification. Thus, specifying priors for a full likelihood involving all observations may be more straightforward for a stationary model. Consider the AR(1) model
$$y_t = \mu + \rho y_{t-1} + u_t$$
Then for a stationary process with $\rho \in (-1, 1)$, and with exchangeable errors $u_t$ with mean 0 and variance $\sigma^2$, the first observation $y_1$ has mean $\mu$ and conditional variance
$$\sigma^2 / (1 - \rho^2) \qquad (5.11)$$
rather than $\sigma^2$. This analytic form corresponds to assuming an infinite history for the process. For $p \geq 2$, a matrix generalisation of (5.11) is involved. For instance, for $p = 2$ with lag coefficients $\{\rho_1, \rho_2\}$, the equivalent of (5.11) is a bivariate Normal for $Y_1 = \{y_1, y_2\}$ with covariance matrix $\sigma^2 \Sigma$, where
$$\Sigma = R \Sigma R' + K_1(2) K_1(2)'$$