Comparing forecast accuracy of different models for prices of metal commodities

(1)

Comparing Forecast Accuracy of Different Models for Prices of Metal Commodities

João Victor Issler (FGV) and Claudia F. Rodrigues (VALE)

(2)

Stylized Facts on Forecasts and Forecast Combinations

Interest lies in forecasting $y_t$, stationary and ergodic, using information up to $h$ periods prior to $t$, where $h$ is treated as fixed. The risk function is MSE. Then the optimal (minimum-MSE) forecast is

$$E_{t-h}(y_t),$$

and the $h$-step-ahead forecast error is $y_t - E_{t-h}(y_t)$.
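The optimality of the conditional expectation under MSE risk can be seen from a standard decomposition, sketched here under the assumption of finite second moments. The symbol $g$ is introduced only for this illustration and stands for any forecast built from information available at $t-h$:

```latex
\begin{align*}
E\left[(y_t - g)^2\right]
  &= E\left[\bigl(y_t - E_{t-h}(y_t)\bigr)^2\right]
   + E\left[\bigl(E_{t-h}(y_t) - g\bigr)^2\right]
   + 2\,E\left[\bigl(y_t - E_{t-h}(y_t)\bigr)\bigl(E_{t-h}(y_t) - g\bigr)\right] \\
  &= E\left[\bigl(y_t - E_{t-h}(y_t)\bigr)^2\right]
   + E\left[\bigl(E_{t-h}(y_t) - g\bigr)^2\right]
  \;\ge\; E\left[\bigl(y_t - E_{t-h}(y_t)\bigr)^2\right].
\end{align*}
```

The cross term vanishes by the law of iterated expectations, because $E_{t-h}(y_t) - g$ is known at $t-h$ and the forecast error has zero conditional mean. Equality holds only when $g = E_{t-h}(y_t)$ almost surely.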

Hendry and Clements (2002): Let $f_{i,t}^h$ be the $i$-th $h$-step-ahead forecast of $y_t$, $i = 1, 2, \dots, N$. Then $\frac{1}{N}\sum_{i=1}^{N} f_{i,t}^h$ performs very well compared to the individual forecasts $f_{i,t}^h$.

This suggests that $f_{i,t}^h$ cannot approximate $E_{t-h}(y_t)$ very well, since

(3)

Stylized Facts on Forecasts and Forecast Combinations

Forecast combination works from a risk-diversification point of view (Bates and Granger, 1969, and Timmermann, 2006): if the number of forecasts in the combination is large ($N \to \infty$), the idiosyncratic component of forecast errors is wiped out due to the law of large numbers.
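The diversification argument can be illustrated with a small simulation. This is a minimal sketch under assumptions not made in the slides: the idiosyncratic errors are i.i.d. standard normal across forecasts and periods, and the function name `average_error_variance` is hypothetical.

```python
import numpy as np


def average_error_variance(n_forecasts, n_periods=20_000, seed=0):
    """Monte Carlo variance of the cross-sectional average of idiosyncratic errors.

    Assumption: errors are i.i.d. N(0, 1) across forecasts and periods.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_periods, n_forecasts))
    # Average the N idiosyncratic errors in each period, then take the
    # variance of that average across periods.
    return eps.mean(axis=1).var()


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, average_error_variance(n))
```

Under these assumptions the variance of the averaged error shrinks at rate $1/N$, which is the law-of-large-numbers effect described above. Correlated errors would shrink more slowly.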

However, there is the Forecast Combination Puzzle: consider $\sum_{i=1}^{N} \omega_i f_{i,t}^h$, where $|\omega_i| < \infty$ and $\sum_{i=1}^{N} \omega_i = 1$. In practice, equal weights $\omega_i = 1/N$ outperform “optimal weights” designed to outperform them in

(4)

Issler and Lima (JoE, 2009)

Work within a panel-data framework, where $N, T \to \infty$, with sequential asymptotics: first $T \to \infty$ with $N$ fixed; then $N \to \infty$, written as $(T, N \to \infty)_{\text{seq}}$.

Propose the use of an equal-weights combination ($1/N$) coupled with an average bias-correction term (BCAF): $\frac{1}{N}\sum_{i=1}^{N} f_{i,t}^h - \hat{B}$.

Propose a new test for the need to do bias correction: $H_0: B = 0$.

Show that there is no Forecast Combination Puzzle in large samples: for large $N$ and $T$, “optimal weights” have the same limiting MSE as the BCAF.

The bad performance of estimated “optimal weights” is then linked to the curse of dimensionality.

(5)

Issler and Lima (JoE, 2009)

They decompose $f_{i,t}^h$ as follows, where $h$ is treated as fixed:

$$f_{i,t}^h = E_{t-h}(y_t) + k_i + \varepsilon_{i,t}.$$

We can always write:

$$y_t = E_{t-h}(y_t) + \zeta_t, \quad \text{with } E_{t-h}(\zeta_t) = 0.$$

Then,

$$f_{i,t}^h = y_t - \zeta_t + k_i + \varepsilon_{i,t},$$

or

$$f_{i,t}^h = y_t + k_i + \eta_t + \varepsilon_{i,t}, \quad \text{where } \eta_t = -\zeta_t,$$

or

$$f_{i,t}^h = y_t + \mu_{i,t}, \quad i = 1, 2, \dots, N, \quad \text{with } \mu_{i,t} = k_i + \eta_t + \varepsilon_{i,t}.$$

The error $\mu_{i,t}$ has a two-way decomposition (Wallace and Hussain (1969), Amemiya (1971), Fuller and Battese (1974)) with a long tradition in the econometrics literature. It depends on $h$, but this is omitted for notational convenience.

(6)

Issler and Lima (JoE, 2009) Assumptions

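Time framework: the sample $1, \dots, T$ is split into three consecutive windows, $E$ (observations $1$ to $T_1$), $R$ (observations $T_1 + 1$ to $T_2$) and $P$ (observations $T_2 + 1$ to $T$), with

$$E = T_1 = \kappa_1 T, \quad R = T_2 - T_1 = \kappa_2 T, \quad P = T - T_2 = \kappa_3 T.$$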
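The three-window split can be sketched in a few lines. The helper `split_sample` is hypothetical, and rounding $\kappa_1 T$ and $\kappa_2 T$ to integers is an assumption made here so that the window boundaries are valid indices.

```python
def split_sample(T, kappa1, kappa2):
    """Split T observations into consecutive windows E, R and P.

    Assumption: kappa1 * T and kappa2 * T are rounded to the nearest
    integer; P takes whatever remains, so E + R + P == T.
    """
    T1 = int(round(kappa1 * T))        # end of window E
    T2 = T1 + int(round(kappa2 * T))   # end of window R
    if not 0 < T1 < T2 < T:
        raise ValueError("each window must contain at least one observation")
    return {"T1": T1, "T2": T2, "E": T1, "R": T2 - T1, "P": T - T2}


if __name__ == "__main__":
    print(split_sample(200, 0.5, 0.25))
```

Because the three windows exhaust the sample, the implied $\kappa_3$ is $1 - \kappa_1 - \kappa_2$.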

Assumption 1: $k_i$, $\varepsilon_{i,t}$ and $\eta_t$ are independent of each other for all $i$ and $t$.

Assumption 2: $k_i$ is an identically distributed random variable in the cross-sectional dimension, but not necessarily independent,

(7)

Issler and Lima (JoE, 2009) Assumptions

Assumption 3: The aggregate shock $\eta_t$ is a stationary and ergodic MA process of order at most $h - 1$, with zero mean and variance $\sigma_\eta^2 < \infty$.

Assumption 4: Let $\varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t}, \dots, \varepsilon_{N,t})'$ be an $N \times 1$ vector stacking the errors $\varepsilon_{i,t}$ associated with all possible forecasts, where $E(\varepsilon_{i,t}) = 0$ for all $i$ and $t$. Then the vector process $\{\varepsilon_t\}$ is assumed to be covariance-stationary and ergodic for the first and second moments, uniformly on $N$. Further, defining $\xi_{i,t} = \varepsilon_{i,t} - E_{t-1}(\varepsilon_{i,t})$ as the innovation of $\varepsilon_{i,t}$, we assume that

$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} E(\xi_{i,t}\,\xi_{j,t}) = 0. \quad (2)$$
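One special case in which condition (2) holds is worked out below. It is an illustration under assumptions made here, not a case taken from the slides: the innovations are cross-sectionally uncorrelated, and their variances are uniformly bounded by a constant $\bar{\sigma}^2$ (a symbol introduced only for this example).

```latex
\begin{align*}
&\text{Assume } E(\xi_{i,t}\,\xi_{j,t}) = 0 \text{ for } i \neq j
  \text{ and } E(\xi_{i,t}^2) \le \bar{\sigma}^2 < \infty \text{ for all } i. \\
&\text{Then }
0 \;\le\; \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} E(\xi_{i,t}\,\xi_{j,t})
  \;=\; \frac{1}{N^2} \sum_{i=1}^{N} E(\xi_{i,t}^2)
  \;\le\; \frac{\bar{\sigma}^2}{N}
  \;\longrightarrow\; 0 \quad \text{as } N \to \infty.
\end{align*}
```

The double sum has $N^2$ terms but only the $N$ diagonal ones are non-zero in this case, so dividing by $N^2$ drives the expression to zero.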

(8)

Issler and Lima (JoE, 2009) Assumption for Nested Models

A collection of $N$ models ($i = 1, \dots, N$) is split into $M$ classes (or blocks), each with $m$ nested models:

$$N = mM.$$

Let $M = N^{1-d}$ and $m = N^{d}$, where $0 \le d \le 1$.

1. $d = 0$: all models are non-nested;
2. $d = 1$: all models are nested; and
3. $0 < d < 1$ gives rise to $N^{1-d}$ blocks of nested models, all with size $N^{d}$.

Notice that $d$ is a choice parameter from the point of view of the researcher combining forecasts.
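The block structure implied by a given $d$ can be computed directly. The function `block_structure` is a hypothetical helper; it returns real-valued $M$ and $m$ and leaves any rounding to integer counts to the user, since the slides do not say how that is handled.

```python
def block_structure(N, d):
    """Return (M, m): the number of blocks and the number of models per block.

    Uses M = N**(1 - d) and m = N**d, so that M * m == N up to
    floating-point error. Values are not rounded to integers.
    """
    if not 0 <= d <= 1:
        raise ValueError("d must lie in [0, 1]")
    M = N ** (1 - d)
    m = N ** d
    return M, m


if __name__ == "__main__":
    for d in (0, 0.5, 1):
        print(d, block_structure(64, d))
```

With $d = 0$ there are $N$ blocks of a single model each, and with $d = 1$ a single block containing all $N$ models, matching cases 1 and 2 above.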

(9)

Issler and Lima (JoE, 2009) Assumption for Nested Models

Partition the matrix with elements $E(\xi_{i,t}\,\xi_{j,t})$ into blocks: $M$ main-diagonal blocks, each with $m^2 = N^{2d}$ elements, and $M^2 - M$ off-diagonal blocks. Index the class by $r = 1, \dots, M$, and models within a class by $s = 1, \dots, m$.

Within each block $r$, we assume that:

$$0 \le \lim_{m \to \infty} \frac{1}{m^2} \sum_{k=1}^{m} \sum_{s=1}^{m} E(\xi_{r,k,t}\,\xi_{r,s,t}) = \lim_{N \to \infty} \frac{1}{N^{2d}} \sum_{k=1}^{N^d} \sum_{s=1}^{N^d} E(\xi_{r,k,t}\,\xi_{r,s,t}) < \infty,$$

being zero when the smallest nested model is correctly specified.

Across any two blocks $r$ and $l$, $r \neq l$, we assume that:

$$\lim_{m \to \infty} \frac{1}{m^2} \sum_{k=1}^{m} \sum_{s=1}^{m} E(\xi_{r,k,t}\,\xi_{l,s,t}) = \lim_{N \to \infty} \frac{1}{N^{2d}} \sum_{k=1}^{N^d} \sum_{s=1}^{N^d} E(\xi_{r,k,t}\,\xi_{l,s,t}) = 0.$$

Here, the assumption in the previous page will still hold in the

(10)

Issler and Lima (JoE, 2009) Main Results

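If Assumptions 1-4 hold, the following are consistent estimators of $k_i$, $B$, $\eta_t$, and $\varepsilon_{i,t}$, respectively:

$$\hat{k}_i = \frac{1}{R} \sum_{t=T_1+1}^{T_2} f_{i,t}^h - \frac{1}{R} \sum_{t=T_1+1}^{T_2} y_t, \qquad \operatorname*{plim}_{T \to \infty} \bigl(\hat{k}_i - k_i\bigr) = 0,$$

$$\hat{B} = \frac{1}{N} \sum_{i=1}^{N} \hat{k}_i, \qquad \operatorname*{plim}_{(T, N \to \infty)_{\text{seq}}} \bigl(\hat{B} - B\bigr) = 0,$$

$$\hat{\eta}_t = \frac{1}{N} \sum_{i=1}^{N} f_{i,t}^h - \hat{B} - y_t, \qquad \operatorname*{plim}_{(T, N \to \infty)_{\text{seq}}} \bigl(\hat{\eta}_t - \eta_t\bigr) = 0,$$

$$\hat{\varepsilon}_{i,t} = f_{i,t}^h - y_t - \hat{k}_i - \hat{\eta}_t, \qquad \operatorname*{plim}_{(T, N \to \infty)_{\text{seq}}} \bigl(\hat{\varepsilon}_{i,t} - \varepsilon_{i,t}\bigr) = 0.$$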
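The four estimators translate almost line by line into array operations. The sketch below is an illustration, not the authors' code: the function names `estimate_components` and `bcaf` are hypothetical, and evaluating $\hat{\eta}_t$ and $\hat{\varepsilon}_{i,t}$ on the same window used for $\hat{k}_i$ is an assumption made here.

```python
import numpy as np


def estimate_components(f, y):
    """Estimate k_i, B, eta_t and eps_{i,t} from forecasts and outcomes.

    f : array of shape (R, N), forecasts of N models over the window R.
    y : array of shape (R,), realized values over the same window.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    k_hat = f.mean(axis=0) - y.mean()       # time-averaged forecast error, per model
    B_hat = k_hat.mean()                    # cross-sectional average bias
    eta_hat = f.mean(axis=1) - B_hat - y    # aggregate shock, per period
    eps_hat = f - y[:, None] - k_hat[None, :] - eta_hat[:, None]
    return k_hat, B_hat, eta_hat, eps_hat


def bcaf(f_new, B_hat):
    """Bias-corrected average forecast: equal-weights average minus B_hat."""
    return np.asarray(f_new, dtype=float).mean(axis=1) - B_hat


if __name__ == "__main__":
    y = np.array([1.0, 2.0, 0.5, 1.5])
    k = np.array([1.0, 2.0, 3.0])
    f = y[:, None] + k[None, :]             # noiseless forecasts, bias only
    k_hat, B_hat, eta_hat, eps_hat = estimate_components(f, y)
    print(k_hat, B_hat)
    print(bcaf(f, B_hat))
```

In the noiseless usage example the forecasts differ from the outcomes only by the biases, so the estimated biases equal the true ones and the BCAF reproduces `y`.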

(11)

Issler and Lima (JoE, 2009) Main Results

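If Assumptions 1-4 hold, the feasible bias-corrected average forecast $\frac{1}{N} \sum_{i=1}^{N} f_{i,t}^h - \hat{B}$ obeys:

$$\operatorname*{plim}_{(T, N \to \infty)_{\text{seq}}} \left( \frac{1}{N} \sum_{i=1}^{N} f_{i,t}^h - \hat{B} \right) = y_t + \eta_t = E_{t-h}(y_t),$$

and has a mean-squared error as follows:

$$E \left[ \operatorname*{plim}_{(T, N \to \infty)_{\text{seq}}} \left( \frac{1}{N} \sum_{i=1}^{N} f_{i,t}^h - \hat{B} \right) - y_t \right]^2 = \sigma_\eta^2.$$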
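The reasoning behind this result can be sketched by substituting the decomposition $f_{i,t}^h = y_t + k_i + \eta_t + \varepsilon_{i,t}$ into the BCAF. This is an outline of the argument, with the convergence of the last two terms taken from the assumptions rather than proved here:

```latex
\begin{align*}
\frac{1}{N} \sum_{i=1}^{N} f_{i,t}^h - \hat{B} - y_t
  &= \eta_t
   + \frac{1}{N} \sum_{i=1}^{N} \varepsilon_{i,t}
   + \left( \frac{1}{N} \sum_{i=1}^{N} k_i - \hat{B} \right) \\
  &= \eta_t
   + \frac{1}{N} \sum_{i=1}^{N} \varepsilon_{i,t}
   - \frac{1}{N} \sum_{i=1}^{N} \bigl(\hat{k}_i - k_i\bigr).
\end{align*}
```

The last term vanishes as $T \to \infty$ because each $\hat{k}_i$ is consistent for $k_i$, and the averaged idiosyncratic error vanishes as $N \to \infty$ under Assumption 4. What remains is $\eta_t$, which has zero mean and variance $\sigma_\eta^2$ by Assumption 3, so the limiting MSE is $\sigma_\eta^2$.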

(12)

Issler and Lima (JoE, 2009) Main Results

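Consider the sequence of deterministic weights $\{\omega_i\}_{i=1}^{N}$, such that $|\omega_i| \neq 0$, $\omega_i = O(N^{-1})$ uniformly, with $\sum_{i=1}^{N} \omega_i = 1$ and $\lim_{N \to \infty} \sum_{i=1}^{N} \omega_i = 1$.

Then, under Assumptions 1-4:

$$E \left[ \operatorname*{plim}_{(T, N \to \infty)_{\text{seq}}} \left( \sum_{i=1}^{N} \omega_i f_{i,t}^h - \sum_{i=1}^{N} \omega_i \hat{k}_i \right) - y_t \right]^2 = \sigma_\eta^2.$$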

Therefore it is an optimal forecasting device as well.

For optimal population weights there is no Forecast Combination Puzzle.

Thus, the Forecast Combination Puzzle must be a consequence of the inability to estimate consistently the optimal population weights. This happens when R is small relative to N.

(13)

Issler and Lima (JoE, 2009) Main Results

The optimality results above are based on

$$f_{i,t}^h = E_{t-h}(y_t) + k_i + \varepsilon_{i,t},$$

where the bias $k_i$ is additive. If the bias is multiplicative as well as additive, i.e.,

$$f_{i,t}^h = \beta_i E_{t-h}(y_t) + k_i + \varepsilon_{i,t},$$

where $\beta_i \neq 1$ and $\beta_i \sim (\beta, \sigma_\beta^2)$, the BCAF is no longer optimal if $\beta \neq 1$.

Optimality can be restored if the BCAF is slightly modified to be

$$\frac{1}{N} \sum_{i=1}^{N} \left( \frac{f_{i,t}^h}{\hat{\beta}} - \frac{\hat{k}_i}{\hat{\beta}} \right),$$

(14)

Issler and Lima (JoE, 2009) Main Results

Under the null hypothesis $H_0: B = 0$, the test statistic satisfies

$$\hat{t} = \frac{\hat{B}}{\sqrt{\hat{V}}} \xrightarrow[(T, N \to \infty)_{\text{seq}}]{d} N(0, 1),$$

where $\hat{V}$ is a consistent estimator of the asymptotic variance of $B = \frac{1}{N} \sum_{i=1}^{N} k_i$.

$\hat{V}$ is estimated using a cross-section analog of the Newey-West estimator due to Conley (1999), where a natural order in the cross-sectional dimension requires matching spatial dependence to a metric of economic distance.

If $B = 0$, the average forecast $\frac{1}{N} \sum_{i=1}^{N} f_{i,t}^h$ is an optimal forecasting device.
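The mechanics of the test can be sketched with a deliberately simplified variance estimator. This is not the Conley (1999) estimator described above: the sketch assumes the $\hat{k}_i$ are cross-sectionally uncorrelated, so that the variance of their mean is the sample variance divided by $N$. The function name `bias_t_stat` is hypothetical.

```python
import math

import numpy as np


def bias_t_stat(k_hat):
    """t-statistic and two-sided normal p-value for H0: B = 0.

    Assumption (simplification): the k_hat values are uncorrelated across
    models, so V_hat = sample variance / N. A spatial HAC estimator would
    replace V_hat when cross-sectional dependence matters.
    """
    k_hat = np.asarray(k_hat, dtype=float)
    N = k_hat.size
    B_hat = k_hat.mean()
    V_hat = k_hat.var(ddof=1) / N
    t = B_hat / math.sqrt(V_hat)
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|t|)).
    p_value = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_value


if __name__ == "__main__":
    print(bias_t_stat([0.4, 0.5, 0.6, 0.5]))
```

A large absolute value of the statistic points towards bias correction; a small one suggests the plain average forecast may be adequate. With positively correlated biases this simplified variance would tend to be too small, which is one reason a dependence-robust estimator is used.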

(15)

Issler and Lima (JoE, 2009) Monte-Carlo

The DGP is a stationary AR(1) process:

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \xi_t, \quad \xi_t \sim \text{i.i.d. } N(0, 1), \quad \alpha_0 = 0, \quad \alpha_1 = 0.5.$$

One-step-ahead forecasts are generated as:

$$f_{i,t} = \hat{\alpha}_0 + \hat{\alpha}_1 y_{t-1} + k_i + \varepsilon_{i,t},$$

and the bias is generated as:

$$k_i = \beta k_{i-1} + u_i, \quad u_i \sim \text{i.i.d. } U(a, b), \quad \beta = 0.5.$$

The error $\varepsilon_{i,t}$ is drawn from a multivariate Normal with zero mean and variance-covariance matrix $\Sigma = (\sigma_{ij})$, which has zero covariance imposed
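A reduced version of this experiment can be simulated as follows. It is a sketch under several assumptions that the slides do not specify: the uniform bounds are set to $a = 0$, $b = 0.5$; the bias recursion starts from $k_0 = 0$; $\Sigma$ is the identity matrix; the AR(1) coefficients are estimated by OLS on the first window; and all three windows have 50 observations. The "weighted" forecast is not implemented, and the function names are hypothetical.

```python
import numpy as np


def one_replication(rng, N=10, E=50, R=50, P=50, a=0.0, b=0.5, beta=0.5):
    """One draw of the DGP; returns (MSE of simple average, MSE of BCAF) on window P."""
    T = E + R + P
    burn = 100  # burn-in so that y starts near its stationary distribution (assumption)
    xi = rng.standard_normal(burn + T + 1)
    y = np.zeros(burn + T + 1)
    for t in range(1, burn + T + 1):
        y[t] = 0.5 * y[t - 1] + xi[t]           # alpha0 = 0, alpha1 = 0.5
    y = y[burn:]
    y_lag, y_now = y[:-1], y[1:]                # each of length T

    # OLS estimates of (alpha0, alpha1) using the first E observations.
    X = np.column_stack([np.ones(E), y_lag[:E]])
    alpha_hat = np.linalg.lstsq(X, y_now[:E], rcond=None)[0]
    base = alpha_hat[0] + alpha_hat[1] * y_lag

    # Biases k_i = beta * k_{i-1} + u_i, started from k_0 = 0 (assumption).
    k = np.empty(N)
    prev = 0.0
    for i in range(N):
        prev = beta * prev + rng.uniform(a, b)
        k[i] = prev

    eps = rng.standard_normal((T, N))           # Sigma = identity (assumption)
    f = base[:, None] + k[None, :] + eps

    f_R, y_R = f[E:E + R], y_now[E:E + R]       # window used to estimate the bias
    f_P, y_P = f[E + R:], y_now[E + R:]         # out-of-sample evaluation window
    B_hat = (f_R.mean(axis=0) - y_R.mean()).mean()
    avg = f_P.mean(axis=1)
    return np.mean((avg - y_P) ** 2), np.mean((avg - B_hat - y_P) ** 2)


def monte_carlo(n_rep=200, seed=0, **kwargs):
    """Average the two MSEs over n_rep independent replications."""
    rng = np.random.default_rng(seed)
    draws = np.array([one_replication(rng, **kwargs) for _ in range(n_rep)])
    mse_average, mse_bcaf = draws.mean(axis=0)
    return {"mse_average": mse_average, "mse_bcaf": mse_bcaf}


if __name__ == "__main__":
    print(monte_carlo())
```

With strictly positive biases, the simple average carries a systematic error that the BCAF removes, so the BCAF should show the lower MSE on average. The numbers produced by this sketch are not expected to match the tables that follow, because the settings above are illustrative guesses.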

(16)

Issler and Lima (JoE, 2009) Monte-Carlo Results

$R = 50$, $B = 0.5$, $\sigma_\xi^2 = \sigma_\eta^2 = 1$

                         Bias                          MSE
                BCAF   Average  Weighted      BCAF   Average  Weighted
N = 10  mean   0.000    0.391    -0.001      1.561    1.697    1.916
N = 20  mean   0.000    0.440    -0.002      1.286    1.466    2.128
N = 40  mean   0.000    0.465     0.000      1.147    1.351    6.094

(17)

Issler and Lima (JoE, 2009) Monte-Carlo Results

$R = 50$, $B = 0$, $\sigma_\xi^2 = \sigma_\eta^2 = 1$

                         Bias                          MSE
                BCAF   Average  Weighted      BCAF   Average  Weighted
N = 10  mean   0.000    0.000    -0.001      1.561    1.547    1.916
N = 20  mean   0.000    0.000    -0.002      1.286    1.272    2.128
N = 40  mean  -0.002    0.000     0.000      1.147    1.133    6.094

(18)

Issler and Lima (JoE, 2009) Monte-Carlo Results

$N = 10$, $R = 500$ and $1{,}000$, $B = 0.5$, $\sigma_\xi^2 = \sigma_\eta^2 = 1$

                                  Bias                          MSE
                         BCAF   Average  Weighted      BCAF   Average  Weighted
N = 10, R = 500  mean   0.000    0.391     0.000      1.532    1.697    1.559

N = 10, R = 1,000

(19)

Issler and Lima (JoE, 2009) Monte-Carlo Results

$N = 40$, $R = 2{,}000$ and $4{,}000$, $B = 0.5$, $\sigma_\xi^2 = \sigma_\eta^2 = 1$

                                    Bias                          MSE
                           BCAF   Average  Weighted      BCAF   Average  Weighted
N = 40, R = 2,000  mean   0.000    0.466     0.000      1.127    1.355    1.149

N = 40, R = 4,000