Comparing Forecast Accuracy of Different Models for
Prices of Metal Commodities
João Victor Issler (FGV) and Claudia F. Rodrigues (VALE)
Stylized Facts on Forecasts and Forecast Combinations
Interest in forecasting $y_t$, stationary and ergodic, using information up to $h$ periods prior to $t$, where $h$ is treated as fixed. The risk function is the MSE. Then, the optimal (minimum-MSE) forecast is:
$$E_{t-h}(y_t),$$
and the $h$-step-ahead forecast error is $y_t - E_{t-h}(y_t)$.
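For a concrete case, the slides do not specify a model for $y_t$, so assume (for illustration only) a zero-mean AR(1) with coefficient 0.5 and unit-variance shocks. Then $E_{t-h}(y_t) = 0.5^h y_{t-h}$, its MSE is $(1 - 0.5^{2h})/(1 - 0.5^2)$, and any other rule, such as the naive forecast $y_{t-h}$, should do worse. A minimal simulation sketch:

```python
import random


def simulate_ar1(n, alpha1=0.5, sigma=1.0, seed=0):
    """Simulate a zero-mean stationary AR(1): y_t = alpha1 * y_{t-1} + e_t."""
    rng = random.Random(seed)
    # Draw y_0 from the stationary distribution so the whole path is stationary.
    y = [rng.gauss(0.0, sigma / (1.0 - alpha1 ** 2) ** 0.5)]
    for _ in range(n - 1):
        y.append(alpha1 * y[-1] + rng.gauss(0.0, sigma))
    return y


def mse_of_rule(y, h, coef):
    """MSE of the linear forecast coef * y_{t-h} for y_t."""
    errors = [y[t] - coef * y[t - h] for t in range(h, len(y))]
    return sum(e * e for e in errors) / len(errors)


alpha1, h = 0.5, 2
y = simulate_ar1(200_000, alpha1)

# Conditional mean E_{t-h}(y_t) = alpha1**h * y_{t-h}: the minimum-MSE forecast.
mse_opt = mse_of_rule(y, h, alpha1 ** h)
mse_theory = (1.0 - alpha1 ** (2 * h)) / (1.0 - alpha1 ** 2)  # 1.25 for h = 2
# Naive alternative: carry y_{t-h} forward unchanged (theoretical MSE 2.0 here).
mse_naive = mse_of_rule(y, h, 1.0)

print(f"optimal {mse_opt:.3f} (theory {mse_theory:.3f}), naive {mse_naive:.3f}")
```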
Hendry and Clements (2002): Let $f^h_{i,t}$ be the $i$-th $h$-step-ahead forecast of $y_t$, $i = 1, 2, \ldots, N$. Then the simple average $\frac{1}{N}\sum_{i=1}^{N} f^h_{i,t}$ performs very well compared to the individual forecasts $f^h_{i,t}$.
This suggests that $f^h_{i,t}$ cannot approximate $E_{t-h}(y_t)$ very well, since
Forecast combination works from a risk-diversification point of view (Bates and Granger, 1969, and Timmermann, 2006): if the number of forecasts in the combination is large ($N \to \infty$), the idiosyncratic component of forecast errors is wiped out due to the law of large numbers.
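The diversification argument can be checked with a small simulation. Assuming, for illustration, unbiased forecasts whose errors are a common shock $\eta_t$ plus independent idiosyncratic shocks $\varepsilon_{i,t}$, both with unit variance, the equal-weight average should have an MSE close to $\sigma^2_\eta + \sigma^2_\varepsilon / N$, while an individual forecast stays near $\sigma^2_\eta + \sigma^2_\varepsilon$:

```python
import random


def combination_mse(n_forecasts, periods=5_000, sigma_eta=1.0, sigma_eps=1.0, seed=1):
    """Return (MSE of one individual forecast, MSE of the equal-weight average).

    Illustrative assumption: the forecast error of model i at time t is
    eta_t + eps_{i,t}, with eta_t common to all models and eps_{i,t} independent.
    """
    rng = random.Random(seed)
    se_single = se_average = 0.0
    for _ in range(periods):
        eta = rng.gauss(0.0, sigma_eta)                   # common component
        errors = [eta + rng.gauss(0.0, sigma_eps) for _ in range(n_forecasts)]
        se_single += errors[0] ** 2                       # model i = 1 on its own
        se_average += (sum(errors) / n_forecasts) ** 2    # 1/N combination
    return se_single / periods, se_average / periods


results = {n: combination_mse(n) for n in (1, 10, 100)}
for n, (single, average) in results.items():
    print(f"N={n:>3}: individual {single:.3f}, average {average:.3f}, theory {1 + 1 / n:.3f}")
```

The average's MSE approaches $\sigma^2_\eta$ rather than zero: only the idiosyncratic part is diversified away.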
However, there is the Forecast Combination Puzzle: consider $\sum_{i=1}^{N} \omega_i f^h_{i,t}$, where $|\omega_i| < \infty$ and $\sum_{i=1}^{N} \omega_i = 1$. In practice, equal weights $\omega_i = 1/N$ outperform “optimal weights” designed to outperform it in
Issler and Lima (JoE, 2009)
Work within a panel-data framework, where $N, T \to \infty$, with sequential asymptotics: first $T \to \infty$ with $N$ fixed, then $N \to \infty$, written as $(T, N \to \infty)_{seq}$.
Propose the use of an equal-weights combination ($1/N$) coupled with an average bias-correction term, the bias-corrected average forecast (BCAF): $\frac{1}{N}\sum_{i=1}^{N} f^h_{i,t} - \widehat{B}$.
Propose a new test for the need to do bias correction: $H_0: B = 0$.
Show that there is no Forecast Combination Puzzle in large samples: for $N$, $T$ large, “optimal weights” have the same limiting MSE as the BCAF.
The bad performance of estimated “optimal weights” is then linked to the curse of dimensionality.
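They decompose $f^h_{i,t}$ as follows, where $h$ is treated as fixed:
$$f^h_{i,t} = E_{t-h}(y_t) + k_i + \varepsilon_{i,t}.$$
We can always write:
$$y_t = E_{t-h}(y_t) + \zeta_t, \quad \text{with } E_{t-h}(\zeta_t) = 0.$$
Then,
$$f^h_{i,t} = y_t - \zeta_t + k_i + \varepsilon_{i,t}, \text{ or}$$
$$f^h_{i,t} = y_t + k_i + \eta_t + \varepsilon_{i,t}, \text{ where } \eta_t = -\zeta_t, \text{ or}$$
$$f^h_{i,t} = y_t + \mu_{i,t}, \quad i = 1, 2, \ldots, N, \quad \text{with } \mu_{i,t} = k_i + \eta_t + \varepsilon_{i,t}.$$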
The error $\mu_{i,t}$ has a two-way (error-components) decomposition (Wallace and Hussain, 1969; Amemiya, 1971; Fuller and Battese, 1974), which has a long tradition in the econometrics literature. It depends on $h$, but this is omitted for notational convenience.
Issler and Lima (JoE, 2009) Assumptions
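Time framework: the sample $t = 1, \ldots, T$ is split into three consecutive windows, $E$ (up to $T_1$), $R$ (from $T_1$ to $T_2$) and $P$ (from $T_2$ to $T$), with $E = T_1 = \kappa_1 T$, $R = T_2 - T_1 = \kappa_2 T$ and $P = T - T_2 = \kappa_3 T$.
Assumption 1: $k_i$, $\varepsilon_{i,t}$ and $\eta_t$ are independent of each other for all $i$ and $t$.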
Assumption 2: $k_i$ is an identically distributed random variable in the cross-sectional dimension, but not necessarily independent,
Assumption 3: The aggregate shock $\eta_t$ is a stationary and ergodic MA process of order at most $h - 1$, with zero mean and variance $\sigma^2_\eta < \infty$.
Assumption 4: Let $\varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t}, \ldots, \varepsilon_{N,t})'$ be an $N \times 1$ vector stacking the errors $\varepsilon_{i,t}$ associated with all possible forecasts, where $E(\varepsilon_{i,t}) = 0$ for all $i$ and $t$. Then, the vector process $\{\varepsilon_t\}$ is assumed to be covariance-stationary and ergodic for the first and second moments, uniformly in $N$. Further, defining $\xi_{i,t} = \varepsilon_{i,t} - E_{t-1}(\varepsilon_{i,t})$ as the innovation of $\varepsilon_{i,t}$, we assume that
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} E(\xi_{i,t}\,\xi_{j,t}) = 0. \qquad (2)$$
Issler and Lima (JoE, 2009) Assumption for Nested Models
Consider a set of $N$ models ($i = 1, \ldots, N$) split into $M$ classes (or blocks), each with $m$ nested models:
$N = mM$. Let
$M = N^{1-d}$ and $m = N^{d}$, where $0 \le d \le 1$.
1. $d = 0$: all models are non-nested;
2. $d = 1$: all models are nested; and
3. $0 < d < 1$: gives rise to $N^{1-d}$ blocks of nested models, each of size $N^{d}$.
Notice that $d$ is a choice parameter from the point of view of the researcher combining forecasts.
Partition the matrix $[E(\xi_{i,t}\,\xi_{j,t})]$ into blocks: $M$ main-diagonal blocks, each with $m^2 = N^{2d}$ elements, and $M^2 - M$ off-diagonal blocks. Index the class by $r = 1, \ldots, M$, and models within a class by $s = 1, \ldots, m$.
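Within each block $r$, we assume that:
$$0 \le \lim_{m \to \infty} \frac{1}{m^2} \sum_{k=1}^{m} \sum_{s=1}^{m} E(\xi_{r,k,t}\,\xi_{r,s,t}) = \lim_{N \to \infty} \frac{1}{N^{2d}} \sum_{k=1}^{N^d} \sum_{s=1}^{N^d} E(\xi_{r,k,t}\,\xi_{r,s,t}) < \infty,$$
being zero when the smallest nested model is correctly specified.
Across any two blocks $r$ and $l$, $r \ne l$, we assume that:
$$\lim_{m \to \infty} \frac{1}{m^2} \sum_{k=1}^{m} \sum_{s=1}^{m} E(\xi_{r,k,t}\,\xi_{l,s,t}) = \lim_{N \to \infty} \frac{1}{N^{2d}} \sum_{k=1}^{N^d} \sum_{s=1}^{N^d} E(\xi_{r,k,t}\,\xi_{l,s,t}) = 0.$$
Here, the assumption on the previous page will still hold in the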
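The role of $d$ in condition (2) can be illustrated numerically. Assume, purely for illustration, that every innovation has variance 1, that two distinct models in the same block have covariance 0.5, and that models in different blocks are uncorrelated. The double average in (2) is then $(1 - 0.5)/N + 0.5\,m/N$, where $m/N = N^{d-1}$, so it vanishes as $N \to \infty$ when $d < 1$ but not when $d = 1$:

```python
import math


def average_innovation_covariance(m, M, var=1.0, within=0.5):
    """Compute (1/N^2) * sum_i sum_j E(xi_i xi_j) for N = m * M models.

    Hypothetical covariance structure: variance `var` on the diagonal,
    covariance `within` for two distinct models in the same nested block,
    and zero covariance across blocks.
    """
    N = m * M
    total = 0.0
    for i in range(N):
        for j in range(N):
            if i == j:
                total += var
            elif i // m == j // m:  # models i and j belong to the same block r
                total += within
            # pairs from different blocks contribute zero
    return total / N ** 2


def closed_form(m, M, var=1.0, within=0.5):
    """Same quantity in closed form: (var - within)/N + within * m/N, with m/N = N**(d-1)."""
    N = m * M
    return (var - within) / N + within * m / N


for N in (16, 64, 256):
    r = math.isqrt(N)  # block size for d = 1/2
    cases = {"d=0": (1, N), "d=1/2": (r, r), "d=1": (N, 1)}
    values = {name: round(average_innovation_covariance(m, M), 4) for name, (m, M) in cases.items()}
    print(N, values)
```

As $N$ grows, the $d = 0$ and $d = 1/2$ values shrink toward zero (like $1/N$ and $1/\sqrt{N}$), while the $d = 1$ value settles near 0.5.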
Issler and Lima (JoE, 2009) Main Results
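If Assumptions 1-4 hold, the following are consistent estimators of $k_i$, $B$, $\eta_t$, and $\varepsilon_{i,t}$, respectively:
$$\widehat{k}_i = \frac{1}{R}\sum_{t=T_1+1}^{T_2} f^h_{i,t} - \frac{1}{R}\sum_{t=T_1+1}^{T_2} y_t, \qquad \mathrm{plim}_{T \to \infty}\,(\widehat{k}_i - k_i) = 0,$$
$$\widehat{B} = \frac{1}{N}\sum_{i=1}^{N} \widehat{k}_i, \qquad \mathrm{plim}_{(T,N \to \infty)_{seq}}\,(\widehat{B} - B) = 0,$$
$$\widehat{\eta}_t = \frac{1}{N}\sum_{i=1}^{N} f^h_{i,t} - \widehat{B} - y_t, \qquad \mathrm{plim}_{(T,N \to \infty)_{seq}}\,(\widehat{\eta}_t - \eta_t) = 0,$$
$$\widehat{\varepsilon}_{i,t} = f^h_{i,t} - y_t - \widehat{k}_i - \widehat{\eta}_t, \qquad \mathrm{plim}_{(T,N \to \infty)_{seq}}\,(\widehat{\varepsilon}_{i,t} - \varepsilon_{i,t}) = 0.$$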
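These estimators are simple sample means. The Python sketch below is a minimal illustration under assumed conventions: forecasts are stored as `f[i][t]` with 0-based time indices, so the $R$ window $t = T_1 + 1, \ldots, T_2$ maps to the slice `[T1:T2]`, and the simulated panel (uniform biases, Gaussian $E_{t-h}(y_t)$, $\zeta_t$ and $\varepsilon_{i,t}$) is hypothetical. With only $N = 5$ forecasts, $\widehat{k}_i$ is already accurate for large $R$, whereas $\widehat{\eta}_t$ and $\widehat{\varepsilon}_{i,t}$ also need $N$ large, in line with the sequential asymptotics.

```python
import random


def estimate_components(f, y, T1, T2):
    """Estimators of k_i, B, eta_t and eps_{i,t}.

    f[i][t] is forecast i of y[t] (0-based t); the R window t = T1+1, ..., T2
    (1-based) corresponds to the slice [T1:T2].
    """
    N, R, T = len(f), T2 - T1, len(y)
    y_bar_R = sum(y[T1:T2]) / R
    k_hat = [sum(f_i[T1:T2]) / R - y_bar_R for f_i in f]      # bias of model i
    B_hat = sum(k_hat) / N                                    # average bias
    eta_hat = [sum(f_i[t] for f_i in f) / N - B_hat - y[t] for t in range(T)]
    eps_hat = [[f[i][t] - y[t] - k_hat[i] - eta_hat[t] for t in range(T)]
               for i in range(N)]
    return k_hat, B_hat, eta_hat, eps_hat


# Hypothetical panel: f_{i,t} = E_{t-h}(y_t) + k_i + eps_{i,t}, y_t = E_{t-h}(y_t) + zeta_t.
rng = random.Random(2)
N, T1, T2, T = 5, 100, 20_100, 20_200      # E, R and P windows of 100, 20,000 and 100
k = [rng.uniform(0.0, 1.0) for _ in range(N)]
cond_mean = [rng.gauss(0.0, 1.0) for _ in range(T)]
y = [c + rng.gauss(0.0, 1.0) for c in cond_mean]                     # adds zeta_t
f = [[cond_mean[t] + k[i] + rng.gauss(0.0, 1.0) for t in range(T)] for i in range(N)]

k_hat, B_hat, eta_hat, eps_hat = estimate_components(f, y, T1, T2)
print([round(a - b, 3) for a, b in zip(k_hat, k)])   # estimation errors for k_i
print(round(B_hat - sum(k) / N, 3))
```

By construction, $\sum_i \widehat{\varepsilon}_{i,t} = 0$ for every $t$, since $\widehat{\eta}_t$ absorbs the cross-sectional mean of the forecast errors net of $\widehat{B}$.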
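If Assumptions 1-4 hold, the feasible bias-corrected average forecast $\frac{1}{N}\sum_{i=1}^{N} f^h_{i,t} - \widehat{B}$ obeys:
$$\mathrm{plim}_{(T,N \to \infty)_{seq}} \left( \frac{1}{N}\sum_{i=1}^{N} f^h_{i,t} - \widehat{B} \right) = y_t + \eta_t = E_{t-h}(y_t),$$
and has a mean-squared error as follows:
$$E\left[ \mathrm{plim}_{(T,N \to \infty)_{seq}} \left( \frac{1}{N}\sum_{i=1}^{N} f^h_{i,t} - \widehat{B} \right) - y_t \right]^2 = \sigma^2_\eta.$$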
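A quick Monte-Carlo check of this result, under assumptions chosen only for illustration: $k_i \sim U(0, 1)$ (so $B = 0.5$), and $E_{t-h}(y_t)$, $\zeta_t = -\eta_t$ and $\varepsilon_{i,t}$ are independent standard normals, so $\sigma^2_\eta = 1$. $\widehat{B}$ is computed over an $R$ window and both forecasts are evaluated over the following $P$ periods. The BCAF's MSE should be close to $\sigma^2_\eta = 1$ (plus terms of order $1/N$ and $1/R$), whereas the simple average also carries the squared bias, roughly $0.25$:

```python
import random


def bcaf_vs_average(N, R, P, seed=3):
    """Out-of-sample MSE of the BCAF and of the simple average (hypothetical DGP)."""
    rng = random.Random(seed)
    k = [rng.uniform(0.0, 1.0) for _ in range(N)]      # biases k_i, mean B = 0.5
    y, avg_f = [], []
    for _ in range(R + P):
        cond_mean = rng.gauss(0.0, 1.0)                # E_{t-h}(y_t)
        y_t = cond_mean + rng.gauss(0.0, 1.0)          # y_t = E_{t-h}(y_t) + zeta_t
        forecasts = [cond_mean + k_i + rng.gauss(0.0, 1.0) for k_i in k]
        y.append(y_t)
        avg_f.append(sum(forecasts) / N)               # (1/N) sum_i f_{i,t}
    # B_hat = (1/N) sum_i k_hat_i equals the R-window mean of (average forecast - y_t).
    B_hat = sum(a - y_t for a, y_t in zip(avg_f[:R], y[:R])) / R
    se_bcaf = se_avg = 0.0
    for a, y_t in zip(avg_f[R:], y[R:]):               # evaluation (P) window
        se_bcaf += (a - B_hat - y_t) ** 2
        se_avg += (a - y_t) ** 2
    return se_bcaf / P, se_avg / P


mse_bcaf, mse_avg = bcaf_vs_average(N=100, R=500, P=2_000)
print(f"BCAF MSE {mse_bcaf:.3f}, simple-average MSE {mse_avg:.3f}")
```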
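Consider the sequence of deterministic weights $\{\omega_i\}_{i=1}^{N}$, such that $|\omega_i| \ne 0$ and $\omega_i = O(N^{-1})$ uniformly, with
$$\sum_{i=1}^{N} \omega_i = 1 \quad \text{and} \quad \lim_{N \to \infty} \sum_{i=1}^{N} \omega_i = 1.$$
Then, under Assumptions 1-4:
$$E\left[ \mathrm{plim}_{(T,N \to \infty)_{seq}} \left( \sum_{i=1}^{N} \omega_i f^h_{i,t} - \sum_{i=1}^{N} \omega_i \widehat{k}_i \right) - y_t \right]^2 = \sigma^2_\eta.$$
Therefore it is an optimal forecasting device as well.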
For optimal population weights there is no Forecast Combination Puzzle.
Thus, the Forecast Combination Puzzle must be a consequence of the inability to consistently estimate the optimal population weights. This happens when $R$ is small relative to $N$.
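The curse-of-dimensionality argument can be seen in a small experiment. The slides do not say which estimator of the “optimal weights” is used, so as a stand-in we assume Bates-Granger minimum-variance weights $\widehat{\omega} = \widehat{S}^{-1}\iota / (\iota'\widehat{S}^{-1}\iota)$, with $\widehat{S}$ the sample second-moment matrix of forecast errors over an $R$ window. Errors are generated (hypothetically) as $\eta_t + \varepsilon_{i,t}$ with unit variances and no bias, so the population-optimal weights are exactly $1/N$ and any loss comes from estimation noise. With $N = 40$ and $R = 50$, the estimated weights should do much worse out of sample than $1/N$:

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def simulate_errors(N, periods, rng):
    """Hypothetical unbiased forecast errors e_{i,t} = eta_t + eps_{i,t}."""
    rows = []
    for _ in range(periods):
        eta = rng.gauss(0.0, 1.0)
        rows.append([eta + rng.gauss(0.0, 1.0) for _ in range(N)])
    return rows


def weighted_mse(weights, errors):
    """MSE of sum_i w_i f_{i,t}; with weights summing to 1 its error is sum_i w_i e_{i,t}."""
    return sum(sum(w * e for w, e in zip(weights, row)) ** 2 for row in errors) / len(errors)


rng = random.Random(4)
N, R, P = 40, 50, 2_000
train, test = simulate_errors(N, R, rng), simulate_errors(N, P, rng)

S = [[sum(e[i] * e[j] for e in train) / R for j in range(N)] for i in range(N)]
z = solve(S, [1.0] * N)                 # S^{-1} iota
total = sum(z)
w_est = [z_i / total for z_i in z]      # estimated minimum-variance weights
w_equal = [1.0 / N] * N

mse_equal = weighted_mse(w_equal, test)
mse_est = weighted_mse(w_est, test)
print(f"equal weights {mse_equal:.3f}, estimated weights {mse_est:.3f}")
```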
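The optimality results above are based on
$$f^h_{i,t} = E_{t-h}(y_t) + k_i + \varepsilon_{i,t},$$
where the bias $k_i$ is additive. If the bias is multiplicative as well as additive, i.e.,
$$f^h_{i,t} = \beta_i E_{t-h}(y_t) + k_i + \varepsilon_{i,t},$$
where $\beta_i \ne 1$ and $\beta_i \sim (\beta, \sigma^2_\beta)$, the BCAF is no longer optimal if $\beta \ne 1$.
Optimality can be restored if the BCAF is slightly modified to be
$$\frac{1}{N}\sum_{i=1}^{N} \left( \frac{f^h_{i,t}}{\widehat{\beta}} - \frac{\widehat{k}_i}{\widehat{\beta}} \right).$$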
Under the null hypothesis $H_0: B = 0$, the test statistic
$$\widehat{t} = \frac{\widehat{B}}{\sqrt{\widehat{V}}} \xrightarrow{d} N(0, 1) \quad \text{as } (T, N \to \infty)_{seq},$$
where $\widehat{V}$ is a consistent estimator of the asymptotic variance of $B = \frac{1}{N}\sum_{i=1}^{N} k_i$. $\widehat{V}$ is estimated using a cross-sectional analog of the Newey-West estimator due to Conley (1999), where a natural order in the cross-sectional dimension requires matching spatial dependence to a metric of economic distance.
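A minimal sketch of the test statistic, under assumptions that are ours rather than the slides': each forecasting model $i$ is given a scalar position on an “economic distance” scale (the hypothetical `positions` argument), the distance between two models is the absolute difference of their positions, and cross products are down-weighted with a Bartlett (triangular) kernel that reaches zero at a chosen `cutoff`, in the spirit of Conley (1999). The variance of the cross-sectional mean is then estimated as $\widehat{V} = \frac{1}{N^2}\sum_i \sum_j w(d_{ij})(\widehat{k}_i - \widehat{B})(\widehat{k}_j - \widehat{B})$:

```python
import math


def conley_t_stat(k_hat, positions, cutoff):
    """t-statistic for H0: B = 0 with a Bartlett-weighted cross-sectional HAC variance.

    `positions` (one number per model) and `cutoff` are illustrative assumptions
    standing in for a metric of economic distance between forecasting models.
    """
    N = len(k_hat)
    B_hat = sum(k_hat) / N
    u = [k - B_hat for k in k_hat]                       # demeaned bias estimates
    V_hat = 0.0
    for i in range(N):
        for j in range(N):
            distance = abs(positions[i] - positions[j])
            weight = max(0.0, 1.0 - distance / cutoff)   # Bartlett kernel, 1 at distance 0
            V_hat += weight * u[i] * u[j]
    V_hat /= N ** 2
    if V_hat <= 0.0:
        raise ValueError("estimated variance is not positive")
    return B_hat / math.sqrt(V_hat)


# Hypothetical example: five models on a line, neighbours one unit apart.
t_stat = conley_t_stat([0.2, 0.4, 0.1, 0.5, 0.3], positions=[0, 1, 2, 3, 4], cutoff=2.0)
reject_at_5pct = abs(t_stat) > 1.96
print(round(t_stat, 3), reject_at_5pct)
```

With a one-dimensional distance, the triangular kernel keeps $\widehat{V}$ non-negative; the choice of `cutoff` plays the role of the Newey-West bandwidth.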
If $B = 0$, the average forecast $\frac{1}{N}\sum_{i=1}^{N} f^h_{i,t}$ is an optimal forecasting device.
Issler and Lima (JoE, 2009) Monte-Carlo
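The DGP is a stationary AR(1) process:
$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \xi_t, \quad \xi_t \sim \text{i.i.d. } N(0, 1), \quad \alpha_0 = 0, \quad \alpha_1 = 0.5.$$
One-step-ahead forecasts are generated as:
$$f_{i,t} = \widehat{\alpha}_0 + \widehat{\alpha}_1 y_{t-1} + k_i + \varepsilon_{i,t},$$
and the bias is generated as:
$$k_i = \beta k_{i-1} + u_i,$$
where $u_i \sim \text{i.i.d. } U(a, b)$, with $\beta = 0.5$.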
The error $\varepsilon_{i,t}$ is drawn from a multivariate Normal with zero mean and variance-covariance matrix $\Sigma = (\sigma_{ij})$, which has zero covariance imposed
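A single replication of this design can be sketched as follows. The slide leaves several choices open, so these are hypothetical and labelled in the code: $(a, b) = (0, 0.5)$ (which makes the stationary mean of $k_i$ equal to $0.5$), $k_0 = 0$, a diagonal $\Sigma$ with unit variances, and OLS estimates $\widehat{\alpha}_0, \widehat{\alpha}_1$ fitted once on the $E$ window. Bias is measured as the mean forecast error (forecast minus $y_t$) over the $P$ window. The “Weighted” combination in the tables is not reproduced, and the numbers are not expected to match them:

```python
import random
from statistics import fmean


def one_replication(N, E, R, P, a=0.0, b=0.5, beta=0.5, sigma_eps=1.0, seed=5):
    """Bias and MSE of the BCAF and the simple average for one simulated sample.

    Hypothetical settings not fixed by the slides: U(a, b) = U(0, 0.5), k_0 = 0,
    Sigma = sigma_eps**2 * I, and alpha-hats estimated once by OLS on the E window.
    """
    rng = random.Random(seed)
    alpha0, alpha1 = 0.0, 0.5
    T = E + R + P
    # y[0] is a stationary initial value; y[1..T] is the sample.
    y = [rng.gauss(0.0, 1.0 / (1.0 - alpha1 ** 2) ** 0.5)]
    for _ in range(T):
        y.append(alpha0 + alpha1 * y[-1] + rng.gauss(0.0, 1.0))

    # OLS of y_t on (1, y_{t-1}) for t = 1..E.
    x_prev, x_curr = y[0:E], y[1:E + 1]
    mp, mc = fmean(x_prev), fmean(x_curr)
    a1_hat = (sum((p - mp) * (c - mc) for p, c in zip(x_prev, x_curr))
              / sum((p - mp) ** 2 for p in x_prev))
    a0_hat = mc - a1_hat * mp

    # Cross-sectionally autocorrelated biases: k_i = beta * k_{i-1} + u_i, with k_0 = 0.
    k, prev = [], 0.0
    for _ in range(N):
        prev = beta * prev + rng.uniform(a, b)
        k.append(prev)

    def average_forecast(t):
        """(1/N) sum_i f_{i,t}, with f_{i,t} = a0_hat + a1_hat * y_{t-1} + k_i + eps_{i,t}."""
        base = a0_hat + a1_hat * y[t - 1]
        return fmean(base + k_i + rng.gauss(0.0, sigma_eps) for k_i in k)

    # B_hat = (1/N) sum_i k_hat_i, computed over the R window t = E+1..E+R.
    B_hat = fmean(average_forecast(t) - y[t] for t in range(E + 1, E + R + 1))

    errors_avg, errors_bcaf = [], []
    for t in range(E + R + 1, T + 1):                  # evaluation (P) window
        e = average_forecast(t) - y[t]
        errors_avg.append(e)
        errors_bcaf.append(e - B_hat)

    return {name: (fmean(errs), fmean(x * x for x in errs))   # (bias, MSE)
            for name, errs in (("BCAF", errors_bcaf), ("Average", errors_avg))}


result = one_replication(N=40, E=2_000, R=2_000, P=2_000)
print(result)
```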
Issler and Lima (JoE, 2009) Monte-Carlo Results
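$R = 50$, $B = 0.5$, $\sigma^2_\xi = \sigma^2_\eta = 1$

| | Bias (BCAF) | Bias (Average) | Bias (Weighted) | MSE (BCAF) | MSE (Average) | MSE (Weighted) |
|---|---|---|---|---|---|---|
| N = 10 (mean) | 0.000 | 0.391 | -0.001 | 1.561 | 1.697 | 1.916 |
| N = 20 (mean) | 0.000 | 0.440 | -0.002 | 1.286 | 1.466 | 2.128 |
| N = 40 (mean) | 0.000 | 0.465 | 0.000 | 1.147 | 1.351 | 6.094 |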
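$R = 50$, $B = 0$, $\sigma^2_\xi = \sigma^2_\eta = 1$

| | Bias (BCAF) | Bias (Average) | Bias (Weighted) | MSE (BCAF) | MSE (Average) | MSE (Weighted) |
|---|---|---|---|---|---|---|
| N = 10 (mean) | 0.000 | 0.000 | -0.001 | 1.561 | 1.547 | 1.916 |
| N = 20 (mean) | 0.000 | 0.000 | -0.002 | 1.286 | 1.272 | 2.128 |
| N = 40 (mean) | -0.002 | 0.000 | 0.000 | 1.147 | 1.133 | 6.094 |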
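$N = 10$, $R = 500$ and $1{,}000$, $B = 0.5$, $\sigma^2_\xi = \sigma^2_\eta = 1$

| | Bias (BCAF) | Bias (Average) | Bias (Weighted) | MSE (BCAF) | MSE (Average) | MSE (Weighted) |
|---|---|---|---|---|---|---|
| N = 10, R = 500 (mean) | 0.000 | 0.391 | 0.000 | 1.532 | 1.697 | 1.559 |
| N = 10, R = 1,000 | | | | | | |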
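$N = 40$, $R = 2{,}000$ and $4{,}000$, $B = 0.5$, $\sigma^2_\xi = \sigma^2_\eta = 1$

| | Bias (BCAF) | Bias (Average) | Bias (Weighted) | MSE (BCAF) | MSE (Average) | MSE (Weighted) |
|---|---|---|---|---|---|---|
| N = 40, R = 2,000 (mean) | 0.000 | 0.466 | 0.000 | 1.127 | 1.355 | 1.149 |
| N = 40, R = 4,000 | | | | | | |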