

3.7 GENERALIZED AND WEIGHTED LEAST SQUARES


FIGURE 3.4 Scatter plot of absolute residuals versus weeks.

The weights would be equal to the inverse of the square of the fitted value for each observation. These weights are shown in Table 3.12. Using these weights to fit a new regression model to strength using weighted least squares results in

ŷ = 27.545 + 0.32383x

Note that the weighted least squares model does not differ very much from the ordinary least squares model. Because the parameter estimates did not change very much, it is not necessary to iteratively reestimate the standard deviation model and obtain new weights. ■
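As a rough sketch of this reweighting step (not the book's computation; the strength data of Table 3.12 are not reproduced here, so the x and y arrays below are hypothetical placeholders), one can fit by ordinary least squares, form weights equal to the reciprocal of the squared fitted values, and refit by weighted least squares:

```python
import numpy as np

# Hypothetical stand-ins for the strength data; the actual values are in Table 3.12.
x = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0])
y = np.array([34.0, 36.0, 38.0, 40.0, 41.0, 43.0, 44.0, 46.0, 47.0])

# Ordinary least squares fit of y = b0 + b1*x.
X = np.column_stack([np.ones_like(x), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Weights equal to the reciprocal of the squared fitted values, as described above.
fitted = X @ b_ols
w = 1.0 / fitted**2

# Weighted least squares: solve (X'WX) b = X'Wy.
W = np.diag(w)
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print("OLS:", b_ols, "  WLS:", b_wls)
```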

3.7.3 Discounted Least Squares

For example, a model such as Eq. (3.3) describes the relationship between a response variable y that varies cyclically or periodically with time, where the cyclic variation is modeled as a simple sine wave.

A very general model for these types of situations could be written as

y_t = β1 x1(t) + β2 x2(t) + ⋯ + βk xk(t) + ε_t    (3.75)

where the predictors x1(t), x2(t), ..., xk(t) are mathematical functions of time, t. In these types of models it is often logical to believe that older observations are of less value in predicting the future observations at periods T + 1, T + 2, ... than are the observations that are close to the current time period, T. In other words, if you want to predict the value of y at time T + 1 given that you are at the end of time period T (or ŷ_{T+1}(T)), it is logical to assume that the more recent observations such as y_T, y_{T−1}, and y_{T−2} carry much more useful information than do older observations such as y_{T−20}. Therefore it seems reasonable to weight the observations in the regression model so that recent observations are weighted more heavily than older observations. A very useful variation of weighted least squares, called discounted least squares, can be used to do this. Discounted least squares also leads to a relatively simple way to update the estimates of the model parameters after each new observation in the time series.

Suppose that the model for observation y_t is given by Eq. (3.75):

y_t = x'(t)β + ε_t,    t = 1, 2, ..., T

where x'(t) = [x1(t), x2(t), ..., xp(t)] and β' = [β1, β2, ..., βp]. This model could have an intercept term, in which case x1(t) = 1. In matrix form, Eq. (3.75) is

y = X(T)β + ε    (3.76)

where y is a T × 1 vector of the observations, β is a p × 1 vector of the model parameters, ε is a T × 1 vector of the errors, and X(T) is the T × p matrix

         [ x1(1)  x2(1)  ⋯  xp(1) ]
X(T) =   [ x1(2)  x2(2)  ⋯  xp(2) ]
         [   ⋮      ⋮          ⋮   ]
         [ x1(T)  x2(T)  ⋯  xp(T) ]

Note that the tth row of X(T) contains the values of the predictor variables that correspond to the tth observation of the response, y_t.

We will estimate the parameters in Eq. (3.76) using weighted least squares. However, we are going to choose the weights so that they decrease in magnitude with time. Specifically, let the weight for observation y_{T−j} be θ^j, where 0 < θ < 1. We are also going to shift the origin of time with each new observation so that T is the current time period. Therefore the WLS criterion is

L = Σ_{j=0}^{T−1} θ^j [y_{T−j} − β1(T) x1(−j) − ⋯ − βk(T) xk(−j)]²
  = Σ_{j=0}^{T−1} θ^j [y_{T−j} − x'(−j) β(T)]²    (3.77)

where β(T) indicates that the vector of regression coefficients is estimated at the end of time period T, and x(−j) indicates that the predictor variables, which are just mathematical functions of time, are evaluated at −j. This is just WLS with a T × T diagonal weight matrix

      [ θ^{T−1}    0       ⋯   0   0 ]
      [   0      θ^{T−2}   ⋯   0   0 ]
W =   [   ⋮        ⋮            ⋮   ⋮ ]
      [   0        0       ⋯   θ   0 ]
      [   0        0       ⋯   0   1 ]

By analogy with Eq. (3.70), the WLS normal equations are

X(T)' W X(T) β̂(T) = X(T)' W y    (3.78)

or

G(T) β̂(T) = g(T)    (3.79)

where

G(T) = X(T)' W X(T)  and  g(T) = X(T)' W y

The solution to the WLS normal equations is

β̂(T) = G(T)^{-1} g(T)    (3.80)

β̂(T) is called the discounted least squares estimator of β.
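A minimal sketch of this direct computation under the shifted time origin, using an illustrative linear trend and made-up data (the variable names are not from the text): the rows of the regressor matrix are x(−j)', the weights are θ^j, and β̂(T) solves G(T)β̂(T) = g(T):

```python
import numpy as np

theta = 0.9                                          # discount factor, 0 < theta < 1
y = np.array([50.8, 52.1, 53.9, 55.2, 56.8, 58.1])   # illustrative observations y_1, ..., y_T
T = len(y)

# Shifted time origin: observation y_{T-j} is paired with x(-j); here x(t) = (1, t)' (linear trend).
j = np.arange(T)                                     # "age" of each observation: j = 0, 1, ..., T-1
X_shift = np.column_stack([np.ones(T), -j.astype(float)])   # row j holds x(-j)'
y_shift = y[::-1]                                    # reorder as y_T, y_{T-1}, ..., y_1

# Discounted least squares, Eqs. (3.78)-(3.80).
w = theta**j                                         # weight theta^j for observation y_{T-j}
G = X_shift.T @ (w[:, None] * X_shift)               # G(T) = X(T)' W X(T)
g = X_shift.T @ (w * y_shift)                        # g(T) = X(T)' W y
beta_hat = np.linalg.solve(G, g)                     # beta_hat(T), the DLS estimate at origin T
print(beta_hat)                                      # (intercept at the current origin, slope)
```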

In many important applications, the discounted least squares estimator can be simplified considerably. Assume that the predictor variables x_i(t) in the model are functions of time that have been chosen so that their values at time period t + 1 are linear combinations of their values at the previous time period. That is,

x_i(t + 1) = L_{i1} x1(t) + L_{i2} x2(t) + ⋯ + L_{ip} xp(t),    i = 1, 2, ..., p    (3.81)

In matrix form,

x(t + 1) = L x(t)    (3.82)

where L is the p × p matrix of the constants L_{ij} in Eq. (3.81). The transition property in Eq. (3.81) holds for polynomial, trigonometric, and certain exponential functions of time. This transition relationship implies that

x(t) = L^t x(0)
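For instance, for the linear trend predictors x(t) = (1, t)' the transition matrix is the 2 × 2 matrix derived later in Example 3.10, and the relationship x(t) = L^t x(0) is easy to check numerically (a small illustrative sketch):

```python
import numpy as np

# Transition matrix for the linear trend predictors x(t) = (1, t)' (see Example 3.10).
L = np.array([[1.0, 0.0],
              [1.0, 1.0]])

x0 = np.array([1.0, 0.0])                 # x(0) = (1, 0)'
t = 7
print(np.linalg.matrix_power(L, t) @ x0)  # [1. 7.], i.e., x(7) = (1, 7)'
```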

Consider the matrix G(T) in the normal equations (3.78). We can write

G(T) = Σ_{j=0}^{T−1} θ^j x(−j) x(−j)'
     = G(T − 1) + θ^{T−1} x(−(T − 1)) x(−(T − 1))'    (3.83)

If the predictor variables x_i(t) in the model are polynomial, trigonometric, or certain exponential functions of time, the matrix G(T) approaches a steady-state limiting value G, where

G = Σ_{j=0}^{∞} θ^j x(−j) x(−j)'    (3.84)

Consequently, the inverse of G would only need to be computed once. The right-hand side of the normal equations can also be simplified. We can write

g(T) = Σ_{j=0}^{T−1} θ^j y_{T−j} x(−j)
     = y_T x(0) + Σ_{j=1}^{T−1} θ^j y_{T−j} x(−j)
     = y_T x(0) + θ Σ_{j=1}^{T−1} θ^{j−1} y_{T−j} L^{-1} x(−j + 1)
     = y_T x(0) + θ L^{-1} Σ_{k=0}^{T−2} θ^k y_{T−1−k} x(−k)
     = y_T x(0) + θ L^{-1} g(T − 1)


So the discounted least squares estimator can be written as

β̂(T) = G^{-1} g(T)
     = G^{-1} [y_T x(0) + θ L^{-1} g(T − 1)]
     = G^{-1} [y_T x(0) + θ L^{-1} G β̂(T − 1)]
     = y_T G^{-1} x(0) + θ G^{-1} L^{-1} G β̂(T − 1)

or

β̂(T) = h y_T + Z β̂(T − 1)    (3.85)

where

h = G^{-1} x(0)    (3.86)

and

Z = θ G^{-1} L^{-1} G    (3.87)

The right-hand side of Eq. (3.85) can still be simplified because

L^{-1} G = L^{-1} Σ_{j=0}^{∞} θ^j x(−j) x(−j)'
         = Σ_{j=0}^{∞} θ^j L^{-1} x(−j) x(−j)' (L')^{-1} L'
         = Σ_{j=0}^{∞} θ^j [L^{-1} x(−j)] [L^{-1} x(−j)]' L'
         = Σ_{j=0}^{∞} θ^j x(−j − 1) x(−j − 1)' L'

and letting k = j + 1,

L^{-1} G = θ^{-1} Σ_{k=1}^{∞} θ^k x(−k) x(−k)' L'
         = θ^{-1} [G − x(0) x(0)'] L'


Substituting for L^{-1}G on the right-hand side of Eq. (3.87) results in

Z = θ G^{-1} θ^{-1} [G − x(0) x(0)'] L'
  = [I − G^{-1} x(0) x(0)'] L'
  = L' − h x(0)' L'
  = L' − h [L x(0)]'
  = L' − h x(1)'

Now the vector of discounted least squares parameter estimates at the end of time period T in Eq. (3.85) is

β̂(T) = h y_T + Z β̂(T − 1)
     = h y_T + [L' − h x(1)'] β̂(T − 1)
     = L' β̂(T − 1) + h [y_T − x(1)' β̂(T − 1)]

But x(1)' β̂(T − 1) = ŷ_T(T − 1) is the forecast of y_T computed at the end of the previous time period, T − 1, so the discounted least squares vector of parameter estimates computed at the end of time period T is

β̂(T) = L' β̂(T − 1) + h [y_T − ŷ_T(T − 1)]
     = L' β̂(T − 1) + h e_T(1)    (3.88)

The last line in Eq. (3.88) is an extremely important result: it states that in discounted least squares the vector of parameter estimates computed at the end of time period T can be computed as a simple linear combination of the estimates made at the end of the previous time period T − 1 and the one-step-ahead forecast error for the observation in period T. Note that there are really two things going on in estimating β by discounted least squares: the origin of time is being shifted to the end of the current period, and the estimates are being modified to reflect the forecast error in the current time period. The first and second terms on the right-hand side of Eq. (3.88) accomplish these objectives, respectively.

When discounted least squares estimation is started up, an initial estimate of the parameters is required at time period zero, say, β̂(0). This could be found by a standard least squares (or WLS) analysis of historical data.

Because the origin of time is shifted to the end of the current time period, forecasting is easy with discounted least squares. The forecast of the observation at a future time period T + τ, made at the end of time period T, is

ŷ_{T+τ}(T) = β̂(T)' x(τ) = Σ_{j=1}^{p} β̂_j(T) x_j(τ)    (3.89)
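Equations (3.88) and (3.89) translate directly into a small recursive implementation. The sketch below is generic and illustrative (the function names are not from the text); it assumes that the transition matrix L, the steady-state vector h, and an initial estimate β̂(0) are available, and it is specialized at the end to the linear trend model whose L and h are derived in Example 3.10:

```python
import numpy as np

def dls_step(beta_prev, y_new, L, h, x):
    """One discounted least squares update, Eq. (3.88):
    beta_hat(T) = L' beta_hat(T-1) + h * e_T(1),
    where e_T(1) = y_T - x(1)' beta_hat(T-1) is the one-step-ahead forecast error."""
    e = y_new - x(1) @ beta_prev
    return L.T @ beta_prev + h * e

def dls_forecast(beta_hat, tau, x):
    """Forecast of y_{T+tau} made at the current origin T, Eq. (3.89)."""
    return x(tau) @ beta_hat

# Specialization to the linear trend model (L and h as derived in Example 3.10, theta = 0.9).
theta = 0.9
L_trend = np.array([[1.0, 0.0],
                    [1.0, 1.0]])
h_trend = np.array([1.0 - theta**2, (1.0 - theta)**2])
x_trend = lambda tau: np.array([1.0, float(tau)])

beta = np.array([50.0, 1.5])                             # beta_hat(0), e.g., from historical data
beta = dls_step(beta, 52.0, L_trend, h_trend, x_trend)   # update after observing y_1 = 52
print(dls_forecast(beta, 1, x_trend))                    # one-step-ahead forecast from the new origin
```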

Example 3.10 Discounted Least Squares and the Linear Trend Model

To illustrate the discounted least squares procedure, let's consider the linear trend model:

y_t = β0 + β1 t + ε_t

To write the parameter estimation equations in Eq. (3.88), we need the transition matrix L. For the linear trend model, this matrix is

L = [ 1  0 ]
    [ 1  1 ]

Therefore the parameter estimation equations are

β̂(T) = L' β̂(T − 1) + h e_T(1)

[ β̂0(T) ]   [ 1  1 ] [ β̂0(T − 1) ]   [ h1 ]
[ β̂1(T) ] = [ 0  1 ] [ β̂1(T − 1) ] + [ h2 ] e_T(1)    (3.90)

or

β̂0(T) = β̂0(T − 1) + β̂1(T − 1) + h1 e_T(1)
β̂1(T) = β̂1(T − 1) + h2 e_T(1)

The elements of the vector h are found from Eq. (3.86):

h = G^{-1} x(0)

The steady-state matrix G is found as follows:

G(T) = Σ_{j=0}^{T−1} θ^j x(−j) x(−j)'

     = Σ_{j=0}^{T−1} θ^j [ 1  ] [ 1  −j ]
                         [ −j ]

     = Σ_{j=0}^{T−1} θ^j [  1   −j ]
                         [ −j    j² ]

     = [  Σ_{j=0}^{T−1} θ^j       −Σ_{j=0}^{T−1} j θ^j  ]
       [ −Σ_{j=0}^{T−1} j θ^j      Σ_{j=0}^{T−1} j² θ^j ]

The steady-state value of G(T) is found by taking the limit as T → ∞, which results in

G = lim_{T→∞} G(T) = [   1/(1 − θ)        −θ/(1 − θ)²        ]
                     [  −θ/(1 − θ)²        θ(1 + θ)/(1 − θ)³ ]

The inverse of G is

G^{-1} = [ 1 − θ²      (1 − θ)²   ]
         [ (1 − θ)²    (1 − θ)³/θ ]

Therefore, the vector h is

h = G^{-1} x(0) = [ 1 − θ²   ]
                  [ (1 − θ)² ]
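As an illustrative numerical check of these limits (not part of the original example), the infinite sums can be truncated and compared with the closed forms for θ = 0.9:

```python
import numpy as np

theta = 0.9
j = np.arange(2000)                         # truncation point; theta**2000 is negligible
w = theta**j

print(w.sum(),          1 / (1 - theta))                            # both about 10
print((j * w).sum(),    theta / (1 - theta)**2)                     # both about 90
print((j**2 * w).sum(), theta * (1 + theta) / (1 - theta)**3)       # both about 1710

# Steady-state G for the linear trend model and h = G^{-1} x(0).
G = np.array([[w.sum(),        -(j * w).sum()],
              [-(j * w).sum(),  (j**2 * w).sum()]])
h = np.linalg.solve(G, np.array([1.0, 0.0]))
print(h)                                    # about (0.19, 0.01) = (1 - theta**2, (1 - theta)**2)
```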

Substituting the elements of the vector h into Eq. (3.90), we obtain the parameter estimating equations for the linear trend model as

β̂0(T) = β̂0(T − 1) + β̂1(T − 1) + (1 − θ²) e_T(1)
β̂1(T) = β̂1(T − 1) + (1 − θ)² e_T(1)

Inspection of these equations illustrates the twin aspects of discounted least squares: shifting the origin of time and updating the parameter estimates. In the first equation, the updated intercept at time T consists of the old intercept plus the old slope (this shifts the origin of time to the end of the current period T), plus a fraction of the current forecast error (this revises or updates the estimate of the intercept). The second equation revises the slope estimate by adding a fraction of the current period forecast error to the previous estimate of the slope.

To illustrate the computations, suppose that we are forecasting a time series with a linear trend and we have initial estimates of the slope and intercept at time t = 0 as

β̂0(0) = 50 and β̂1(0) = 1.5

These estimates could have been obtained by regression analysis of historical data. Assume that θ = 0.9, so that 1 − θ² = 1 − (0.9)² = 0.19 and (1 − θ)² = (1 − 0.9)² = 0.01. The forecast for time period t = 1, made at the end of time period t = 0, is computed from Eq. (3.89):

ŷ1(0) = β̂(0)' x(1)
      = β̂0(0) + β̂1(0)
      = 50 + 1.5
      = 51.5

Suppose that the actual observation in time period 1 is y1 = 52. The forecast error in time period 1 is

e1(1) = y1 − ŷ1(0)
      = 52 − 51.5
      = 0.5

The updated estimates of the model parameters computed at the end of time period 1 are now

β̂0(1) = β̂0(0) + β̂1(0) + 0.19 e1(1)
      = 50 + 1.5 + 0.19(0.5)
      = 51.60

and

β̂1(1) = β̂1(0) + 0.01 e1(1)
      = 1.5 + 0.01(0.5)
      = 1.55

The origin of time is now T = 1. Therefore the forecast for time period 2, made at the end of period 1, is

ŷ2(1) = β̂0(1) + β̂1(1)
      = 51.6 + 1.55
      = 53.15

If the observation in period 2 is y2 = 55, we would update the parameter estimates exactly as we did at the end of time period 1. First, calculate the forecast error:

e2(1) = y2 − ŷ2(1)
      = 55 − 53.15
      = 1.85

Second, revise the estimates of the model parameters:

β̂0(2) = β̂0(1) + β̂1(1) + 0.19 e2(1)
      = 51.6 + 1.55 + 0.19(1.85)
      = 53.50

and

β̂1(2) = β̂1(1) + 0.01 e2(1)
      = 1.55 + 0.01(1.85)
      = 1.57

The forecast for period 3, made at the end of period 2, is

ŷ3(2) = β̂0(2) + β̂1(2)
      = 53.50 + 1.57
      = 55.07

Suppose that a forecast at a longer lead time than one period is required. If a forecast for time period 5 is required at the end of time period 2, then because the forecast lead time is τ = 5 − 2 = 3, the desired forecast is

ŷ5(2) = β̂0(2) + β̂1(2)(3)
      = 53.50 + 1.57(3)
      = 58.21

In general, the forecast for any lead time τ, computed at the current origin of time (the end of time period 2), is

ŷ_{2+τ}(2) = β̂0(2) + β̂1(2)τ
           = 53.50 + 1.57τ
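The period-2 arithmetic above is easy to reproduce in a few lines. A minimal sketch, starting from the period-1 estimates β̂0(1) = 51.60 and β̂1(1) = 1.55 given in the example:

```python
theta = 0.9
h1, h2 = 1 - theta**2, (1 - theta)**2     # 0.19 and 0.01

b0, b1 = 51.60, 1.55                      # estimates at the end of period 1
y2 = 55.0                                 # observation in period 2

e2 = y2 - (b0 + b1)                       # one-step-ahead forecast error: 55 - 53.15 = 1.85
b0, b1 = b0 + b1 + h1 * e2, b1 + h2 * e2  # updated estimates, about 53.50 and 1.57
print(b0 + b1)                            # forecast for period 3, about 55.07
print(b0 + 3 * b1)                        # forecast for period 5 (lead time 3), about 58.21
```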

When the discounted least squares procedure is applied to a linear trend model as in Example 3.10, the resulting forecasts are equivalent to the forecasts produced by a method called double exponential smoothing. Exponential smoothing is a popular and very useful forecasting technique and will be discussed in detail in Chapter 4.

Discounted least squares can be applied to more complex models. For example, suppose that the model is a polynomial of degree k. The transition matrix for this model is a square (k + 1) × (k + 1) matrix in which the diagonal elements are unity, the elements immediately to the left of the diagonal are also unity, and all other elements are zero. In this polynomial, the term of degree r is written as

β_r t!/[(t − r)! r!]
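As an illustrative sketch (not from the text), the transition matrix just described can be built directly, and Pascal's rule for the binomial-coefficient terms t!/[(t − r)! r!] confirms the transition property x(t + 1) = L x(t):

```python
import numpy as np
from math import comb

k = 3                                       # polynomial degree
L = np.eye(k + 1) + np.eye(k + 1, k=-1)     # unit diagonal plus unit entries just left of it

# Predictors written with binomial coefficients: x_r(t) = t! / ((t - r)! r!) = C(t, r).
t = 5
x_t  = np.array([comb(t, r)     for r in range(k + 1)], dtype=float)
x_t1 = np.array([comb(t + 1, r) for r in range(k + 1)], dtype=float)
print(np.allclose(L @ x_t, x_t1))           # True: x(t + 1) = L x(t), by Pascal's rule
```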

In the next example we illustrate discounted least squares for a simple seasonal model.

Example 3.11 A Simple Seasonal Model

Suppose that a time series can be modeled as a linear trend with a superimposed sine wave to represent a seasonal pattern that is observed monthly. The model is a variation of the one shown in Eq. (3.3):

y_t = β0 + β1 t + β2 sin(2πt/d) + β3 cos(2πt/d) + ε_t    (3.91)

Since this model represents monthly data, d = 12, and Eq. (3.91) becomes

y_t = β0 + β1 t + β2 sin(2πt/12) + β3 cos(2πt/12) + ε_t    (3.92)



The transition matrix L for this model, which contains a mixture of polynomial and trigonometric terms, is

    [ 1   0        0             0        ]
L = [ 1   1        0             0        ]
    [ 0   0    cos(2π/12)    sin(2π/12)   ]
    [ 0   0   −sin(2π/12)    cos(2π/12)   ]

Note that L has a block diagonal structure, with the first block containing the elements for the polynomial portion of the model and the second block containing the elements for the trigonometric terms, and the remaining elements of the matrix are zero. The parameter estimation equations for this model are

β̂(T) = L' β̂(T − 1) + h e_T(1)

[ β̂0(T) ]   [ 1   1       0             0         ] [ β̂0(T − 1) ]   [ h1 ]
[ β̂1(T) ] = [ 0   1       0             0         ] [ β̂1(T − 1) ] + [ h2 ] e_T(1)
[ β̂2(T) ]   [ 0   0   cos(2π/12)   −sin(2π/12)    ] [ β̂2(T − 1) ]   [ h3 ]
[ β̂3(T) ]   [ 0   0   sin(2π/12)    cos(2π/12)    ] [ β̂3(T − 1) ]   [ h4 ]

or

β̂0(T) = β̂0(T − 1) + β̂1(T − 1) + h1 e_T(1)
β̂1(T) = β̂1(T − 1) + h2 e_T(1)
β̂2(T) = cos(2π/12) β̂2(T − 1) − sin(2π/12) β̂3(T − 1) + h3 e_T(1)
β̂3(T) = sin(2π/12) β̂2(T − 1) + cos(2π/12) β̂3(T − 1) + h4 e_T(1)

and since 2π/12 = 30°, these equations become

β̂0(T) = β̂0(T − 1) + β̂1(T − 1) + h1 e_T(1)
β̂1(T) = β̂1(T − 1) + h2 e_T(1)
β̂2(T) = 0.866 β̂2(T − 1) − 0.5 β̂3(T − 1) + h3 e_T(1)
β̂3(T) = 0.5 β̂2(T − 1) + 0.866 β̂3(T − 1) + h4 e_T(1)

The steady-state G matrix for this model is

    [ Σ θ^k    −Σ k θ^k    −Σ θ^k sin ωk            Σ θ^k cos ωk          ]
G = [            Σ k² θ^k    Σ k θ^k sin ωk         −Σ k θ^k cos ωk        ]
    [                         Σ θ^k sin ωk sin ωk   −Σ θ^k sin ωk cos ωk   ]
    [                                                 Σ θ^k cos ωk cos ωk  ]

where we have let ω = 2π/12 and all sums run over k = 0, 1, 2, .... Because G is symmetric, we only need to show the upper half of the matrix. It turns out that there are closed-form expressions for all of the entries in G. We will evaluate these expressions for θ = 0.9. This gives the following:

Σ θ^k = 1/(1 − θ) = 1/(1 − 0.9) = 10

Σ k θ^k = θ/(1 − θ)² = 0.9/(1 − 0.9)² = 90

Σ k² θ^k = θ(1 + θ)/(1 − θ)³ = 0.9(1 + 0.9)/(1 − 0.9)³ = 1710

for the polynomial terms, and

Σ θ^k sin ωk = θ sin ω/(1 − 2θ cos ω + θ²) = (0.9)(0.5)/[1 − 2(0.9)(0.866) + (0.9)²] = 1.79

Σ θ^k cos ωk = (1 − θ cos ω)/(1 − 2θ cos ω + θ²) = [1 − (0.9)(0.866)]/[1 − 2(0.9)(0.866) + (0.9)²] = 0.8824

Σ k θ^k sin ωk = θ(1 − θ²) sin ω/(1 − 2θ cos ω + θ²)² = 0.9[1 − (0.9)²](0.5)/[1 − 2(0.9)(0.866) + (0.9)²]² = 1.368

Σ k θ^k cos ωk = [2θ² − θ(1 + θ²) cos ω]/(1 − 2θ cos ω + θ²)² = {2(0.9)² − 0.9[1 + (0.9)²](0.866)}/[1 − 2(0.9)(0.866) + (0.9)²]² = 3.3486

Σ θ^k sin ωk sin ωk = (1/2) { [1 − θ cos(0)]/[1 − 2θ cos(0) + θ²] − [1 − θ cos(2ω)]/[1 − 2θ cos(2ω) + θ²] }
                    = (1/2) { [1 − 0.9(1)]/[1 − 2(0.9)(1) + (0.9)²] − [1 − 0.9(0.5)]/[1 − 2(0.9)(0.5) + (0.9)²] }
                    = 4.7528

Σ θ^k sin ωk cos ωk = (1/2) { θ sin(2ω)/[1 − 2θ cos(2ω) + θ²] + θ sin(0)/[1 − 2θ cos(0) + θ²] }
                    = (1/2) { 0.9(0.866)/[1 − 2(0.9)(0.5) + (0.9)²] + 0.9(0)/[1 − 2(0.9)(1) + (0.9)²] }
                    = 0.4284

Σ θ^k cos ωk cos ωk = (1/2) { [1 − θ cos(2ω)]/[1 − 2θ cos(2ω) + θ²] + [1 − θ cos(0)]/[1 − 2θ cos(0) + θ²] }
                    = (1/2) { [1 − 0.9(0.5)]/[1 − 2(0.9)(0.5) + (0.9)²] + [1 − 0.9(1)]/[1 − 2(0.9)(1) + (0.9)²] }
                    = 5.3022

for the trigonometric terms. Therefore the G matrix is

    [ 10    −90      −1.79       0.8824  ]
G = [       1710      1.368     −3.3486  ]
    [                 4.7528    −0.4284  ]
    [                            5.3022  ]

and G^{-1} is

         [  0.214401    0.010987    0.075545   −0.02264   ]
G^{-1} = [  0.010987    0.001138    0.003737   −0.00081   ]
         [  0.075545    0.003737    0.238595    0.009066  ]
         [ −0.02264    −0.00081     0.009066    0.192591  ]

where we have shown the entire matrix. The h vector is

h = G^{-1} x(0)

  = [  0.214401    0.010987    0.075545   −0.02264  ] [ 1 ]
    [  0.010987    0.001138    0.003737   −0.00081  ] [ 0 ]
    [  0.075545    0.003737    0.238595    0.009066 ] [ 0 ]
    [ −0.02264    −0.00081     0.009066    0.192591 ] [ 1 ]

  = [ 0.191762 ]
    [ 0.010179 ]
    [ 0.084611 ]
    [ 0.169953 ]


Therefore the discounted least squares parameter estimation equations are

β̂0(T) = β̂0(T − 1) + β̂1(T − 1) + 0.191762 e_T(1)
β̂1(T) = β̂1(T − 1) + 0.010179 e_T(1)
β̂2(T) = cos(2π/12) β̂2(T − 1) − sin(2π/12) β̂3(T − 1) + 0.084611 e_T(1)
β̂3(T) = sin(2π/12) β̂2(T − 1) + cos(2π/12) β̂3(T − 1) + 0.169953 e_T(1)    ■
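A numerical sketch of this example (illustrative, not from the text): build L and the steady-state G by truncating the infinite sum, then compute h = G^{-1} x(0). Because the hand calculation above rounds the entries of G, the values obtained this way are close to, but not exactly equal to, the h reported in the example:

```python
import numpy as np

theta, omega = 0.9, 2 * np.pi / 12

# Transition matrix for the linear-trend-plus-seasonal model (block diagonal, as above).
L = np.array([[1.0, 0.0,            0.0,           0.0],
              [1.0, 1.0,            0.0,           0.0],
              [0.0, 0.0,  np.cos(omega), np.sin(omega)],
              [0.0, 0.0, -np.sin(omega), np.cos(omega)]])

def x(t):
    """Predictor vector x(t) = (1, t, sin(omega t), cos(omega t))'."""
    return np.array([1.0, float(t), np.sin(omega * t), np.cos(omega * t)])

print(np.allclose(L @ x(5), x(6)))            # True: the transition property x(t + 1) = L x(t)

# Steady-state G = sum_{j >= 0} theta^j x(-j) x(-j)', truncated where theta^j is negligible.
G = sum(theta**j * np.outer(x(-j), x(-j)) for j in range(2000))

h = np.linalg.solve(G, x(0))                  # h = G^{-1} x(0), with x(0) = (1, 0, 0, 1)'
print(np.round(h, 4))                         # compare with the h used in the equations above
```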

3.8 REGRESSION MODELS FOR GENERAL TIME SERIES DATA