Regression Analysis and Forecasting
3.2 LEAST SQUARES ESTIMATION IN LINEAR REGRESSION MODELS
We begin with the situation where the regression model is used with cross-section data. The model is given in Eq. (3.4). There are n > k observations on the response variable available, say, y_1, y_2, ..., y_n. Along with each observed response y_i, we will have an observation on each regressor or predictor variable, and x_ij denotes the ith observation or level of variable x_j. The data will appear as in Table 3.1. We assume that the error term ε in the model has expected value E(ε) = 0 and variance Var(ε) = σ², and that the errors ε_i, i = 1, 2, ..., n, are uncorrelated random variables.
TABLE 3.1 Cross-Section Data for Multiple Linear Regression

Observation   Response, y   x_1    x_2    ...   x_k
1             y_1           x_11   x_12   ...   x_1k
2             y_2           x_21   x_22   ...   x_2k
...           ...           ...    ...    ...   ...
n             y_n           x_n1   x_n2   ...   x_nk
The method of least squares chooses the model parameters (the β's) in Eq. (3.4) so that the sum of the squares of the errors, ε_i, is minimized. The least squares function is

L = Σ_{i=1}^{n} ε_i² = Σ_{i=1}^{n} (y_i − β_0 − β_1 x_i1 − β_2 x_i2 − ··· − β_k x_ik)²   (3.6)
This function is to be minimized with respect to β_0, β_1, ..., β_k. Therefore the least squares estimators, say, β̂_0, β̂_1, ..., β̂_k, must satisfy

∂L/∂β_0 |_{β̂} = 0   (3.7)

and

∂L/∂β_j |_{β̂} = 0,   j = 1, 2, ..., k   (3.8)

Simplifying Eqs. (3.7) and (3.8) we obtain
n β̂_0 + β̂_1 Σ_{i=1}^{n} x_i1 + β̂_2 Σ_{i=1}^{n} x_i2 + ··· + β̂_k Σ_{i=1}^{n} x_ik = Σ_{i=1}^{n} y_i   (3.9)

β̂_0 Σ_{i=1}^{n} x_i1 + β̂_1 Σ_{i=1}^{n} x_i1² + β̂_2 Σ_{i=1}^{n} x_i2 x_i1 + ··· + β̂_k Σ_{i=1}^{n} x_ik x_i1 = Σ_{i=1}^{n} y_i x_i1

⋮

β̂_0 Σ_{i=1}^{n} x_ik + β̂_1 Σ_{i=1}^{n} x_i1 x_ik + β̂_2 Σ_{i=1}^{n} x_i2 x_ik + ··· + β̂_k Σ_{i=1}^{n} x_ik² = Σ_{i=1}^{n} y_i x_ik   (3.10)
These equations are called the least squares normal equations. Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solutions to the normal equations will be the least squares estimators of the model regression coefficients.
It is simpler to solve the normal equations if they are expressed in matrix notation.
We now give a matrix development of the normal equations that parallels the development of Eq. (3.10). The multiple linear regression model may be written in matrix notation as

y = Xβ + ε   (3.11)
where

    [y_1]        [1  x_11  x_12  ···  x_1k]
    [y_2]        [1  x_21  x_22  ···  x_2k]
y = [ ⋮ ]    X = [⋮   ⋮     ⋮           ⋮ ]
    [y_n]        [1  x_n1  x_n2  ···  x_nk]

    [β_0]        [ε_1]
    [β_1]        [ε_2]
β = [ ⋮ ]    ε = [ ⋮ ]
    [β_k]        [ε_n]

In general, y is an (n × 1) vector of the observations, X is an (n × p) matrix of the levels of the regressor variables, β is a (p × 1) vector of the regression coefficients, and ε is an (n × 1) vector of random errors. X is usually called the model matrix, because it is the original data table for the problem expanded to the form of the regression model that you desire to fit. The vector of least squares estimators minimizes
L = Σ_{i=1}^{n} ε_i² = ε′ε = (y − Xβ)′(y − Xβ)
We can expand the right-hand side of L and obtain

L = y′y − β′X′y − y′Xβ + β′X′Xβ = y′y − 2β′X′y + β′X′Xβ

because β′X′y is a (1 × 1) matrix, or a scalar, and its transpose (β′X′y)′ = y′Xβ is the same scalar. The least squares estimators must satisfy

∂L/∂β |_{β̂} = −2X′y + 2(X′X)β̂ = 0

which simplifies to
(X′X)β̂ = X′y   (3.12)

In Eq. (3.12) X′X is a (p × p) symmetric matrix and X′y is a (p × 1) column vector. Equation (3.12) is just the matrix form of the least squares normal equations. It is identical to Eq. (3.10). To solve the normal equations, multiply both sides of Eq. (3.12) by the inverse of X′X (we assume that this inverse exists). Thus the least squares estimator of β is

β̂ = (X′X)⁻¹X′y   (3.13)
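The solution in Eq. (3.13) is straightforward to carry out numerically. As a minimal sketch (in Python with NumPy, our illustration language here; the data values below are invented, and any full-rank X would do), we solve the normal equations (3.12) directly and check the result against a library least squares routine:

```python
import numpy as np

# Hypothetical data: n = 5 observations on k = 2 regressors (values invented).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Model matrix X: a leading column of ones for the intercept, then one
# column per regressor, giving an (n x p) matrix with p = k + 1.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations (X'X) beta_hat = X'y, Eq. (3.12).  Solving the
# linear system is numerically preferable to forming the inverse of Eq. (3.13).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# A library least squares solver should give the same estimates.
beta_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

For well-conditioned problems the two routes agree to machine precision; for nearly collinear regressors the QR-based `lstsq` route is the safer choice.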
The fitted values of the response variable from the regression model are computed from

ŷ = Xβ̂   (3.14)

or in scalar notation,

ŷ_i = β̂_0 + β̂_1 x_i1 + β̂_2 x_i2 + ··· + β̂_k x_ik,   i = 1, 2, ..., n   (3.15)
The difference between the actual observation y_i and the corresponding fitted value is the residual e_i = y_i − ŷ_i, i = 1, 2, ..., n. The n residuals can be written as an (n × 1) vector denoted by

e = y − ŷ = y − Xβ̂   (3.16)

In addition to estimating the regression coefficients β_0, β_1, ..., β_k, it is also necessary to estimate the variance of the model errors, σ². The estimator of this parameter involves the sum of squares of the residuals

SS_E = Σ_{i=1}^{n} e_i² = e′e
We can show that E(SS_E) = (n − p)σ², so the estimator of σ² is the residual or mean square error

σ̂² = SS_E / (n − p)   (3.17)
The method of least squares is not the only way to estimate the parameters in a linear regression model, but it is widely used, and it results in estimates of the model parameters that have nice properties. If the model is correct (it has the right form and includes all of the relevant predictors), the least squares estimator β̂ is an unbiased estimator of the model parameters β; that is, E(β̂) = β. The variances and covariances of the estimators β̂ are contained in a (p × p) covariance matrix

Cov(β̂) = σ²(X′X)⁻¹   (3.18)

The variances of the regression coefficients are on the main diagonal of this matrix and the covariances are on the off-diagonals.
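Equations (3.16)-(3.18) translate directly into a few lines of code. The sketch below (again Python with NumPy; the simulated data set and its generating coefficients are invented for illustration) computes the residual vector, the estimate σ̂² of Eq. (3.17), and the estimated covariance matrix of Eq. (3.18) with σ² replaced by σ̂²:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 30, 2
p = k + 1

# Simulated data from a known model (coefficients invented for illustration).
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=(n, k))])
beta_true = np.array([5.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ beta_hat            # residual vector, Eq. (3.16)
sse = e @ e                     # residual sum of squares SS_E
sigma2_hat = sse / (n - p)      # mean square error, Eq. (3.17)

# Estimated covariance matrix of beta_hat, Eq. (3.18): sigma2_hat * (X'X)^{-1}.
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
se_beta = np.sqrt(np.diag(cov_beta))   # standard errors: main-diagonal roots
print(sigma2_hat, se_beta)
```

The square roots of the main-diagonal elements are the standard errors that regression software reports next to each estimated coefficient.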
Example 3.1
A hospital is implementing a program to improve quality and productivity. As part of this program, the hospital is attempting to measure and evaluate patient satisfaction.
Table 3.2 contains some of the data that has been collected for a random sample of 25 recently discharged patients. The "severity" variable is an index that measures the severity of the patient's illness, measured on an increasing scale (i.e., more severe illnesses have higher values of the index), and the response satisfaction is also measured on an increasing scale, with larger values indicating greater satisfaction.
We will fit a multiple linear regression model to the patient satisfaction data. The model is

y = β_0 + β_1 x_1 + β_2 x_2 + ε

where y = patient satisfaction, x_1 = patient age, and x_2 = illness severity. To solve the least squares normal equations, we will need to set up the X′X matrix and the X′y

TABLE 3.2 Patient Satisfaction Survey Data
Observation Age (xJ) Severity (x2 ) Satisfaction (y)
1 55 50 68
2 46 24 77
3 30 46 96
4 35 48 80
5 59 58 43
6 61 60 44
7 74 65 26
8 38 42 88
9 27 42 75
10 51 50 57
11 53 38 56
12 41 30 88
13 37 31 88
14 24 34 102
15 42 30 88
16 50 48 70
17 58 61 52
18 60 71 43
19 62 62 46
20 68 38 56
21 70 41 59
22 79 66 26
23 63 31 52
24 39 42 83
25 49 40 75
vector. The model matrix X and observation vector y are

    [1  55  50]         [ 68]
    [1  46  24]         [ 77]
    [1  30  46]         [ 96]
    [1  35  48]         [ 80]
    [1  59  58]         [ 43]
    [1  61  60]         [ 44]
    [1  74  65]         [ 26]
    [1  38  42]         [ 88]
    [1  27  42]         [ 75]
    [1  51  50]         [ 57]
    [1  53  38]         [ 56]
    [1  41  30]         [ 88]
X = [1  37  31]     y = [ 88]
    [1  24  34]         [102]
    [1  42  30]         [ 88]
    [1  50  48]         [ 70]
    [1  58  61]         [ 52]
    [1  60  71]         [ 43]
    [1  62  62]         [ 46]
    [1  68  38]         [ 56]
    [1  70  41]         [ 59]
    [1  79  66]         [ 26]
    [1  63  31]         [ 52]
    [1  39  42]         [ 83]
    [1  49  40]         [ 75]
The X′X matrix and the X′y vector are

      [ 1   1  ···   1] [1  55  50]   [  25   1271   1148]
X′X = [55  46  ···  49] [1  46  24] = [1271  69881  60814]
      [50  24  ···  40] [⋮   ⋮   ⋮ ]   [1148  60814  56790]
                        [1  49  40]

and

      [ 1   1  ···   1] [68]   [ 1638]
X′y = [55  46  ···  49] [77] = [76487]
      [50  24  ···  40] [ ⋮]   [70426]
                        [75]
Using Eq. (3.13), we can find the least squares estimates of the parameters in the regression model as

β̂ = (X′X)⁻¹X′y

    [  25   1271   1148]⁻¹ [ 1638]
  = [1271  69881  60814]   [76487]
    [1148  60814  56790]   [70426]

    [ 0.699946097  −0.006128086  −0.007586982] [ 1638]
  = [−0.006128086   0.00026383   −0.000158646] [76487]
    [−0.007586982  −0.000158646   0.000340866] [70426]

    [143.4720118 ]
  = [ −1.031053414]
    [ −0.55603781 ]

Therefore the regression model is
ŷ = 143.472 − 1.031x_1 − 0.556x_2

where x_1 = patient age and x_2 = severity of illness, and we have reported the regression coefficients to three decimal places. •
Table 3.3 shows the output from the Minitab regression routine for the patient satisfaction data. Note that, in addition to the fitted regression model, Minitab provides a list of the residuals computed from Eq. (3.16) along with other output that will provide information about the quality of the regression model. This output will be explained in subsequent sections, and we will frequently refer back to Table 3.3.
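The quantities in Table 3.3 can be cross-checked from Eqs. (3.13), (3.17), and (3.18). The sketch below redoes the computation in Python with NumPy (our choice here; the book's own calculations are done in Minitab) using the Table 3.2 data:

```python
import numpy as np

# Patient satisfaction data from Table 3.2: age (x1), severity (x2), satisfaction (y).
age = np.array([55, 46, 30, 35, 59, 61, 74, 38, 27, 51, 53, 41, 37,
                24, 42, 50, 58, 60, 62, 68, 70, 79, 63, 39, 49], dtype=float)
severity = np.array([50, 24, 46, 48, 58, 60, 65, 42, 42, 50, 38, 30, 31,
                     34, 30, 48, 61, 71, 62, 38, 41, 66, 31, 42, 40], dtype=float)
y = np.array([68, 77, 96, 80, 43, 44, 26, 88, 75, 57, 56, 88, 88,
              102, 88, 70, 52, 43, 46, 56, 59, 26, 52, 83, 75], dtype=float)

n = len(y)
X = np.column_stack([np.ones(n), age, severity])
p = X.shape[1]

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # Eq. (3.13)
e = y - X @ beta_hat                               # residuals, Eq. (3.16)
sse = e @ e
sigma2_hat = sse / (n - p)                         # Eq. (3.17)
s = np.sqrt(sigma2_hat)                            # Minitab's "S"
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))  # "SE Coef"
r_sq = 1.0 - sse / np.sum((y - y.mean()) ** 2)     # "R-Sq"

print(beta_hat)   # approximately [143.472, -1.031, -0.556]
print(s, r_sq)    # approximately 7.118 and 0.897
```

The coefficients, their standard errors, S, and R-Sq all agree with the Minitab output to the precision shown in Table 3.3.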
Example 3.2 Trend Adjustment
One way to forecast time series data that contains a linear trend is with a trend adjustment procedure. This involves fitting a model with a linear trend term in time, subtracting the fitted values from the original observations to obtain a set of residuals that are trend-free, forecasting the residuals, and then computing the forecast by adding the forecast of the residual value(s) to the estimate of trend. We described and illustrated trend adjustment in Section 2.4.2, and the basic trend adjustment model introduced there was
y_t = β_0 + β_1 t + ε_t,   t = 1, 2, ..., T
TABLE 3.3 Minitab Regression Output for the Patient Satisfaction Data in Table 3.2

Regression Analysis: Satisfaction Versus Age, Severity
The regression equation is
Satisfaction= 143 - 1.03 Age - 0.556 Severity
Predictor Coef SE Coef T p
Constant 143.472 5.955 24.09 0.000
Age -1.0311 0.1156 -8.92 0.000
Severity -0.5560 0.1314 -4.23 0.000
S = 7.11767   R-Sq = 89.7%   R-Sq(adj) = 88.7%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   9663.7  4831.8  95.38  0.000
Residual Error  22   1114.5    50.7
Total 24 10778.2
Source DF Seq SS
Age 1 8756.7
Severity 1 907.0
Obs   Age  Satisfaction     Fit  SE Fit  Residual  St Resid
  1  55.0         68.00   58.96    1.51      9.04      1.30
  2  46.0         77.00   82.70    2.99     -5.70     -0.88
  3  30.0         96.00   86.96    2.80      9.04      1.38
  4  35.0         80.00   80.70    2.45     -0.70     -0.10
  5  59.0         43.00   50.39    1.96     -7.39     -1.08
  6  61.0         44.00   47.22    2.13     -3.22     -0.47
  7  74.0         26.00   31.03    2.89     -5.03     -0.77
  8  38.0         88.00   80.94    1.92      7.06      1.03
  9  27.0         75.00   92.28    2.90    -17.28     -2.66R
 10  51.0         57.00   63.09    1.52     -6.09     -0.88
 11  53.0         56.00   67.70    1.86    -11.70     -1.70
 12  41.0         88.00   84.52    2.28      3.48      0.52
 13  37.0         88.00   88.09    2.26     -0.09     -0.01
 14  24.0        102.00   99.82    2.99      2.18      0.34
 15  42.0         88.00   83.49    2.28      4.51      0.67
 16  50.0         70.00   65.23    1.46      4.77      0.68
 17  58.0         52.00   49.75    2.21      2.25      0.33
 18  60.0         43.00   42.13    3.21      0.87      0.14
TABLE 3.3 Minitab Regression Output for the Patient Satisfaction Data in Table 3.2 (Continued)
 19  62.0         46.00   45.07    2.30      0.93      0.14
 20  68.0         56.00   52.23    3.04      3.77      0.59
 21  70.0         59.00   48.50    2.98     10.50      1.62
 22  79.0         26.00   25.32    3.24      0.68      0.11
 23  63.0         52.00   61.28    3.28     -9.28     -1.47
 24  39.0         83.00   79.91    1.85      3.09      0.45
 25  49.0         75.00   70.71    1.58      4.29      0.62
R denotes an observation with a large standardized residual.
The least squares normal equations for this model are

T β̂_0 + β̂_1 T(T + 1)/2 = Σ_{t=1}^{T} y_t

β̂_0 T(T + 1)/2 + β̂_1 T(T + 1)(2T + 1)/6 = Σ_{t=1}^{T} t y_t
Because there are only two parameters, it is easy to solve the normal equations directly, resulting in the least squares estimators

β̂_0 = [2(2T + 1) / (T(T − 1))] Σ_{t=1}^{T} y_t − [6 / (T(T − 1))] Σ_{t=1}^{T} t y_t

β̂_1 = [12 / (T(T² − 1))] Σ_{t=1}^{T} t y_t − [6 / (T(T − 1))] Σ_{t=1}^{T} y_t
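These closed-form estimators are simply the general least squares solution of Eq. (3.13) specialized to the two-parameter trend model. As a check, the sketch below (Python with NumPy; the simulated trend series and its coefficients are invented for illustration) evaluates both the closed forms and the matrix solution, and computes the trend-line forecast for period T + 1:

```python
import numpy as np

# Simulated linear-trend series (invented): y_t = 10 + 2 t + noise, t = 1..T.
rng = np.random.default_rng(7)
T = 20
t = np.arange(1, T + 1, dtype=float)
y = 10.0 + 2.0 * t + rng.normal(scale=0.5, size=T)

# Closed-form least squares estimators for the trend model.
sum_y = y.sum()
sum_ty = (t * y).sum()
b0 = 2 * (2 * T + 1) / (T * (T - 1)) * sum_y - 6 / (T * (T - 1)) * sum_ty
b1 = 12 / (T * (T ** 2 - 1)) * sum_ty - 6 / (T * (T - 1)) * sum_y

# The same fit via the general matrix solution, Eq. (3.13).
X = np.column_stack([np.ones(T), t])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Forecast of the next observation from the trend line, taking the forecast
# of the next residual to be zero (structureless residuals).
y_next = b0 + b1 * (T + 1)
print(b0, b1, y_next)
```

The two routes give identical estimates up to floating-point error, which confirms the algebra above.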
Minitab computes these parameter estimates in its trend adjustment procedure, which we illustrated in Example 2.6. The least squares estimates obtained from this trend adjustment model depend on the point in time at which they were computed, that is, T. Sometimes it may be convenient to keep track of the period of computation and denote the estimates as functions of time, say, β̂_0(T) and β̂_1(T). The model can be used to predict the next observation by predicting the point on the trend line in period T + 1, which is β̂_0(T) + β̂_1(T)(T + 1), and adding to the trend a forecast of the next residual, say, ê_{T+1}(1). If the residuals are structureless and have average value zero, the forecast of the next residual would be zero. Then the forecast of the next observation would be

ŷ_{T+1}(T) = β̂_0(T) + β̂_1(T)(T + 1)
When a new observation becomes available, the parameter estimates β̂_0(T) and β̂_1(T) could be updated to reflect the new information. This could be done by solving the normal equations again. In some situations it is possible to devise simple updating equations so that new estimates β̂_0(T + 1) and β̂_1(T + 1) can be computed directly from the previous ones β̂_0(T) and β̂_1(T) without having to directly solve the normal equations. We will show how to do this later. •