Regression Analysis and Forecasting
3.2 LEAST SQUARES ESTIMATION IN LINEAR REGRESSION MODELS
We begin with the situation where the regression model is used with cross-section data. The model is given in Eq. (3.4). There are n > k observations on the response variable available, say, y_1, y_2, ..., y_n. Along with each observed response y_i, we will have an observation on each regressor or predictor variable, and x_ij denotes the ith observation or level of variable x_j. The data will appear as in Table 3.1. We assume that the error term ε in the model has expected value E(ε) = 0 and variance Var(ε) = σ², and that the errors ε_i, i = 1, 2, ..., n, are uncorrelated random variables.
TABLE 3.1 Cross-Section Data for Multiple Linear Regression

Observation   Response, y   x_1    x_2    ...   x_k
1             y_1           x_11   x_12   ...   x_1k
2             y_2           x_21   x_22   ...   x_2k
...           ...           ...    ...    ...   ...
n             y_n           x_n1   x_n2   ...   x_nk
The method of least squares chooses the model parameters (the β's) in Eq. (3.4) so that the sum of the squares of the errors, ε_i, is minimized. The least squares function is

L = Σ_{i=1}^{n} ε_i² = Σ_{i=1}^{n} (y_i − β_0 − β_1 x_i1 − β_2 x_i2 − ··· − β_k x_ik)²   (3.6)
This function is to be minimized with respect to β_0, β_1, ..., β_k. Therefore the least squares estimators, say, β̂_0, β̂_1, ..., β̂_k, must satisfy

∂L/∂β_0 |_{β̂} = 0   (3.7)

and

∂L/∂β_j |_{β̂} = 0,   j = 1, 2, ..., k   (3.8)

Simplifying Eqs. (3.7) and (3.8) we obtain
n β̂_0 + β̂_1 Σ_{i=1}^{n} x_i1 + β̂_2 Σ_{i=1}^{n} x_i2 + ··· + β̂_k Σ_{i=1}^{n} x_ik = Σ_{i=1}^{n} y_i   (3.9)

β̂_0 Σ_{i=1}^{n} x_i1 + β̂_1 Σ_{i=1}^{n} x_i1² + β̂_2 Σ_{i=1}^{n} x_i2 x_i1 + ··· + β̂_k Σ_{i=1}^{n} x_ik x_i1 = Σ_{i=1}^{n} y_i x_i1

⋮

β̂_0 Σ_{i=1}^{n} x_ik + β̂_1 Σ_{i=1}^{n} x_i1 x_ik + β̂_2 Σ_{i=1}^{n} x_i2 x_ik + ··· + β̂_k Σ_{i=1}^{n} x_ik² = Σ_{i=1}^{n} y_i x_ik   (3.10)
These equations are called the least squares normal equations. Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solutions to the normal equations will be the least squares estimators of the model regression coefficients.
It is simpler to solve the normal equations if they are expressed in matrix notation.
We now give a matrix development of the normal equations that parallels the development of Eq. (3.10). The multiple linear regression model may be written in matrix notation as

y = Xβ + ε   (3.11)
where

    [y_1]        [1  x_11  x_12  ···  x_1k]
    [y_2]        [1  x_21  x_22  ···  x_2k]
y = [ ⋮ ]    X = [⋮   ⋮     ⋮           ⋮ ]
    [y_n]        [1  x_n1  x_n2  ···  x_nk]

    [β_0]        [ε_1]
    [β_1]        [ε_2]
β = [ ⋮ ]    ε = [ ⋮ ]
    [β_k]        [ε_n]

In general, y is an (n × 1) vector of the observations, X is an (n × p) matrix of the levels of the regressor variables, β is a (p × 1) vector of the regression coefficients, and ε is an (n × 1) vector of random errors. X is usually called the model matrix, because it is the original data table for the problem expanded to the form of the regression model that you desire to fit. The vector of least squares estimators minimizes
L = Σ_{i=1}^{n} ε_i² = ε′ε = (y − Xβ)′(y − Xβ)
We can expand the right-hand side of L and obtain

L = y′y − β′X′y − y′Xβ + β′X′Xβ = y′y − 2β′X′y + β′X′Xβ

because β′X′y is a (1 × 1) matrix, or a scalar, and its transpose (β′X′y)′ = y′Xβ is the same scalar. The least squares estimators must satisfy

∂L/∂β |_{β̂} = −2X′y + 2(X′X)β̂ = 0

which simplifies to
(X′X)β̂ = X′y   (3.12)

In Eq. (3.12) X′X is a (p × p) symmetric matrix and X′y is a (p × 1) column vector. Equation (3.12) is just the matrix form of the least squares normal equations. It is identical to Eq. (3.10). To solve the normal equations, multiply both sides of Eq. (3.12) by the inverse of X′X (we assume that this inverse exists). Thus the least squares estimator of β is

β̂ = (X′X)⁻¹X′y   (3.13)
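The solution in Eq. (3.13) is straightforward to carry out numerically. As a minimal sketch (in Python with NumPy, our illustration language here; the data values below are invented, and any full-rank X would do), we solve the normal equations (3.12) directly and check the result against a library least squares routine:

```python
import numpy as np

# Hypothetical data: n = 5 observations on k = 2 regressors (values invented).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Model matrix X: a leading column of ones for the intercept, then one
# column per regressor, giving an (n x p) matrix with p = k + 1.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations (X'X) beta_hat = X'y, Eq. (3.12).  Solving the
# linear system is numerically preferable to forming the inverse of Eq. (3.13).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# A library least squares solver should give the same estimates.
beta_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

For well-conditioned problems the two routes agree to machine precision; for nearly collinear regressors the QR-based `lstsq` route is the safer choice.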
The fitted values of the response variable from the regression model are computed from

ŷ = Xβ̂   (3.14)

or in scalar notation,

ŷ_i = β̂_0 + β̂_1 x_i1 + β̂_2 x_i2 + ··· + β̂_k x_ik,   i = 1, 2, ..., n   (3.15)
The difference between the actual observation y_i and the corresponding fitted value is the residual e_i = y_i − ŷ_i, i = 1, 2, ..., n. The n residuals can be written as an (n × 1) vector denoted by

e = y − ŷ = y − Xβ̂   (3.16)

In addition to estimating the regression coefficients β_0, β_1, ..., β_k, it is also necessary to estimate the variance of the model errors, σ². The estimator of this parameter involves the sum of squares of the residuals

SS_E = Σ_{i=1}^{n} e_i² = e′e
We can show that E(SS_E) = (n − p)σ², so the estimator of σ² is the residual or mean square error

σ̂² = SS_E / (n − p)   (3.17)
The method of least squares is not the only way to estimate the parameters in a linear regression model, but it is widely used, and it results in estimates of the model parameters that have nice properties. If the model is correct (it has the right form and includes all of the relevant predictors), the least squares estimator β̂ is an unbiased estimator of the model parameters β; that is, E(β̂) = β. The variances and covariances of the estimators β̂ are contained in a (p × p) covariance matrix

Cov(β̂) = σ²(X′X)⁻¹   (3.18)

The variances of the regression coefficients are on the main diagonal of this matrix and the covariances are on the off-diagonals.
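Equations (3.16)-(3.18) translate directly into a few lines of code. The sketch below (again Python with NumPy; the simulated data set and its generating coefficients are invented for illustration) computes the residual vector, the estimate σ̂² of Eq. (3.17), and the estimated covariance matrix of Eq. (3.18) with σ² replaced by σ̂²:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 30, 2
p = k + 1

# Simulated data from a known model (coefficients invented for illustration).
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=(n, k))])
beta_true = np.array([5.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ beta_hat            # residual vector, Eq. (3.16)
sse = e @ e                     # residual sum of squares SS_E
sigma2_hat = sse / (n - p)      # mean square error, Eq. (3.17)

# Estimated covariance matrix of beta_hat, Eq. (3.18): sigma2_hat * (X'X)^{-1}.
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
se_beta = np.sqrt(np.diag(cov_beta))   # standard errors: main-diagonal roots
print(sigma2_hat, se_beta)
```

The square roots of the main-diagonal elements are the standard errors that regression software reports next to each estimated coefficient.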
Example 3.1
A hospital is implementing a program to improve quality and productivity. As part of this program, the hospital is attempting to measure and evaluate patient satisfaction.
Table 3.2 contains some of the data that has been collected for a random sample of 25 recently discharged patients. The "severity" variable is an index that measures the severity of the patient's illness, measured on an increasing scale (i.e., more severe illnesses have higher values of the index), and the response satisfaction is also measured on an increasing scale, with larger values indicating greater satisfaction.
We will fit a multiple linear regression model to the patient satisfaction data. The model is

y = β_0 + β_1 x_1 + β_2 x_2 + ε

where y = patient satisfaction, x_1 = patient age, and x_2 = illness severity. To solve the least squares normal equations, we will need to set up the X′X matrix and the X′y

TABLE 3.2 Patient Satisfaction Survey Data
Observation Age (xJ) Severity (x2 ) Satisfaction (y)
1 55 50 68
2 46 24 77
3 30 46 96
4 35 48 80
5 59 58 43
6 61 60 44
7 74 65 26
8 38 42 88
9 27 42 75
10 51 50 57
11 53 38 56
12 41 30 88
13 37 31 88
14 24 34 102
15 42 30 88
16 50 48 70
17 58 61 52
18 60 71 43
19 62 62 46
20 68 38 56
21 70 41 59
22 79 66 26
23 63 31 52
24 39 42 83
25 49 40 75
vector. The model matrix X and observation vector y are

    [1  55  50]         [ 68]
    [1  46  24]         [ 77]
    [1  30  46]         [ 96]
    [1  35  48]         [ 80]
    [1  59  58]         [ 43]
    [1  61  60]         [ 44]
    [1  74  65]         [ 26]
    [1  38  42]         [ 88]
    [1  27  42]         [ 75]
    [1  51  50]         [ 57]
    [1  53  38]         [ 56]
    [1  41  30]         [ 88]
X = [1  37  31]     y = [ 88]
    [1  24  34]         [102]
    [1  42  30]         [ 88]
    [1  50  48]         [ 70]
    [1  58  61]         [ 52]
    [1  60  71]         [ 43]
    [1  62  62]         [ 46]
    [1  68  38]         [ 56]
    [1  70  41]         [ 59]
    [1  79  66]         [ 26]
    [1  63  31]         [ 52]
    [1  39  42]         [ 83]
    [1  49  40]         [ 75]
The X′X matrix and the X′y vector are

      [ 1   1  ···   1] [1  55  50]   [  25   1271   1148]
X′X = [55  46  ···  49] [1  46  24] = [1271  69881  60814]
      [50  24  ···  40] [⋮   ⋮   ⋮ ]   [1148  60814  56790]
                        [1  49  40]

and

      [ 1   1  ···   1] [68]   [ 1638]
X′y = [55  46  ···  49] [77] = [76487]
      [50  24  ···  40] [ ⋮]   [70426]
                        [75]
Using Eq. (3.13), we can find the least squares estimates of the parameters in the regression model as

β̂ = (X′X)⁻¹X′y

    [  25   1271   1148]⁻¹ [ 1638]
  = [1271  69881  60814]   [76487]
    [1148  60814  56790]   [70426]

    [ 0.699946097  −0.006128086  −0.007586982] [ 1638]
  = [−0.006128086   0.00026383   −0.000158646] [76487]
    [−0.007586982  −0.000158646   0.000340866] [70426]

    [143.4720118 ]
  = [ −1.031053414]
    [ −0.55603781 ]

Therefore the regression model is
ŷ = 143.472 − 1.031x_1 − 0.556x_2

where x_1 = patient age and x_2 = severity of illness, and we have reported the regression coefficients to three decimal places. •
Table 3.3 shows the output from the Minitab regression routine for the patient satisfaction data. Note that, in addition to the fitted regression model, Minitab provides a list of the residuals computed from Eq. (3.16) along with other output that will provide information about the quality of the regression model. This output will be explained in subsequent sections, and we will frequently refer back to Table 3.3.
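The quantities in Table 3.3 can be cross-checked from Eqs. (3.13), (3.17), and (3.18). The sketch below redoes the computation in Python with NumPy (our choice here; the book's own calculations are done in Minitab) using the Table 3.2 data:

```python
import numpy as np

# Patient satisfaction data from Table 3.2: age (x1), severity (x2), satisfaction (y).
age = np.array([55, 46, 30, 35, 59, 61, 74, 38, 27, 51, 53, 41, 37,
                24, 42, 50, 58, 60, 62, 68, 70, 79, 63, 39, 49], dtype=float)
severity = np.array([50, 24, 46, 48, 58, 60, 65, 42, 42, 50, 38, 30, 31,
                     34, 30, 48, 61, 71, 62, 38, 41, 66, 31, 42, 40], dtype=float)
y = np.array([68, 77, 96, 80, 43, 44, 26, 88, 75, 57, 56, 88, 88,
              102, 88, 70, 52, 43, 46, 56, 59, 26, 52, 83, 75], dtype=float)

n = len(y)
X = np.column_stack([np.ones(n), age, severity])
p = X.shape[1]

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # Eq. (3.13)
e = y - X @ beta_hat                               # residuals, Eq. (3.16)
sse = e @ e
sigma2_hat = sse / (n - p)                         # Eq. (3.17)
s = np.sqrt(sigma2_hat)                            # Minitab's "S"
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))  # "SE Coef"
r_sq = 1.0 - sse / np.sum((y - y.mean()) ** 2)     # "R-Sq"

print(beta_hat)   # approximately [143.472, -1.031, -0.556]
print(s, r_sq)    # approximately 7.118 and 0.897
```

The coefficients, their standard errors, S, and R-Sq all agree with the Minitab output to the precision shown in Table 3.3.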
Example 3.2 Trend Adjustment
One way to forecast time series data that contains a linear trend is with a trend adjustment procedure. This involves fitting a model with a linear trend term in time, subtracting the fitted values from the original observations to obtain a set of residuals that are trend-free, forecasting the residuals, and then computing the forecast by adding the forecast of the residual value(s) to the estimate of trend. We described and illustrated trend adjustment in Section 2.4.2, and the basic trend adjustment model introduced there was
y_t = β_0 + β_1 t + ε_t,   t = 1, 2, ..., T
TABLE 3.3 Minitab Regression Output for the Patient Satisfaction Data in Table 3.2

Regression Analysis: Satisfaction Versus Age, Severity
The regression equation is
Satisfaction= 143 - 1.03 Age - 0.556 Severity
Predictor Coef SE Coef T p
Constant 143.472 5.955 24.09 0.000
Age -1.0311 0.1156 -8.92 0.000
Severity -0.5560 0.1314 -4.23 0.000
S = 7.11767   R-Sq = 89.7%   R-Sq(adj) = 88.7%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   9663.7  4831.8  95.38  0.000
Residual Error  22   1114.5    50.7
Total 24 10778.2
Source DF Seq SS
Age 1 8756.7
Severity 1 907.0
Obs   Age  Satisfaction     Fit  SE Fit  Residual  St Resid
  1  55.0         68.00   58.96    1.51      9.04      1.30
  2  46.0         77.00   82.70    2.99     -5.70     -0.88
  3  30.0         96.00   86.96    2.80      9.04      1.38
  4  35.0         80.00   80.70    2.45     -0.70     -0.10
  5  59.0         43.00   50.39    1.96     -7.39     -1.08
  6  61.0         44.00   47.22    2.13     -3.22     -0.47
  7  74.0         26.00   31.03    2.89     -5.03     -0.77
  8  38.0         88.00   80.94    1.92      7.06      1.03
  9  27.0         75.00   92.28    2.90    -17.28     -2.66R
 10  51.0         57.00   63.09    1.52     -6.09     -0.88
 11  53.0         56.00   67.70    1.86    -11.70     -1.70
 12  41.0         88.00   84.52    2.28      3.48      0.52
 13  37.0         88.00   88.09    2.26     -0.09     -0.01
 14  24.0        102.00   99.82    2.99      2.18      0.34
 15  42.0         88.00   83.49    2.28      4.51      0.67
 16  50.0         70.00   65.23    1.46      4.77      0.68
 17  58.0         52.00   49.75    2.21      2.25      0.33
 18  60.0         43.00   42.13    3.21      0.87      0.14
TABLE 3.3 Minitab Regression Output for the Patient Satisfaction Data in Table 3.2 (Continued)
 19  62.0         46.00   45.07    2.30      0.93      0.14
 20  68.0         56.00   52.23    3.04      3.77      0.59
 21  70.0         59.00   48.50    2.98     10.50      1.62
 22  79.0         26.00   25.32    3.24      0.68      0.11
 23  63.0         52.00   61.28    3.28     -9.28     -1.47
 24  39.0         83.00   79.91    1.85      3.09      0.45
 25  49.0         75.00   70.71    1.58      4.29      0.62
R denotes an observation with a large standardized residual.
The least squares normal equations for this model are

T β̂_0 + β̂_1 T(T + 1)/2 = Σ_{t=1}^{T} y_t

β̂_0 T(T + 1)/2 + β̂_1 T(T + 1)(2T + 1)/6 = Σ_{t=1}^{T} t y_t
Because there are only two parameters, it is easy to solve the normal equations directly, resulting in the least squares estimators

β̂_0 = [2(2T + 1) / (T(T − 1))] Σ_{t=1}^{T} y_t − [6 / (T(T − 1))] Σ_{t=1}^{T} t y_t

β̂_1 = [12 / (T(T² − 1))] Σ_{t=1}^{T} t y_t − [6 / (T(T − 1))] Σ_{t=1}^{T} y_t
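These closed-form estimators are simply the general least squares solution of Eq. (3.13) specialized to the two-parameter trend model. As a check, the sketch below (Python with NumPy; the simulated trend series and its coefficients are invented for illustration) evaluates both the closed forms and the matrix solution, and computes the trend-line forecast for period T + 1:

```python
import numpy as np

# Simulated linear-trend series (invented): y_t = 10 + 2 t + noise, t = 1..T.
rng = np.random.default_rng(7)
T = 20
t = np.arange(1, T + 1, dtype=float)
y = 10.0 + 2.0 * t + rng.normal(scale=0.5, size=T)

# Closed-form least squares estimators for the trend model.
sum_y = y.sum()
sum_ty = (t * y).sum()
b0 = 2 * (2 * T + 1) / (T * (T - 1)) * sum_y - 6 / (T * (T - 1)) * sum_ty
b1 = 12 / (T * (T ** 2 - 1)) * sum_ty - 6 / (T * (T - 1)) * sum_y

# The same fit via the general matrix solution, Eq. (3.13).
X = np.column_stack([np.ones(T), t])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Forecast of the next observation from the trend line, taking the forecast
# of the next residual to be zero (structureless residuals).
y_next = b0 + b1 * (T + 1)
print(b0, b1, y_next)
```

The two routes give identical estimates up to floating-point error, which confirms the algebra above.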
Minitab computes these parameter estimates in its trend adjustment procedure, which we illustrated in Example 2.6. The least squares estimates obtained from this trend adjustment model depend on the point in time at which they were computed, that is, T. Sometimes it may be convenient to keep track of the period of computation and denote the estimates as functions of time, say, β̂_0(T) and β̂_1(T). The model can be used to predict the next observation by predicting the point on the trend line in period T + 1, which is β̂_0(T) + β̂_1(T)(T + 1), and adding to the trend a forecast of the next residual, say, ê_{T+1}(1). If the residuals are structureless and have average value zero, the forecast of the next residual would be zero. Then the forecast of the next observation would be

ŷ_{T+1}(T) = β̂_0(T) + β̂_1(T)(T + 1)
When a new observation becomes available, the parameter estimates β̂_0(T) and β̂_1(T) could be updated to reflect the new information. This could be done by solving the normal equations again. In some situations it is possible to devise simple updating equations so that new estimates β̂_0(T + 1) and β̂_1(T + 1) can be computed directly from the previous ones β̂_0(T) and β̂_1(T) without having to directly solve the normal equations. We will show how to do this later. •