MATHEMATICAL PRELIMINARIES
4.4 LEAST-SQUARES ESTIMATION

4.4.1 Linear least-squares method
The least-squares method can be regarded as a deterministic approach to the estimation problem in which no assumptions on probability distributions or the like are necessary. However, statistical arguments can be used to justify the least-squares method, and they give further insight into its properties. Least-squares estimation is discussed in numerous books; for a more thorough treatment from the estimation point of view, see for example [407, 299].
In the basic linear least-squares method, the $T$-dimensional data vectors $\mathbf{x}_T$ are assumed to obey the following model:

$$\mathbf{x}_T = \mathbf{H}\boldsymbol{\theta} + \mathbf{v}_T \qquad (4.35)$$

Here $\boldsymbol{\theta}$ is again the $m$-dimensional parameter vector, and $\mathbf{v}_T$ is a $T$-vector whose components are the unknown measurement errors $v(j)$, $j = 1, \ldots, T$. The $T \times m$ observation matrix $\mathbf{H}$ is assumed to be completely known. Furthermore, the number of measurements is assumed to be at least as large as the number of unknown parameters, so that $T \geq m$. In addition, the matrix $\mathbf{H}$ has the maximum rank $m$.

First, note that if $m = T$, we can set $\mathbf{v}_T = \mathbf{0}$ and get the unique solution $\boldsymbol{\theta} = \mathbf{H}^{-1}\mathbf{x}_T$. If there were more unknown parameters than measurements ($m > T$), infinitely many solutions of Eq. (4.35) would satisfy the condition $\mathbf{v}_T = \mathbf{0}$. However, if the measurements are noisy or contain errors, it is generally highly desirable to have many more measurements than there are parameters to be estimated, in order to obtain more reliable estimates. So, in the following we concentrate on the case $T > m$.

When $T > m$, equation (4.35) in general has no solution for which $\mathbf{v}_T = \mathbf{0}$, because $\mathbf{x}_T$ need not lie in the $m$-dimensional column space of $\mathbf{H}$. Because the measurement errors $\mathbf{v}_T$ are unknown, the best we can then do is to choose an estimator $\hat{\boldsymbol{\theta}}$ that minimizes the effect of the errors in some sense. For mathematical convenience, a natural choice is the least-squares criterion

$$\mathcal{E}_{\mathrm{LS}} = \frac{1}{2}\|\mathbf{v}_T\|^2 = \frac{1}{2}(\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta}) \qquad (4.36)$$
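To make the model (4.35) and the criterion (4.36) concrete, here is a minimal pure-Python sketch. The matrix sizes, the true parameter values, the noise level, and the helper names `matvec` and `ls_criterion` are illustrative assumptions, not part of the text; the sketch simply simulates $\mathbf{x}_T = \mathbf{H}\boldsymbol{\theta} + \mathbf{v}_T$ and evaluates $\mathcal{E}_{\mathrm{LS}}$ for a given $\boldsymbol{\theta}$.

```python
import random

def matvec(H, theta):
    """Return H theta for a T x m matrix H stored as a list of rows."""
    return [sum(h_ij * t_j for h_ij, t_j in zip(row, theta)) for row in H]

def ls_criterion(H, x, theta):
    """Least-squares criterion (4.36): 0.5 * ||x - H theta||^2."""
    residual = [x_j - hx_j for x_j, hx_j in zip(x, matvec(H, theta))]
    return 0.5 * sum(r * r for r in residual)

# Simulate the data model (4.35) with T = 5 measurements and m = 2 parameters
# (hypothetical values chosen only for illustration).
random.seed(0)
H = [[1.0, float(j)] for j in range(5)]         # known T x m observation matrix
theta_true = [2.0, -1.0]                        # "unknown" parameter vector
v = [random.gauss(0.0, 0.1) for _ in range(5)]  # measurement errors v_T
x = [hx + e for hx, e in zip(matvec(H, theta_true), v)]

# At the true parameters the criterion equals 0.5 * ||v_T||^2.
print(ls_criterion(H, x, theta_true))
print(0.5 * sum(e * e for e in v))
```

The two printed values agree up to rounding, because at the true $\boldsymbol{\theta}$ the residual is exactly $\mathbf{v}_T$. Since the least-squares estimate minimizes (4.36), its criterion value can never exceed this one.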
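Note that this differs from the error criteria in Section 4.2 in that no expectation is involved, and the criterion $\mathcal{E}_{\mathrm{LS}}$ tries to minimize the measurement errors $\mathbf{v}_T$, not directly the estimation error $\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$.

Minimization of the criterion (4.36) with respect to the unknown parameters, that is, setting its gradient $-\mathbf{H}^T(\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta})$ equal to zero, leads to the so-called normal equations [407, 320, 299]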
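$$(\mathbf{H}^T\mathbf{H})\,\hat{\boldsymbol{\theta}}_{\mathrm{LS}} = \mathbf{H}^T\mathbf{x}_T \qquad (4.37)$$

for determining the least-squares estimate $\hat{\boldsymbol{\theta}}_{\mathrm{LS}}$ of $\boldsymbol{\theta}$. It is often most convenient to solve $\hat{\boldsymbol{\theta}}_{\mathrm{LS}}$ from these linear equations. However, because we assumed that the matrix $\mathbf{H}$ has full rank, we can explicitly solve the normal equations, getting

$$\hat{\boldsymbol{\theta}}_{\mathrm{LS}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}_T = \mathbf{H}^{+}\mathbf{x}_T \qquad (4.38)$$

where $\mathbf{H}^{+} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$ is the pseudoinverse of $\mathbf{H}$ (assuming that $\mathbf{H}$ has maximal rank $m$ and more rows than columns: $T > m$) [169, 320, 299].

The least-squares estimator can be analyzed statistically by assuming that the measurement errors have zero mean: $\mathrm{E}\{\mathbf{v}_T\} = \mathbf{0}$. It is easy to see that the least-squares estimator is then unbiased: $\mathrm{E}\{\hat{\boldsymbol{\theta}}_{\mathrm{LS}} \mid \boldsymbol{\theta}\} = \boldsymbol{\theta}$. Furthermore, if the covariance matrix of the measurement errors $\mathbf{C}_v = \mathrm{E}\{\mathbf{v}_T\mathbf{v}_T^T\}$ is known, one can compute the covariance matrix (4.19) of the estimation error. These simple analyses are left as an exercise to the reader.

Example 4.5 The least-squares method is commonly applied in various branches of science to linear curve fitting. The general setting is as follows. We try to fit the following linear model to the measurements: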
$$y(t) = \sum_{i=1}^{m} a_i\,\phi_i(t) + v(t) \qquad (4.39)$$

Here $\phi_i(t)$, $i = 1, 2, \ldots, m$, are $m$ basis functions that can in general be nonlinear functions of the argument $t$; it suffices that the model (4.39) be linear with respect to the unknown parameters $a_i$. Assume now that measurements $y(t_1), y(t_2), \ldots, y(t_T)$ are available at the argument values $t_1, t_2, \ldots, t_T$, respectively. The linear model (4.39) can easily be written in the vector form (4.35), where the parameter vector is now given by

$$\boldsymbol{\theta} = [a_1, a_2, \ldots, a_m]^T \qquad (4.40)$$

and the data vector by

$$\mathbf{x}_T = [y(t_1), y(t_2), \ldots, y(t_T)]^T \qquad (4.41)$$

Similarly, the vector $\mathbf{v}_T = [v(t_1), v(t_2), \ldots, v(t_T)]^T$ contains the error terms $v(t_i)$. The observation matrix becomes
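$$\mathbf{H} = \begin{bmatrix} \phi_1(t_1) & \phi_2(t_1) & \cdots & \phi_m(t_1) \\ \phi_1(t_2) & \phi_2(t_2) & \cdots & \phi_m(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1(t_T) & \phi_2(t_T) & \cdots & \phi_m(t_T) \end{bmatrix} \qquad (4.42)$$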
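The curve-fitting procedure of Example 4.5 can be sketched in pure Python as follows. This is an illustrative sketch under our own assumptions: the helper names (`design_matrix`, `solve`, `least_squares_fit`), the quadratic basis $\phi_1(t) = 1$, $\phi_2(t) = t$, $\phi_3(t) = t^2$, and the sample points are all hypothetical. It builds $\mathbf{H}$ row by row as in (4.42) and then solves the normal equations (4.37) by Gaussian elimination.

```python
def design_matrix(basis, ts):
    """Observation matrix (4.42): row j is [phi_1(t_j), ..., phi_m(t_j)]."""
    return [[phi(t) for phi in basis] for t in ts]

def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def least_squares_fit(basis, ts, ys):
    """Solve the normal equations (4.37), (H^T H) a = H^T x, for the coefficients a_i."""
    H = design_matrix(basis, ts)
    m = len(basis)
    HtH = [[sum(row[i] * row[k] for row in H) for k in range(m)] for i in range(m)]
    Htx = [sum(row[i] * y for row, y in zip(H, ys)) for i in range(m)]
    return solve(HtH, Htx)

# Hypothetical example: fit y(t) = a_1 + a_2 t + a_3 t^2 to noise-free samples.
basis = [lambda t: 1.0, lambda t: t, lambda t: t * t]
ts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.0 - 2.0 * t + 0.5 * t * t for t in ts]
print(least_squares_fit(basis, ts, ys))  # approximately [1.0, -2.0, 0.5]
```

Forming $\mathbf{H}^T\mathbf{H}$ explicitly squares the condition number of $\mathbf{H}$, so for ill-conditioned problems a QR- or SVD-based solver such as NumPy's `numpy.linalg.lstsq` is usually preferable; the normal-equation route is used here only because it mirrors (4.37) directly.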
Inserting the numerical values into (4.41) and (4.42), one can now determine $\mathbf{H}$ and $\mathbf{x}_T$, and then compute the least-squares estimates $\hat{a}_{i,\mathrm{LS}}$ of the curve parameters $a_i$ from the normal equations (4.37) or directly from (4.38).

The basis functions $\phi_i(t)$ are often chosen so that they satisfy the orthonormality conditions

$$\sum_{i=1}^{T} \phi_j(t_i)\,\phi_k(t_i) = \begin{cases} 1, & j = k \\ 0, & j \neq k \end{cases} \qquad (4.43)$$

Now $\mathbf{H}^T\mathbf{H} = \mathbf{I}$, since Eq. (4.43) expresses exactly this condition for the elements $(j, k)$ of the matrix $\mathbf{H}^T\mathbf{H}$. This implies that the normal equations (4.37) reduce to the simple form $\hat{\boldsymbol{\theta}}_{\mathrm{LS}} = \mathbf{H}^T\mathbf{x}_T$. Writing out this equation for each component of $\hat{\boldsymbol{\theta}}_{\mathrm{LS}}$ gives the least-squares estimate of the parameter $a_i$:

$$\hat{a}_{i,\mathrm{LS}} = \sum_{j=1}^{T} \phi_i(t_j)\,y(t_j), \qquad i = 1, \ldots, m \qquad (4.44)$$

Note that the linear data model (4.35) employed in the least-squares method closely resembles the noisy linear ICA model $\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{n}$ to be discussed in Chapter 15. Clearly, the observation matrix $\mathbf{H}$ in (4.35) corresponds to the mixing matrix $\mathbf{A}$, the parameter vector $\boldsymbol{\theta}$ to the source vector $\mathbf{s}$, and the error vector $\mathbf{v}_T$ to the noise vector $\mathbf{n}$ in the noisy ICA model. These model structures are thus quite similar, but the assumptions made on the models are clearly different. In the least-squares model, the observation matrix $\mathbf{H}$ is assumed to be completely known, while in the ICA model the mixing matrix $\mathbf{A}$ is unknown. This lack of knowledge is compensated in ICA by assuming that the components of the source vector $\mathbf{s}$ are statistically independent, while in the least-squares model (4.35) no assumptions are needed on the parameter vector $\boldsymbol{\theta}$. Even though the models look the same, the different assumptions lead to quite different methods for estimating the desired quantities.

The basic least-squares method is simple and widely used. Its success in practice depends largely on how well the physical situation can be described using the linear model (4.35). If the model (4.35) is accurate for the data and the elements of the observation matrix $\mathbf{H}$ are known from the problem setting, good estimation results can be expected.

4.4.2 Nonlinear and generalized least-squares estimators *
Generalized least-squares. The least-squares problem can be generalized by adding a symmetric and positive definite weighting matrix $\mathbf{W}$ to the criterion (4.36). The weighted criterion becomes [407, 299]
$$\mathcal{E}_{\mathrm{WLS}} = (\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta})^T\,\mathbf{W}\,(\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta}) \qquad (4.45)$$

It turns out that a natural, optimal choice for the weighting matrix $\mathbf{W}$ is the inverse of the covariance matrix of the measurement errors (noise), $\mathbf{W} = \mathbf{C}_v^{-1}$. This is because for this choice the resulting generalized least-squares estimator

$$\hat{\boldsymbol{\theta}}_{\mathrm{WLS}} = (\mathbf{H}^T\mathbf{C}_v^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_v^{-1}\mathbf{x}_T \qquad (4.46)$$

also minimizes the mean-square estimation error $\mathcal{E}_{\mathrm{MSE}} = \mathrm{E}\{\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2 \mid \boldsymbol{\theta}\}$ [407, 299]. Here it is assumed that the estimator $\hat{\boldsymbol{\theta}}$ is linear and unbiased. The estimator (4.46) is often referred to as the best linear unbiased estimator (BLUE) or Gauss-Markov estimator.

Note that (4.46) reduces to the standard least-squares solution (4.38) if $\mathbf{C}_v = \sigma^2\mathbf{I}$, because the factors $\sigma^{-2}$ then cancel. This happens, for example, when the measurement errors $v(j)$ have zero mean and are mutually independent and identically distributed with a common variance $\sigma^2$. The choice $\mathbf{C}_v = \sigma^2\mathbf{I}$ also applies if we have no prior knowledge of the covariance matrix $\mathbf{C}_v$ of the measurement errors. In these instances, the best linear unbiased estimator (BLUE) minimizing the mean-square error coincides with the standard least-squares estimator. This connection provides a strong statistical argument supporting the use of the least-squares method, because the mean-square error criterion directly measures the estimation error $\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$.

Nonlinear least-squares. In many instances, the linear data model (4.35) employed in the linear least-squares methods is not adequate for describing the dependence between the parameters $\boldsymbol{\theta}$ and the measurements $\mathbf{x}_T$. It is therefore natural to consider the following more general nonlinear data model:

$$\mathbf{x}_T = \mathbf{f}(\boldsymbol{\theta}) + \mathbf{v}_T \qquad (4.47)$$

Here $\mathbf{f}$ is a vector-valued nonlinear and continuously differentiable function of the parameter vector $\boldsymbol{\theta}$. Each component $f_i(\boldsymbol{\theta})$ of $\mathbf{f}(\boldsymbol{\theta})$ is assumed to be a known scalar function of the components of $\boldsymbol{\theta}$. As before, the nonlinear least-squares criterion $\mathcal{E}_{\mathrm{NLS}}$ is defined as the squared sum of the measurement (or modeling) errors, $\|\mathbf{v}_T\|^2 = \sum_j [v(j)]^2$. From the model (4.47), we get
$$\mathcal{E}_{\mathrm{NLS}} = [\mathbf{x}_T - \mathbf{f}(\boldsymbol{\theta})]^T[\mathbf{x}_T - \mathbf{f}(\boldsymbol{\theta})] \qquad (4.48)$$

The nonlinear least-squares estimator $\hat{\boldsymbol{\theta}}_{\mathrm{NLS}}$ is the value of $\boldsymbol{\theta}$ that minimizes $\mathcal{E}_{\mathrm{NLS}}$. The nonlinear least-squares problem is thus nothing but a nonlinear optimization problem where the goal is to find the minimum of the function $\mathcal{E}_{\mathrm{NLS}}$. Such problems usually cannot be solved analytically; instead, one must resort to iterative numerical methods for finding the minimum. Any suitable nonlinear optimization method can be used for finding the estimate $\hat{\boldsymbol{\theta}}_{\mathrm{NLS}}$. These optimization procedures are discussed briefly in Chapter 3 and more thoroughly in the books referred to there.

The basic linear least-squares method can be extended in several other directions.
It generalizes easily to the case where the measurements (made, for example, at different time instants) are vector-valued. Furthermore, the parameters can be time-varying, and the least-squares estimator can be computed adaptively (recursively).
See, for example, the books [407, 299] for more information.
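The two estimators of Section 4.4.2 can be sketched in the same pure-Python style. The sketch makes assumptions the text does not: the error covariance $\mathbf{C}_v$ is taken to be diagonal (independent errors with known, possibly unequal variances), so that $\mathbf{W} = \mathbf{C}_v^{-1}$ in (4.46) is diagonal; and the nonlinear criterion (4.48) is minimized with plain Gauss-Newton iteration, $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + (\mathbf{J}^T\mathbf{J})^{-1}\mathbf{J}^T[\mathbf{x}_T - \mathbf{f}(\boldsymbol{\theta})]$ with $\mathbf{J}$ the Jacobian of $\mathbf{f}$, which is only one of the many optimization methods the text allows and is used here without step-size control or convergence checks. All function names and the exponential test model are hypothetical.

```python
import math

def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def weighted_normal_solve(J, r, w):
    """Solve (J^T W J) d = J^T W r, where W = diag(w)."""
    m = len(J[0])
    JtWJ = [[sum(wj * row[i] * row[k] for row, wj in zip(J, w)) for k in range(m)]
            for i in range(m)]
    JtWr = [sum(wj * row[i] * rj for row, wj, rj in zip(J, w, r)) for i in range(m)]
    return solve(JtWJ, JtWr)

def blue_estimate(H, x, variances):
    """Generalized LS estimate (4.46), assuming C_v = diag(variances) so W = C_v^{-1}."""
    return weighted_normal_solve(H, x, [1.0 / s2 for s2 in variances])

def gauss_newton(f, jac, x, theta0, iters=20):
    """Minimize E_NLS (4.48) by repeatedly solving a linearized least-squares problem."""
    theta = list(theta0)
    unit_weights = [1.0] * len(x)
    for _ in range(iters):
        residual = [xj - fj for xj, fj in zip(x, f(theta))]
        step = weighted_normal_solve(jac(theta), residual, unit_weights)
        theta = [t + s for t, s in zip(theta, step)]
    return theta

# Hypothetical nonlinear model f_j(theta) = theta_1 * exp(theta_2 * t_j).
ts = [0.0, 0.5, 1.0, 1.5, 2.0]

def f(theta):
    return [theta[0] * math.exp(theta[1] * t) for t in ts]

def jac(theta):
    # Row j holds the partial derivatives of f_j with respect to theta_1 and theta_2.
    return [[math.exp(theta[1] * t), theta[0] * t * math.exp(theta[1] * t)] for t in ts]

x_nls = f([2.0, -0.5])                            # noise-free data for illustration
print(gauss_newton(f, jac, x_nls, [1.8, -0.4]))   # approaches [2.0, -0.5]

# Linear model with unequal error variances: the BLUE weights precise measurements more.
H = [[1.0, t] for t in ts]
x_lin = [1.0 + 0.5 * t for t in ts]
print(blue_estimate(H, x_lin, [0.01, 0.04, 0.01, 0.09, 0.01]))  # [1.0, 0.5] up to rounding
```

Gauss-Newton converges quickly here because the starting point is close and the data are noise-free; from poor starting points it can diverge, which is why damped variants such as Levenberg-Marquardt, or a library routine such as SciPy's `scipy.optimize.least_squares`, are usually preferred in practice.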