
From the document Independent Component Analysis (pages 108-112)

MATHEMATICAL PRELIMINARIES

4.4 LEAST-SQUARES ESTIMATION

4.4.1 Linear least-squares method

The least-squares method can be regarded as a deterministic approach to the estimation problem where no assumptions on the probability distributions, etc., are necessary. However, statistical arguments can be used to justify the least-squares method, and they give further insight into its properties. Least-squares estimation is discussed in numerous books, and more thoroughly from the estimation point of view in, for example, [407, 299].

In the basic linear least-squares method, the $T$-dimensional data vectors $\mathbf{x}_T$ are assumed to obey the following model:

$$\mathbf{x}_T = \mathbf{H}\boldsymbol{\theta} + \mathbf{v}_T \tag{4.35}$$

Here $\boldsymbol{\theta}$ is again the $m$-dimensional parameter vector, and $\mathbf{v}_T$ is a $T$-vector whose components are the unknown measurement errors $v(j)$, $j = 1, \ldots, T$. The $T \times m$ observation matrix $\mathbf{H}$ is assumed to be completely known. Furthermore, the number of measurements is assumed to be at least as large as the number of unknown parameters, so that $T \geq m$. In addition, the matrix $\mathbf{H}$ has the maximum rank $m$.

First, it can be noted that if $m = T$, we can set $\mathbf{v}_T = \mathbf{0}$ and get a unique solution $\boldsymbol{\theta} = \mathbf{H}^{-1}\mathbf{x}_T$. If there were more unknown parameters than measurements ($m > T$), infinitely many solutions would exist for Eq. (4.35) satisfying the condition $\mathbf{v}_T = \mathbf{0}$. However, if the measurements are noisy or contain errors, it is generally highly desirable to have many more measurements than there are parameters to be estimated, in order to obtain more reliable estimates. So, in the following we shall concentrate on the case $T > m$.

When $T > m$, equation (4.35) has no solution for which $\mathbf{v}_T = \mathbf{0}$. Because the measurement errors $\mathbf{v}_T$ are unknown, the best that we can then do is to choose an estimator $\hat{\boldsymbol{\theta}}$ that minimizes in some sense the effect of the errors. For mathematical convenience, a natural choice is to consider the least-squares criterion

$$\mathcal{E}_{LS} = \frac{1}{2}\|\mathbf{v}_T\|^2 = \frac{1}{2}(\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta}) \tag{4.36}$$


Note that this differs from the error criteria in Section 4.2 in that no expectation is involved and the criterion $\mathcal{E}_{LS}$ tries to minimize the measurement errors $\mathbf{v}_T$, and not directly the estimation error $\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$.

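Minimization of the criterion (4.36) with respect to the unknown parameters $\boldsymbol{\theta}$ leads to the so-called normal equations [407, 320, 299]

$$(\mathbf{H}^T\mathbf{H})\,\hat{\boldsymbol{\theta}}_{LS} = \mathbf{H}^T\mathbf{x}_T \tag{4.37}$$

for determining the least-squares estimate $\hat{\boldsymbol{\theta}}_{LS}$ of $\boldsymbol{\theta}$. It is often most convenient to solve $\hat{\boldsymbol{\theta}}_{LS}$ from these linear equations. However, because we assumed that the matrix $\mathbf{H}$ has full rank, we can explicitly solve the normal equations, getting

$$\hat{\boldsymbol{\theta}}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}_T = \mathbf{H}^{+}\mathbf{x}_T \tag{4.38}$$

where $\mathbf{H}^{+} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$ is the pseudoinverse of $\mathbf{H}$ (assuming that $\mathbf{H}$ has maximal rank $m$ and more rows than columns: $T > m$) [169, 320, 299].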
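As a small numerical illustration of (4.37) and (4.38), the following NumPy sketch computes the estimate in three ways: by solving the normal equations as a linear system, by applying the pseudoinverse, and by calling a library least-squares solver. The dimensions, the random observation matrix, the parameter values, and the noise level are all assumptions made for the example; they do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem: T measurements, m parameters, T > m.
T, m = 50, 3
H = rng.standard_normal((T, m))          # known observation matrix (full rank with probability 1)
theta_true = np.array([1.0, -2.0, 0.5])  # hypothetical parameter vector
x = H @ theta_true + 0.1 * rng.standard_normal(T)   # data model (4.35)

# Normal equations (4.37): solve (H^T H) theta = H^T x as a linear system.
theta_normal = np.linalg.solve(H.T @ H, H.T @ x)

# Explicit pseudoinverse solution (4.38).
theta_pinv = np.linalg.pinv(H) @ x

# Library least-squares solver applied directly to H and x.
theta_lstsq, *_ = np.linalg.lstsq(H, x, rcond=None)

# At the minimum, the residual is orthogonal to the columns of H.
residual = x - H @ theta_normal
print(theta_normal, np.abs(H.T @ residual).max())
```

All three routes should agree to numerical precision here. When $\mathbf{H}^T\mathbf{H}$ is poorly conditioned, a solver that works on $\mathbf{H}$ directly, such as `np.linalg.lstsq`, is usually the safer choice than forming $\mathbf{H}^T\mathbf{H}$ explicitly.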

The least-squares estimator can be analyzed statistically by assuming that the measurement errors have zero mean: $\mathrm{E}\{\mathbf{v}_T\} = \mathbf{0}$. It is easy to see that the least-squares estimator is unbiased: $\mathrm{E}\{\hat{\boldsymbol{\theta}}_{LS} \mid \boldsymbol{\theta}\} = \boldsymbol{\theta}$. Furthermore, if the covariance matrix of the measurement errors $\mathbf{C}_v = \mathrm{E}\{\mathbf{v}_T\mathbf{v}_T^T\}$ is known, one can compute the covariance matrix (4.19) of the estimation error. These simple analyses are left as an exercise to the reader.
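For readers who want to check their work, one possible route through these two analyses is sketched below. It assumes that $\mathbf{H}$ is deterministic, that the errors $\mathbf{v}_T$ do not depend on $\boldsymbol{\theta}$, and that the covariance matrix (4.19) is the second-moment matrix of the estimation error; the symbol $\tilde{\boldsymbol{\theta}}$ for that error is used here for illustration.

```latex
\begin{align*}
\hat{\boldsymbol{\theta}}_{LS}
  &= \mathbf{H}^{+}\mathbf{x}_T
   = \mathbf{H}^{+}(\mathbf{H}\boldsymbol{\theta} + \mathbf{v}_T)
   = \boldsymbol{\theta} + \mathbf{H}^{+}\mathbf{v}_T
   && \text{since } \mathbf{H}^{+}\mathbf{H} = \mathbf{I} \\
\mathrm{E}\{\hat{\boldsymbol{\theta}}_{LS} \mid \boldsymbol{\theta}\}
  &= \boldsymbol{\theta} + \mathbf{H}^{+}\,\mathrm{E}\{\mathbf{v}_T\}
   = \boldsymbol{\theta} \\
\tilde{\boldsymbol{\theta}}
  &= \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}_{LS}
   = -\mathbf{H}^{+}\mathbf{v}_T \\
\mathrm{E}\{\tilde{\boldsymbol{\theta}}\tilde{\boldsymbol{\theta}}^T \mid \boldsymbol{\theta}\}
  &= \mathbf{H}^{+}\mathbf{C}_v(\mathbf{H}^{+})^T
   = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_v\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}
\end{align*}
```

In the special case $\mathbf{C}_v = \sigma^2\mathbf{I}$, the last expression simplifies to $\sigma^2(\mathbf{H}^T\mathbf{H})^{-1}$.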

Example 4.5 The least-squares method is commonly applied in various branches of science to linear curve fitting. The general setting here is as follows. We try to fit to the measurements the linear model

$$y(t) = \sum_{i=1}^{m} a_i \phi_i(t) + v(t) \tag{4.39}$$

Here $\phi_i(t)$, $i = 1, 2, \ldots, m$, are $m$ basis functions that can generally be nonlinear functions of the argument $t$; it suffices that the model (4.39) be linear with respect to the unknown parameters $a_i$. Assume now that there are available measurements $y(t_1), y(t_2), \ldots, y(t_T)$ at argument values $t_1, t_2, \ldots, t_T$, respectively. The linear model (4.39) can be easily written in the vector form (4.35), where now the parameter vector is given by

$$\boldsymbol{\theta} = [a_1, a_2, \ldots, a_m]^T \tag{4.40}$$

and the data vector by

$$\mathbf{x}_T = [y(t_1), y(t_2), \ldots, y(t_T)]^T \tag{4.41}$$

Similarly, the vector $\mathbf{v}_T = [v(t_1), v(t_2), \ldots, v(t_T)]^T$ contains the error terms $v(t_i)$.

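The observation matrix becomes

$$\mathbf{H} = \begin{bmatrix}
\phi_1(t_1) & \phi_2(t_1) & \cdots & \phi_m(t_1) \\
\phi_1(t_2) & \phi_2(t_2) & \cdots & \phi_m(t_2) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_1(t_T) & \phi_2(t_T) & \cdots & \phi_m(t_T)
\end{bmatrix} \tag{4.42}$$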

Inserting the numerical values into (4.41) and (4.42), one can now determine $\mathbf{H}$ and $\mathbf{x}_T$, and then compute the least-squares estimates $\hat{a}_{i,LS}$ of the parameters $a_i$ of the curve from the normal equations (4.37) or directly from (4.38).
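The curve-fitting recipe of this example can be sketched in a few lines of NumPy. The quadratic basis $1, t, t^2$, the sample points, the coefficient values, and the noise level below are assumptions chosen only for illustration, and the helper names `observation_matrix` and `fit_curve` are hypothetical.

```python
import numpy as np

# Hypothetical basis functions phi_1(t) = 1, phi_2(t) = t, phi_3(t) = t^2.
basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

def observation_matrix(t, basis):
    """Build H of (4.42): row j holds phi_1(t_j), ..., phi_m(t_j)."""
    return np.column_stack([phi(t) for phi in basis])

def fit_curve(t, y, basis):
    """Least-squares estimates of a_1, ..., a_m from the measurements y(t_j)."""
    H = observation_matrix(t, basis)
    a_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
    return a_hat

rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0, 21)                 # T = 21 argument values
a_true = np.array([0.5, -1.0, 2.0])           # assumed "true" coefficients
y_clean = observation_matrix(t, basis) @ a_true
y = y_clean + 0.05 * rng.standard_normal(t.size)   # noisy measurements, model (4.39)

a_hat = fit_curve(t, y, basis)
print(a_hat)
```

Although the basis functions are nonlinear in $t$, the fit remains a linear least-squares problem because the model is linear in the coefficients $a_i$.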

The basis functions $\phi_i(t)$ are often chosen so that they satisfy the orthonormality conditions

$$\sum_{i=1}^{T} \phi_j(t_i)\,\phi_k(t_i) = \begin{cases} 1, & j = k \\ 0, & j \neq k \end{cases} \tag{4.43}$$

Now $\mathbf{H}^T\mathbf{H} = \mathbf{I}$, since Eq. (4.43) represents this condition for the elements $(j,k)$ of the matrix $\mathbf{H}^T\mathbf{H}$. This implies that the normal equations (4.37) reduce to the simple form $\hat{\boldsymbol{\theta}}_{LS} = \mathbf{H}^T\mathbf{x}_T$. Writing out this equation for each component of $\hat{\boldsymbol{\theta}}_{LS}$ provides for the least-squares estimate of the parameter $a_i$

$$\hat{a}_{i,LS} = \sum_{j=1}^{T} \phi_i(t_j)\,y(t_j), \quad i = 1, \ldots, m \tag{4.44}$$
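The simplification in (4.44) can be checked numerically. In the sketch below, an orthonormal set of sampled basis functions is obtained by applying a QR factorization to a sampled polynomial basis; this construction, the sample points, and the coefficient values are assumptions made for the illustration, not something prescribed by the text. Note that the coefficients then refer to the orthonormalized basis, not to the raw monomials.

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 41)                    # T = 41 sample points
V = np.column_stack([np.ones_like(t), t, t**2])   # raw basis 1, t, t^2 at the sample points

# Reduced QR: the columns of H span the same space as those of V
# and satisfy the orthonormality conditions (4.43), i.e. H^T H = I.
H, _ = np.linalg.qr(V)

rng = np.random.default_rng(2)
a_true = np.array([1.0, -0.5, 0.25])              # assumed coefficients in the orthonormal basis
x = H @ a_true + 0.01 * rng.standard_normal(t.size)

# With orthonormal columns, (4.44) is just a set of inner products.
a_hat = H.T @ x

# Reference: general least-squares solver on the same problem.
a_ref, *_ = np.linalg.lstsq(H, x, rcond=None)
print(a_hat, a_ref)
```

No linear system needs to be solved in this case; each estimate is a single inner product between one column of $\mathbf{H}$ and the data vector.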

Note that the linear data model (4.35) employed in the least-squares method closely resembles the noisy linear ICA model $\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{n}$ to be discussed in Chapter 15. Clearly, the observation matrix $\mathbf{H}$ in (4.35) corresponds to the mixing matrix $\mathbf{A}$, the parameter vector $\boldsymbol{\theta}$ to the source vector $\mathbf{s}$, and the error vector $\mathbf{v}_T$ to the noise vector $\mathbf{n}$ in the noisy ICA model. These model structures are thus quite similar, but the assumptions made on the models are clearly different. In the least-squares model the observation matrix $\mathbf{H}$ is assumed to be completely known, while in the ICA model the mixing matrix $\mathbf{A}$ is unknown. This lack of knowledge is compensated in ICA by assuming that the components of the source vector $\mathbf{s}$ are statistically independent, while in the least-squares model (4.35) no assumptions are needed on the parameter vector $\boldsymbol{\theta}$. Even though the models look the same, the different assumptions lead to quite different methods for estimating the desired quantities.

The basic least-squares method is simple and widely used. Its success in practice depends largely on how well the physical situation can be described using the linear model (4.35). If the model (4.35) is accurate for the data and the elements of the observation matrix $\mathbf{H}$ are known from the problem setting, good estimation results can be expected.

4.4.2 Nonlinear and generalized least-squares estimators *

Generalized least-squares. The least-squares problem can be generalized by adding a symmetric and positive definite weighting matrix $\mathbf{W}$ to the criterion (4.36). The weighted criterion becomes [407, 299]

$$\mathcal{E}_{WLS} = (\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta})^T\,\mathbf{W}\,(\mathbf{x}_T - \mathbf{H}\boldsymbol{\theta}) \tag{4.45}$$

It turns out that a natural, optimal choice for the weighting matrix $\mathbf{W}$ is the inverse of the covariance matrix of the measurement errors (noise), $\mathbf{W} = \mathbf{C}_v^{-1}$. This is because for this choice the resulting generalized least-squares estimator

$$\hat{\boldsymbol{\theta}}_{WLS} = (\mathbf{H}^T\mathbf{C}_v^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_v^{-1}\mathbf{x}_T \tag{4.46}$$

also minimizes the mean-square estimation error $\mathcal{E}_{MSE} = \mathrm{E}\{\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2 \mid \boldsymbol{\theta}\}$ [407, 299]. Here it is assumed that the estimator $\hat{\boldsymbol{\theta}}$ is linear and unbiased. The estimator (4.46) is often referred to as the best linear unbiased estimator (BLUE) or Gauss-Markov estimator.
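One way to evaluate (4.46) numerically is to whiten the data with a Cholesky factor of $\mathbf{C}_v$ and then solve an ordinary least-squares problem. The sketch below assumes this approach; the helper name `gls_estimate`, the diagonal noise covariance, and all numerical values are hypothetical choices for the illustration.

```python
import numpy as np

def gls_estimate(H, x, C_v):
    """Generalized least-squares estimate (4.46) with W = C_v^{-1}.

    Uses C_v = L L^T: multiplying the model by L^{-1} gives errors with
    identity covariance, so ordinary least squares applies to the
    whitened quantities.
    """
    L = np.linalg.cholesky(C_v)
    H_w = np.linalg.solve(L, H)      # L^{-1} H
    x_w = np.linalg.solve(L, x)      # L^{-1} x
    theta, *_ = np.linalg.lstsq(H_w, x_w, rcond=None)
    return theta

rng = np.random.default_rng(3)
T, m = 40, 2
H = rng.standard_normal((T, m))
theta_true = np.array([2.0, -1.0])        # assumed parameter vector
sigma = np.linspace(0.1, 2.0, T)          # assumed per-measurement noise levels
C_v = np.diag(sigma**2)
x = H @ theta_true + sigma * rng.standard_normal(T)

theta_gls = gls_estimate(H, x, C_v)

# Direct evaluation of (4.46) for comparison.
W = np.linalg.inv(C_v)
theta_direct = np.linalg.solve(H.T @ W @ H, H.T @ W @ x)
print(theta_gls, theta_direct)
```

The whitening route avoids forming $\mathbf{C}_v^{-1}$ explicitly; both computations should give the same estimate up to rounding.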

Note that (4.46) reduces to the standard least-squares solution (4.38) if $\mathbf{C}_v = \sigma^2\mathbf{I}$. This happens, for example, when the measurement errors $v(j)$ have zero mean and are mutually independent and identically distributed with a common variance $\sigma^2$. The choice $\mathbf{C}_v = \sigma^2\mathbf{I}$ also applies if we have no prior knowledge of the covariance matrix $\mathbf{C}_v$ of the measurement errors. In these instances, the best linear unbiased estimator (BLUE) minimizing the mean-square error coincides with the standard least-squares estimator. This connection provides a strong statistical argument supporting the use of the least-squares method, because the mean-square error criterion directly measures the estimation error $\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$.

Nonlinear least-squares. The linear data model (4.35) employed in the linear least-squares methods is not adequate for describing the dependence between the parameters $\boldsymbol{\theta}$ and the measurements $\mathbf{x}_T$ in many instances. It is therefore natural to consider the following more general nonlinear data model

$$\mathbf{x}_T = \mathbf{f}(\boldsymbol{\theta}) + \mathbf{v}_T \tag{4.47}$$

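Here $\mathbf{f}$ is a vector-valued nonlinear and continuously differentiable function of the parameter vector $\boldsymbol{\theta}$. Each component $f_i(\boldsymbol{\theta})$ of $\mathbf{f}(\boldsymbol{\theta})$ is assumed to be a known scalar function of the components of $\boldsymbol{\theta}$.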

As before, the nonlinear least-squares criterion $\mathcal{E}_{NLS}$ is defined as the sum of the squared measurement (or modeling) errors, $\|\mathbf{v}_T\|^2 = \sum_j [v(j)]^2$. From the model (4.47), we get

$$\mathcal{E}_{NLS} = [\mathbf{x}_T - \mathbf{f}(\boldsymbol{\theta})]^T[\mathbf{x}_T - \mathbf{f}(\boldsymbol{\theta})] \tag{4.48}$$

The nonlinear least-squares estimator $\hat{\boldsymbol{\theta}}_{NLS}$ is the value of $\boldsymbol{\theta}$ that minimizes $\mathcal{E}_{NLS}$. The nonlinear least-squares problem is thus nothing but a nonlinear optimization problem where the goal is to find the minimum of the function $\mathcal{E}_{NLS}$. Such problems cannot usually be solved analytically, but one must resort to iterative numerical methods for finding the minimum. One can use any suitable nonlinear optimization method for finding the estimate $\hat{\boldsymbol{\theta}}_{NLS}$. These optimization procedures are discussed briefly in Chapter 3 and more thoroughly in the books referred to there.
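As one concrete possibility, the criterion (4.48) can be minimized with an undamped Gauss-Newton iteration, which repeatedly linearizes $\mathbf{f}$ and solves a linear least-squares problem for the update. The text does not prescribe this method; it is chosen here only as a compact example. The exponential-decay model, the starting point, and the function names are likewise assumptions, and the iteration is not guaranteed to converge from a poor initial guess.

```python
import numpy as np

def f(theta, t):
    """Assumed model: f_i(theta) = theta_1 * exp(-theta_2 * t_i)."""
    return theta[0] * np.exp(-theta[1] * t)

def jacobian(theta, t):
    """Partial derivatives of f with respect to theta_1 and theta_2."""
    e = np.exp(-theta[1] * t)
    return np.column_stack([e, -theta[0] * t * e])

def gauss_newton(x, t, theta0, n_iter=50, tol=1e-12):
    """Minimize ||x - f(theta)||^2 by undamped Gauss-Newton steps."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = x - f(theta, t)                   # current residual
        J = jacobian(theta, t)
        # Linearized problem: find the step minimizing ||r - J step||^2.
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

def criterion(theta, x, t):
    """Nonlinear least-squares criterion (4.48)."""
    r = x - f(theta, t)
    return r @ r

rng = np.random.default_rng(4)
t = np.linspace(0.0, 4.0, 30)
theta_true = np.array([2.0, 0.7])             # assumed parameter values
x = f(theta_true, t) + 0.01 * rng.standard_normal(t.size)   # model (4.47)

theta_nls = gauss_newton(x, t, theta0=[1.5, 0.5])
print(theta_nls, criterion(theta_nls, x, t))
```

Each Gauss-Newton step is itself a linear least-squares problem of the form (4.35), with the Jacobian playing the role of the observation matrix. In practice, a damped variant or a library optimizer is often preferred for robustness.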

The basic linear least-squares method can be extended in several other directions.

It generalizes easily to the case where the measurements (made, for example, at different time instants) are vector-valued. Furthermore, the parameters can be time-varying, and the least-squares estimator can be computed adaptively (recursively).

See, for example, the books [407, 299] for more information.
