
From the document Independent Component Analysis (pages 121-127)


4.7 CONCLUDING REMARKS AND REFERENCES


difficult integrations needed in computing the minimum mean-square estimator. If the posterior distribution $p(\boldsymbol{\theta} \mid \mathbf{x}_T)$ is symmetric around its peak value, the MAP estimator and MSE estimator coincide.

There is no guarantee that the MAP estimator is unbiased. It is also generally difficult to compute the covariance matrix of the estimation error for the MAP and ML estimators. However, the MAP estimator is intuitively sensible, yields in most cases good results in practice, and it has good asymptotic properties under appropriate conditions. These desirable characteristics justify its use.

Fig. 4.2 A posterior density $p(\theta \mid \mathbf{x})$, and the respective MAP estimate $\hat{\theta}_{MAP}$, minimum MSE estimate $\hat{\theta}_{MSE}$, and the minimum absolute error estimate $\hat{\theta}_{ABS}$.
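The three point estimates can be compared numerically. The sketch below is only an illustration under stated assumptions: it uses the standard facts that the MAP estimate is the posterior mode, the minimum MSE estimate is the posterior mean, and the minimum absolute error estimate is the posterior median, and it evaluates them on a grid. The function name `point_estimates` and the two example densities are hypothetical choices, not taken from the text.

```python
import math

def point_estimates(density, lo, hi, n=20000):
    """Return (MAP, MSE, ABS) estimates for an unnormalized density on [lo, hi]."""
    step = (hi - lo) / n
    grid = [lo + (i + 0.5) * step for i in range(n)]      # midpoint grid
    weights = [density(t) for t in grid]
    total = sum(weights)
    probs = [w / total for w in weights]                  # normalized cell probabilities

    theta_map = grid[max(range(n), key=lambda i: probs[i])]  # mode
    theta_mse = sum(t * p for t, p in zip(grid, probs))      # mean
    theta_abs = grid[-1]
    cum = 0.0
    for t, p in zip(grid, probs):                            # median
        cum += p
        if cum >= 0.5:
            theta_abs = t
            break
    return theta_map, theta_mse, theta_abs

# Hypothetical example densities (unnormalized): one skewed, one symmetric.
skewed = lambda t: t ** 2 * math.exp(-t)
symmetric = lambda t: math.exp(-0.5 * (t - 3.0) ** 2)

print(point_estimates(skewed, 0.0, 30.0))
print(point_estimates(symmetric, -7.0, 13.0))
```

For the skewed density the three estimates differ, with the median lying between the mode and the mean; for the symmetric density they agree up to the grid resolution.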

Regrettably, it is generally difficult to determine the posterior distribution in a form that allows for convenient mathematical analysis [407]. However, various advanced and approximative techniques have been developed to facilitate Bayesian estimation; see [142]. When the number of measurements increases, the importance of prior information gradually decreases, and the maximum likelihood estimator becomes asymptotically optimal.
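The diminishing role of the prior can be seen in a simple case. The sketch below assumes gaussian measurements with known variance and a gaussian prior on the mean, for which the MAP estimate has the standard closed form used in `map_mean`. All names and numerical values (`prior_mean`, `prior_var`, `noise_var`, the seed) are assumptions made for the illustration.

```python
import random

def ml_mean(samples):
    """Maximum likelihood estimate of a gaussian mean: the sample mean."""
    return sum(samples) / len(samples)

def map_mean(samples, prior_mean, prior_var, noise_var):
    """MAP estimate of a gaussian mean under a gaussian prior (known noise variance)."""
    T = len(samples)
    return (prior_var * sum(samples) + noise_var * prior_mean) / (T * prior_var + noise_var)

random.seed(0)
true_mean, noise_var = 5.0, 1.0
prior_mean, prior_var = 0.0, 1.0          # deliberately poor prior guess
data = [random.gauss(true_mean, noise_var ** 0.5) for _ in range(1000)]

for T in (1, 10, 100, 1000):
    x = data[:T]
    print(T, round(ml_mean(x), 3), round(map_mean(x, prior_mean, prior_var, noise_var), 3))
```

As T grows, the data term dominates the prior term in `map_mean`, so the MAP and ML estimates approach each other.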

Finally, we point out that neural networks provide in many instances a useful practical tool for nonlinear estimation, even though they lie outside the range of classic estimation theory. For example, the well-known back-propagation algorithm [48, 172, 376] is in fact a stochastic gradient algorithm for minimizing the mean-square error criterion

$$\mathcal{E}_{MSE} = \mathrm{E}\{\|\mathbf{d} - \mathbf{f}(\boldsymbol{\theta}, \mathbf{z})\|^2\} \qquad (4.86)$$

Here $\mathbf{d}$ is the desired response vector and $\mathbf{z}$ the (input) data vector. The parameters $\boldsymbol{\theta}$ consist of weights that are adjusted so that the mapping error (4.86) is minimized. The nonlinear function $\mathbf{f}(\boldsymbol{\theta}, \mathbf{z})$ has enough parameters and a flexible form, so that it can actually model with sufficient accuracy any regular nonlinear function. The back-propagation algorithm learns the parameters $\boldsymbol{\theta}$ that define the estimated input-output mapping $\mathbf{f}(\boldsymbol{\theta}, \mathbf{z})$. See [48, 172, 376] for details and applications.
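To make the idea of a stochastic gradient algorithm for the criterion (4.86) concrete, here is a minimal sketch. It is not the back-propagation algorithm for a multilayer network: it assumes a hypothetical scalar mapping $f(\theta, z) = a \tanh(bz)$ with $\theta = (a, b)$, and updates the parameters one training pair at a time using the chain-rule gradient of the squared error. The learning rate, epoch count, and training data are illustrative assumptions.

```python
import math
import random

def model(theta, z):
    """Hypothetical scalar mapping f(theta, z) = a * tanh(b * z)."""
    a, b = theta
    return a * math.tanh(b * z)

def mse(theta, data):
    """Sample average of the squared mapping error over (z, d) pairs."""
    return sum((d - model(theta, z)) ** 2 for z, d in data) / len(data)

def sgd(theta, data, rate=0.02, epochs=200, seed=0):
    """Stochastic gradient descent on the squared error, one pair at a time."""
    rng = random.Random(seed)
    a, b = theta
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for z, d in data:
            h = math.tanh(b * z)
            err = d - a * h                          # instantaneous error
            grad_a = -2.0 * err * h                  # d(err^2)/da
            grad_b = -2.0 * err * a * (1.0 - h * h) * z   # d(err^2)/db
            a -= rate * grad_a
            b -= rate * grad_b
    return a, b

# Noise-free training pairs generated with a = 1.5, b = 0.8 (assumed values).
train = [(k / 10.0, 1.5 * math.tanh(0.8 * k / 10.0)) for k in range(-20, 21)]
start = (0.1, 0.1)
fitted = sgd(start, train)
print(mse(start, train), mse(fitted, train))
```

Each update uses the gradient of a single squared error rather than of the expectation in (4.86), which is what makes the procedure stochastic.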


Problems

4.1 Show that:

4.1.1. the maximum likelihood estimator of the variance (4.58) becomes unbiased if the estimated mean $\hat{\mu}_{ML}$ is replaced in (4.58) by the true one $\mu$.

4.1.2. if the mean is estimated from the observations, one must use the formula (4.6) for getting an unbiased estimator.

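4.2 Assume that $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of the parameter $\theta$ having variances $\mathrm{var}(\hat{\theta}_1) = \sigma_1^2$, $\mathrm{var}(\hat{\theta}_2) = \sigma_2^2$.

4.2.1. Show that for any scalar $0 \le \alpha \le 1$, the estimator $\hat{\theta}_3 = \alpha\hat{\theta}_1 + (1 - \alpha)\hat{\theta}_2$ is unbiased.

4.2.2. Determine the mean-square error of $\hat{\theta}_3$ assuming that $\hat{\theta}_1$ and $\hat{\theta}_2$ are statistically independent.

4.2.3. Find the value of $\alpha$ that minimizes this mean-square error.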

4.3 Let the scalar random variable $z$ be uniformly distributed on the interval $[0, \theta)$. There exist $T$ independent samples $z(1), \ldots, z(T)$ from $z$. Using them, the estimate $\hat{\theta} = \max(z(i))$ is constructed for the parameter $\theta$.

4.3.1. Compute the probability density function of $\hat{\theta}$. (Hint: First construct the cumulative distribution function.)

4.3.2. Is $\hat{\theta}$ unbiased or asymptotically unbiased?

4.3.3. What is the mean-square error $\mathrm{E}\{(\hat{\theta} - \theta)^2 \mid \theta\}$ of the estimate $\hat{\theta}$?

4.4 Assume that you know $T$ independent observations of a scalar quantity that is gaussian distributed with unknown mean $\mu$ and variance $\sigma^2$. Estimate $\mu$ and $\sigma^2$ using the method of moments.

4.5 Assume that $x(1), x(2), \ldots, x(K)$ are independent gaussian random variables, all having mean $0$ and variance $\sigma_x^2$. Then the sum of their squares

$$y = \sum_{j=1}^{K} [x(j)]^2$$

is $\chi^2$-distributed with the mean $K\sigma_x^2$ and variance $2K\sigma_x^4$. Estimate the parameters $K$ and $\sigma_x^2$ using the method of moments, assuming that there exist $T$ measurements $y(1), y(2), \ldots, y(T)$ on the sum of squares $y$.

4.6 Derive the normal equations (4.37) for the least-squares criterion (4.36). Justify why these equations indeed provide the minimum of the criterion.

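4.7 Assume that the measurement errors have zero mean: $\mathrm{E}\{\mathbf{v}_T\} = \mathbf{0}$, and that the covariance matrix of the measurement errors is $\mathbf{C}_v = \mathrm{E}\{\mathbf{v}_T \mathbf{v}_T^T\}$. Consider the properties of the least-squares estimator $\hat{\boldsymbol{\theta}}_{LS}$ in (4.38).

4.7.1. Show that the estimator $\hat{\boldsymbol{\theta}}_{LS}$ is unbiased.

4.7.2. Compute the error covariance matrix $\mathbf{C}_{\tilde{\theta}}$ defined in (4.19).

4.7.3. Compute $\mathbf{C}_{\tilde{\theta}}$ when $\mathbf{C}_v = \sigma^2 \mathbf{I}$.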

4.8 Consider line fitting using the linear least-squares method. Assume that you know $T$ measurements $x(1), x(2), \ldots, x(T)$ on the scalar quantity $x$ made, respectively, at times (or argument values) $t(1), t(2), \ldots, t(T)$. The task is to fit the line

$$x = \theta_0 + \theta_1 t$$

to these measurements.

4.8.1. Construct the normal equations for this problem using the standard linear least-squares method.

4.8.2. Assume that the sampling interval $\Delta t$ is constant and has been scaled so that the measurement times are integers $1, 2, \ldots, T$. Solve the normal equations in this important special case.

4.9 * Consider the equivalence of the generalized least-squares and linear unbiased minimum mean-square estimators. Show that

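4.9.1. The optimal solution minimizing the generalized least-squares criterion (4.45) is

$$\hat{\boldsymbol{\theta}}_{WLS} = (\mathbf{H}^T \mathbf{W} \mathbf{H})^{-1} \mathbf{H}^T \mathbf{W} \mathbf{x}_T$$

4.9.2. An unbiased linear mean-square estimator $\hat{\boldsymbol{\theta}}_{MSE} = \mathbf{L}\mathbf{x}_T$ satisfies the condition $\mathbf{L}\mathbf{H} = \mathbf{I}$.

4.9.3. The mean-square error can be written in the form

$$\mathcal{E}_{MSE} = \mathrm{E}\{\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2 \mid \boldsymbol{\theta}\} = \mathrm{trace}(\mathbf{L}\mathbf{C}_v\mathbf{L}^T)$$

4.9.4. Minimization of the preceding criterion $\mathcal{E}_{MSE}$ under the constraint $\mathbf{L}\mathbf{H} = \mathbf{I}$ leads to the BLUE estimator (4.46).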

4.10 For a fixed amount of gas, the following connection holds between the pressure $P$ and the volume $V$:

$$PV^{\gamma} = c,$$

where $\gamma$ and $c$ are constants. Assume that we know $T$ pairs of measurements $(P_i, V_i)$. We want to estimate the parameters $\gamma$ and $c$ using the linear least-squares method. Express the situation in the form of a matrix-vector model and explain how the estimates are computed (you need not compute the exact solution).

4.11 Let the probability density function of a scalar-valued random variable $z$ be

$$p(z \mid \theta) = \theta^2 z e^{-\theta z}, \quad z \ge 0, \; \theta > 0$$

Determine the maximum likelihood estimate of the parameter $\theta$. There are available $T$ independent measurements $z(1), \ldots, z(T)$ on $z$.

4.12 In a signal processing application five sensors placed mutually according to a cross pattern yield, respectively, the measurements $x_0$, $x_1$, $x_2$, $x_3$, and $x_4$, which can be collected into the measurement vector $\mathbf{x}$. The measurements are quantized with 7-bit accuracy so that their values are integers in the interval $0, \ldots, 127$. The joint density $p(\mathbf{x} \mid \theta)$ of the measurements is a multinomial density that depends on the unknown parameter $\theta$ as follows:

$$p(\mathbf{x} \mid \theta) = k(\mathbf{x}) \, (1/2)^{x_0} (\theta/4)^{x_1} (1/4 - \theta/4)^{x_2} (1/4 - \theta/4)^{x_3} (\theta/4)^{x_4}$$

where the scaling term

$$k(\mathbf{x}) = \frac{(x_0 + x_1 + x_2 + x_3 + x_4)!}{x_0! \, x_1! \, x_2! \, x_3! \, x_4!}$$

Determine the maximum likelihood estimate of the parameter $\theta$ in terms of the measurement vector $\mathbf{x}$. (Here, you can treat the individual measurements in a similar manner as mutually independent scalar measurements.)

4.13 Consider the sum $z = x_1 + x_2 + \ldots + x_K$, where the scalar random variables $x_i$ are statistically independent and gaussian, each having the same mean $0$ and variance $\sigma_x^2$.

4.13.1. Construct the maximum likelihood estimate for the number $K$ of the terms in the sum.

4.13.2. Is this estimate unbiased?

4.14 * Consider direct evaluation of the Wiener filter.

4.14.1. Show that the mean-square filtering error (4.78) can be evaluated to the form (4.79).

4.14.2. What is the minimum mean-square error given by the Wiener estimate?

4.15 The random variables $x_1$, $x_2$, and a third, related random variable $y$ are jointly distributed. Define the random vector

$$\mathbf{z} = [y, x_1, x_2]^T$$

It is known that $\mathbf{z}$ has the mean vector $\mathbf{m}_z$ and the covariance matrix $\mathbf{C}_z$ given by

$$\mathbf{m}_z = \begin{bmatrix} 1/4 \\ 1/2 \\ 1/2 \end{bmatrix}, \qquad \mathbf{C}_z = \frac{1}{10} \begin{bmatrix} 7 & 1 & 1 \\ 1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$

Find the optimum linear mean-square estimate of $y$ based on $x_1$ and $x_2$.

4.16 * Assume that you know $T$ data vectors $\mathbf{z}(1), \mathbf{z}(2), \ldots, \mathbf{z}(T)$ and their corresponding desired responses $d(1), d(2), \ldots, d(T)$. Standard estimates of the correlation matrix and the cross-correlation vector needed in Wiener filtering are [172]

$$\hat{\mathbf{R}}_z = \frac{1}{T} \sum_{i=1}^{T} \mathbf{z}(i)\mathbf{z}(i)^T, \qquad \hat{\mathbf{r}}_{zd} = \frac{1}{T} \sum_{i=1}^{T} \mathbf{z}(i) d(i) \qquad (4.87)$$

4.16.1. Express the estimates (4.87) in matrix form and show that when they are used in the Wiener filter (4.80) instead of the true values, the filter coincides with a least-squares solution.

4.16.2. What is the discrete data model corresponding to this least-squares estimator?

4.17 * The joint density function of the random variables $x$ and $y$ is given by

$$p_{xy}(x, y) = 8xy, \quad 0 \le y \le x \le 1,$$

and $p_{xy}(x, y) = 0$ outside the region defined above.

4.17.1. Find and sketch the conditional density $p_{y|x}(y \mid x)$.

4.17.2. Compute the MAP (maximum a posteriori) estimate of $y$.

4.17.3. Compute the optimal mean-square error estimate of $y$.

4.18 * Suppose that a scalar random variable $y$ is of the form $y = z + v$, where the pdf of $v$ is $p_v(t) = t/2$ on the interval $[0, 2]$, and the pdf of $z$ is $p_z(t) = 2t$ on the interval $[0, 1]$. Both the densities are zero elsewhere. There is available a single measurement value $y = 2.5$.

4.18.1. Compute the maximum likelihood estimate of $y$.

4.18.2. Compute the MAP (maximum a posteriori) estimate of $y$.

4.18.3. Compute the minimum mean-square estimate of $y$.

4.19 * Consider the MAP estimator (4.84) of the mean $\mu$.

4.19.1. Derive the estimator.

4.19.2. Express the estimator in recursive form.

Computer assignments

4.1 Choose a suitable set of two-dimensional data. Plenty of real-world data can be found for example using the links of the WWW page of this book, as well as in [376] and at the following Web sites:

http://ferret.wrc.noaa.gov/

http://www.ics.uci.edu/~mlearn/MLSummary.html

4.1.1. Plot the data (or part of it, if the data set is large).

4.1.2. Based on the plot, choose a suitable function (which is linear with respect to the parameters), and fit it to your data using the standard least-squares method. (Alternatively, you can use a nonlinear least-squares method if the parameters of the chosen function depend nonlinearly on the data.)

4.1.3. Plot the fitted curve and the fitting error. Assess the quality of your least-squares model.
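As a starting point for this assignment, the following sketch fits a straight line by solving the 2x2 normal equations in closed form and returns the fitting errors. It assumes the simplest choice of function, a line $x \approx \theta_0 + \theta_1 t$; the function names `fit_line` and `residuals` and the sample data are hypothetical, and plotting is left out.

```python
def fit_line(t, x):
    """Least-squares fit of x ~ theta0 + theta1 * t via the normal equations."""
    T = len(t)
    st, sx = sum(t), sum(x)
    stt = sum(ti * ti for ti in t)
    stx = sum(ti * xi for ti, xi in zip(t, x))
    det = T * stt - st * st            # vanishes when all t values coincide
    if det == 0:
        raise ValueError("all argument values are identical")
    theta1 = (T * stx - st * sx) / det
    theta0 = (sx - theta1 * st) / T
    return theta0, theta1

def residuals(t, x, theta):
    """Fitting errors x(i) - (theta0 + theta1 * t(i))."""
    theta0, theta1 = theta
    return [xi - (theta0 + theta1 * ti) for ti, xi in zip(t, x)]

# Made-up measurements for illustration only.
t = [1, 2, 3, 4, 5]
x = [2.9, 5.1, 7.0, 9.2, 10.8]
theta = fit_line(t, x)
print(theta, residuals(t, x, theta))
```

Inspecting the residuals for systematic structure (for example a clear curvature) is one simple way to assess whether the chosen function is adequate.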

4.2 * Use the Bayesian linear minimum mean-square estimator for predicting a scalar measurement from other measurements.

4.2.1. Choose first a suitable data set in which the components of the data vectors are correlated (see the previous computer assignment for finding data).

4.2.2. Compute the linear minimum mean-square estimator.

4.2.3. Compute the variance of the measurement that you have predicted and compare it with your minimum mean-square estimation (prediction) error.
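A minimal sketch of the computation is given below for the simplest case of one predictor $x$ and one predicted quantity $y$, with the means, variances, and covariance replaced by sample estimates. The general estimator $\hat{y} = m_y + \mathbf{C}_{yx}\mathbf{C}_x^{-1}(\mathbf{x} - \mathbf{m}_x)$ reduces to a scalar gain here. The function name `lmmse_fit` and the data are assumptions made for the illustration.

```python
def lmmse_fit(xs, ys):
    """Scalar linear MMSE predictor of y from x, built from sample moments.

    Returns (predict, error_variance, variance_of_y).
    """
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    c_xx = sum((a - m_x) ** 2 for a in xs) / n
    c_yy = sum((b - m_y) ** 2 for b in ys) / n
    c_xy = sum((a - m_x) * (b - m_y) for a, b in zip(xs, ys)) / n
    if c_xx == 0:
        raise ValueError("predictor has zero variance")
    gain = c_xy / c_xx

    def predict(x_new):
        return m_y + gain * (x_new - m_x)

    error_variance = c_yy - c_xy ** 2 / c_xx   # minimum mean-square prediction error
    return predict, error_variance, c_yy

# Made-up correlated measurements for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 2.9, 5.1, 6.8, 9.1]
predict, err_var, var_y = lmmse_fit(xs, ys)
print(predict(2.5), err_var, var_y)
```

The prediction error variance can never exceed the variance of the predicted measurement, and the gap between the two indicates how much the correlation helps.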

5 Information Theory

Estimation theory gives one approach to characterizing random variables. This was based on building parametric models and describing the data by the parameters.

An alternative approach is given by information theory. Here the emphasis is on coding. We want to code the observations. The observations can then be stored in the memory of a computer, or transmitted by a communications channel, for example. Finding a suitable code depends on the statistical properties of the data.

In independent component analysis (ICA), estimation theory and information theory offer the two principal theoretical approaches.

In this chapter, the basic concepts of information theory are introduced. The latter half of the chapter deals with a more specialized topic: approximation of entropy.

These concepts are needed in the ICA methods of Part II.
