MATHEMATICAL PRELIMINARIES
4.7 CONCLUDING REMARKS AND REFERENCES
difficult integrations needed in computing the minimum mean-square estimator. If the posterior distribution $p(\boldsymbol{\theta} \mid \mathbf{x}_T)$ is symmetric around its peak value, the MAP estimator and MSE estimator coincide.
There is no guarantee that the MAP estimator is unbiased. It is also generally difficult to compute the covariance matrix of the estimation error for the MAP and ML estimators. However, the MAP estimator is intuitively sensible, yields in most cases good results in practice, and it has good asymptotic properties under appropriate conditions. These desirable characteristics justify its use.
Fig. 4.2 A posterior density $p(\theta \mid x)$, and the respective MAP estimate $\hat{\theta}_{\mathrm{MAP}}$, minimum MSE estimate $\hat{\theta}_{\mathrm{MSE}}$, and the minimum absolute error estimate $\hat{\theta}_{\mathrm{ABS}}$.
Regrettably, it is generally difficult to determine the posterior distribution in a form that allows for convenient mathematical analysis [407]. However, various advanced and approximative techniques have been developed to facilitate Bayesian estimation; see [142]. When the number of measurements increases, the importance of prior information gradually decreases, and the maximum likelihood estimator becomes asymptotically optimal.
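The relation between the three point estimates of Fig. 4.2 can be checked numerically. The following is only a minimal sketch, assuming a scalar posterior that is evaluated on a regular grid; the function names and the two example densities are hypothetical and chosen for illustration. The MAP estimate is the peak of the density, the minimum MSE estimate is its mean, and the minimum absolute error estimate is its median.

```python
import math

def point_estimates(density, lo, hi, n=20001):
    """Return (MAP, MSE, ABS) estimates for an unnormalized scalar
    posterior density evaluated on a regular grid over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    vals = [density(t) for t in grid]
    total = sum(vals)
    # MAP estimate: grid point at which the density attains its peak
    theta_map = grid[max(range(n), key=vals.__getitem__)]
    # Minimum MSE estimate: mean of the posterior
    theta_mse = sum(t * v for t, v in zip(grid, vals)) / total
    # Minimum absolute error estimate: median of the posterior
    acc = 0.0
    theta_abs = grid[-1]
    for t, v in zip(grid, vals):
        acc += v
        if acc >= 0.5 * total:
            theta_abs = t
            break
    return theta_map, theta_mse, theta_abs

def skewed(t):
    # Assumed example: gamma-shaped density t^2 exp(-t), mode 2 and mean 3
    return t * t * math.exp(-t)

def symmetric(t):
    # Assumed example: gaussian-shaped density centered at 3
    return math.exp(-0.5 * (t - 3.0) ** 2)
```

Calling `point_estimates(skewed, 0.0, 40.0)` gives three clearly different values ordered as MAP < ABS < MSE, whereas `point_estimates(symmetric, -7.0, 13.0)` gives three values that agree up to the grid resolution, in line with the remark on symmetric posteriors.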
Finally, we point out that neural networks provide in many instances a useful practical tool for nonlinear estimation, even though they lie outside the range of classic estimation theory. For example, the well-known back-propagation algorithm [48, 172, 376] is in fact a stochastic gradient algorithm for minimizing the mean-square error criterion
$$\mathcal{E}_{\mathrm{MSE}} = \mathrm{E}\{\|\mathbf{d} - \mathbf{f}(\boldsymbol{\theta}, \mathbf{z})\|^2\} \tag{4.86}$$
Here $\mathbf{d}$ is the desired response vector and $\mathbf{z}$ the (input) data vector. The parameters $\boldsymbol{\theta}$ consist of weights that are adjusted so that the mapping error (4.86) is minimized.
The nonlinear function $\mathbf{f}(\boldsymbol{\theta}, \mathbf{z})$ has enough parameters and a flexible form, so that it can actually model with sufficient accuracy any regular nonlinear function. The back-propagation algorithm learns the parameters $\boldsymbol{\theta}$ that define the estimated input-output mapping $\mathbf{f}(\boldsymbol{\theta}, \mathbf{z})$. See [48, 172, 376] for details and applications.
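The stochastic gradient idea behind back-propagation can be sketched in a few lines. This is not the formulation of [48, 172, 376]; it is a minimal illustration under stated assumptions: a scalar input $z$, a scalar desired response $d$, one hidden layer of tanh units, and a fixed learning rate. All names (`init_params`, `sgd_step`, `train`) and the sine-curve data set are hypothetical. Each step uses one sample to move the weights against the gradient of the instantaneous squared error, which is a noisy estimate of the gradient of (4.86).

```python
import math
import random

def init_params(n_hidden, rng):
    """Small random weights for a 1-input, 1-output net with tanh hidden units."""
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(theta, z):
    """Return the hidden activations and the network output f(theta, z)."""
    hidden = [math.tanh(w * z + b) for w, b in zip(theta["w1"], theta["b1"])]
    output = sum(w * h for w, h in zip(theta["w2"], hidden)) + theta["b2"]
    return hidden, output

def sgd_step(theta, z, d, lr):
    """One stochastic gradient step on 0.5 * (f(theta, z) - d)^2.
    The factor 0.5 only rescales the learning rate."""
    hidden, output = forward(theta, z)
    err = output - d
    for j, h in enumerate(hidden):
        # Error propagated back through output weight j and the tanh unit,
        # computed before the output weight is changed
        delta = err * theta["w2"][j] * (1.0 - h * h)
        theta["w2"][j] -= lr * err * h
        theta["w1"][j] -= lr * delta * z
        theta["b1"][j] -= lr * delta
    theta["b2"] -= lr * err

def mean_square_error(theta, data):
    """Sample average of the squared mapping error over (z, d) pairs."""
    return sum((forward(theta, z)[1] - d) ** 2 for z, d in data) / len(data)

def train(data, n_hidden=8, lr=0.02, epochs=300, seed=0):
    """Run stochastic gradient descent over the data in random order."""
    rng = random.Random(seed)
    theta = init_params(n_hidden, rng)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for z, d in samples:
            sgd_step(theta, z, d, lr)
    return theta

# Assumed toy data: samples of d = sin(z) on [-3, 3]
data = [(k / 10.0, math.sin(k / 10.0)) for k in range(-30, 31)]
```

Comparing `mean_square_error` before and after `train(data)` shows how the sample version of the criterion decreases as the weights are adapted. The learning rate, network size, and number of epochs are arbitrary choices for this sketch and would need tuning for other data.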
Problems
4.1 Show that:
4.1.1. the maximum likelihood estimator of the variance (4.58) becomes unbiased if the estimated mean $\hat{\mu}_{\mathrm{ML}}$ is replaced in (4.58) by the true one.
4.1.2. if the mean is estimated from the observations, one must use the formula (4.6) for getting an unbiased estimator.
4.2 Assume that $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of the parameter $\theta$ having variances $\mathrm{var}(\hat{\theta}_1) = \sigma_1^2$, $\mathrm{var}(\hat{\theta}_2) = \sigma_2^2$.
4.2.1. Show that for any scalar $0 \le \alpha \le 1$, the estimator $\hat{\theta}_3 = \alpha\hat{\theta}_1 + (1 - \alpha)\hat{\theta}_2$ is unbiased.
4.2.2. Determine the mean-square error of $\hat{\theta}_3$ assuming that $\hat{\theta}_1$ and $\hat{\theta}_2$ are statistically independent.
4.2.3. Find the value of $\alpha$ that minimizes this mean-square error.
4.3 Let the scalar random variable $z$ be uniformly distributed on the interval $[0, \theta)$. There exist $T$ independent samples $z(1), \ldots, z(T)$ from $z$. Using them, the estimate $\hat{\theta} = \max(z(i))$ is constructed for the parameter $\theta$.
4.3.1. Compute the probability density function of $\hat{\theta}$. (Hint: First construct the cumulative distribution function.)
4.3.2. Is $\hat{\theta}$ unbiased or asymptotically unbiased?
4.3.3. What is the mean-square error $\mathrm{E}\{(\hat{\theta} - \theta)^2 \mid \theta\}$ of the estimate $\hat{\theta}$?
4.4 Assume that you know $T$ independent observations of a scalar quantity that is gaussian distributed with unknown mean $\mu$ and variance $\sigma^2$. Estimate $\mu$ and $\sigma^2$ using the method of moments.
4.5 Assume that $x(1), x(2), \ldots, x(K)$ are independent gaussian random variables, all having mean $0$ and variance $\sigma_x^2$. Then the sum of their squares
$$y = \sum_{j=1}^{K} [x(j)]^2$$
is $\chi^2$-distributed with the mean $K\sigma_x^2$ and variance $2K\sigma_x^4$. Estimate the parameters $K$ and $\sigma_x^2$ using the method of moments, assuming that there exist $T$ measurements $y(1), y(2), \ldots, y(T)$ on the sum of squares $y$.
4.6 Derive the normal equations (4.37) for the least-squares criterion (4.36). Justify why these equations indeed provide the minimum of the criterion.
4.7 Assume that the measurement errors have zero mean: $\mathrm{E}\{\mathbf{v}_T\} = \mathbf{0}$, and that the covariance matrix of the measurement errors is $\mathbf{C}_v = \mathrm{E}\{\mathbf{v}_T \mathbf{v}_T^T\}$. Consider the properties of the least-squares estimator $\hat{\boldsymbol{\theta}}_{\mathrm{LS}}$ in (4.38).
4.7.1. Show that the estimator $\hat{\boldsymbol{\theta}}_{\mathrm{LS}}$ is unbiased.
4.7.2. Compute the error covariance matrix $\mathbf{C}_{\tilde{\boldsymbol{\theta}}}$ defined in (4.19).
4.7.3. Compute $\mathbf{C}_{\tilde{\boldsymbol{\theta}}}$ when $\mathbf{C}_v = \sigma^2 \mathbf{I}$.
4.8 Consider line fitting using the linear least-squares method. Assume that you know $T$ measurements $x(1), x(2), \ldots, x(T)$ on the scalar quantity $x$ made, respectively, at times (or argument values) $t(1), t(2), \ldots, t(T)$. The task is to fit the line
$$x = \theta_0 + \theta_1 t$$
to these measurements.
4.8.1. Construct the normal equations for this problem using the standard linear least-squares method.
4.8.2. Assume that the sampling interval $\Delta t$ is constant and has been scaled so that the measurement times are integers $1, 2, \ldots, T$. Solve the normal equations in this important special case.
4.9 * Consider the equivalence of the generalized least-squares and linear unbiased minimum mean-square estimators. Show that
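4.9.1. The optimal solution minimizing the generalized least-squares criterion (4.45) is
$$\hat{\boldsymbol{\theta}}_{\mathrm{WLS}} = (\mathbf{H}^T \mathbf{W} \mathbf{H})^{-1} \mathbf{H}^T \mathbf{W} \mathbf{x}_T$$
4.9.2. An unbiased linear mean-square estimator $\hat{\boldsymbol{\theta}}_{\mathrm{MSE}} = \mathbf{L}\mathbf{x}_T$ satisfies the condition $\mathbf{L}\mathbf{H} = \mathbf{I}$.
4.9.3. The mean-square error can be written in the form
$$\mathcal{E}_{\mathrm{MSE}} = \mathrm{E}\{\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2 \mid \boldsymbol{\theta}\} = \mathrm{trace}(\mathbf{L}\mathbf{C}_v\mathbf{L}^T)$$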
4.9.4. Minimization of the preceding criterion $\mathcal{E}_{\mathrm{MSE}}$ under the constraint $\mathbf{L}\mathbf{H} = \mathbf{I}$ leads to the BLUE estimator (4.46).
4.10 For a fixed amount of gas, the following connection holds between the pressure $P$ and the volume $V$:
$$P V^{\gamma} = c,$$
where $\gamma$ and $c$ are constants. Assume that we know $T$ pairs of measurements $(P_i, V_i)$. We want to estimate the parameters $\gamma$ and $c$ using the linear least-squares method. Express the situation in the form of a matrix-vector model and explain how the estimates are computed (you need not compute the exact solution).
4.11 Let the probability density function of a scalar-valued random variable $z$ be
$$p(z \mid \theta) = \theta^2 z e^{-\theta z}, \quad z \ge 0, \; \theta > 0$$
Determine the maximum likelihood estimate of the parameter $\theta$. There are available $T$ independent measurements $z(1), \ldots, z(T)$ on $z$.
4.12 In a signal processing application five sensors placed mutually according to a cross pattern yield, respectively, the measurements $x_0$, $x_1$, $x_2$, $x_3$, and $x_4$, that can be collected to the measurement vector $\mathbf{x}$. The measurements are quantized with 7 bits accuracy so that their values are integers in the interval $0, \ldots, 127$. The joint density $p(\mathbf{x} \mid \theta)$ of the measurements is a multinomial density that depends on the unknown parameter $\theta$ as follows:
$$p(\mathbf{x} \mid \theta) = k(\mathbf{x}) \, (1/2)^{x_0} (\theta/4)^{x_1} (1/4 - \theta/4)^{x_2} (1/4 - \theta/4)^{x_3} (\theta/4)^{x_4}$$
where the scaling term
$$k(\mathbf{x}) = \frac{(x_0 + x_1 + x_2 + x_3 + x_4)!}{x_0! \, x_1! \, x_2! \, x_3! \, x_4!}$$
Determine the maximum likelihood estimate of the parameter $\theta$ in terms of the measurement vector $\mathbf{x}$. (Here, you can treat the individual measurements in a similar manner as mutually independent scalar measurements.)
4.13 Consider the sum $z = x_1 + x_2 + \cdots + x_K$, where the scalar random variables $x_i$ are statistically independent and gaussian, each having the same mean $0$ and variance $\sigma_x^2$.
4.13.1. Construct the maximum likelihood estimate for the number $K$ of the terms in the sum.
4.13.2. Is this estimate unbiased?
4.14 * Consider direct evaluation of the Wiener filter.
4.14.1. Show that the mean-square filtering error (4.78) can be evaluated to the form (4.79).
4.14.2. What is the minimum mean-square error given by the Wiener estimate?
4.15 The random variables $x_1$, $x_2$, and a third, related random variable $y$ are jointly distributed. Define the random vector
$$\mathbf{z} = [y, x_1, x_2]^T$$
It is known that $\mathbf{z}$ has the mean vector $\mathbf{m}_z$ and the covariance matrix $\mathbf{C}_z$ given by
$$\mathbf{m}_z = \begin{bmatrix} 1/4 \\ 1/2 \\ 1/2 \end{bmatrix}, \qquad \mathbf{C}_z = \frac{1}{10} \begin{bmatrix} 7 & 1 & 1 \\ 1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$
Find the optimum linear mean-square estimate of $y$ based on $x_1$ and $x_2$.
4.16 * Assume that you know $T$ data vectors $\mathbf{z}(1), \mathbf{z}(2), \ldots, \mathbf{z}(T)$ and their corresponding desired responses $d(1), d(2), \ldots, d(T)$. Standard estimates of the correlation matrix and the cross-correlation vector needed in Wiener filtering are [172]
$$\hat{\mathbf{R}}_z = \frac{1}{T} \sum_{i=1}^{T} \mathbf{z}(i)\mathbf{z}(i)^T, \qquad \hat{\mathbf{r}}_{zd} = \frac{1}{T} \sum_{i=1}^{T} \mathbf{z}(i) d(i) \tag{4.87}$$
4.16.1. Express the estimates (4.87) in matrix form and show that when they are used in the Wiener filter (4.80) instead of the true values, the filter coincides with a least-squares solution.
4.16.2. What is the discrete data model corresponding to this least-squares estimator?
4.17 * The joint density function of the random variables $x$ and $y$ is given by
$$p_{xy}(x, y) = 8xy, \quad 0 \le y \le x \le 1,$$
and $p_{xy}(x, y) = 0$ outside the region defined above.
4.17.1. Find and sketch the conditional density $p_{y|x}(y \mid x)$.
4.17.2. Compute the MAP (maximum a posteriori) estimate of $y$.
4.17.3. Compute the optimal mean-square error estimate of $y$.
4.18 * Suppose that a scalar random variable $y$ is of the form $y = z + v$, where the pdf of $v$ is $p_v(t) = t/2$ on the interval $[0, 2]$, and the pdf of $z$ is $p_z(t) = 2t$ on the interval $[0, 1]$. Both the densities are zero elsewhere. There is available a single measurement value $y = 2.5$.
4.18.1. Compute the maximum likelihood estimate of $y$.
4.18.2. Compute the MAP (maximum a posteriori) estimate of $y$.
4.18.3. Compute the minimum mean-square estimate of $y$.
4.19 * Consider the MAP estimator (4.84) of the mean $\mu$.
4.19.1. Derive the estimator.
4.19.2. Express the estimator in recursive form.
Computer assignments
4.1 Choose a suitable set of two-dimensional data. Plenty of real-world data can be found for example using the links of the WWW page of this book, as well as in [376] and at the following Web sites:
http://ferret.wrc.noaa.gov/
http://www.ics.uci.edu/~mlearn/MLSummary.html
4.1.1. Plot the data (or part of it, if the data set is large).
4.1.2. Based on the plot, choose a suitable function (which is linear with respect to the parameters), and fit it to your data using the standard least-squares method.
(Alternatively, you can use a nonlinear least-squares method if the chosen function depends nonlinearly on its parameters.)
4.1.3. Plot the fitted curve and the fitting error. Assess the quality of your least-squares model.
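As a starting point for this assignment, the fitting step can be sketched for the simplest case, where the chosen function is a straight line $x = \theta_0 + \theta_1 t$. The sketch below is only one possible arrangement: it solves the $2 \times 2$ normal equations in closed form, and the function names `fit_line` and `residuals` are hypothetical. Plotting is left out, since it depends on the tools available.

```python
def fit_line(t, x):
    """Least-squares fit of x = theta0 + theta1 * t.

    Solves the 2x2 normal equations
        [ n     sum(t)   ] [theta0]   [ sum(x)   ]
        [ sum(t) sum(t^2)] [theta1] = [ sum(t*x) ]
    in closed form and returns (theta0, theta1).
    """
    n = len(t)
    if n != len(x) or n < 2:
        raise ValueError("need at least two (t, x) pairs of equal length")
    s_t = sum(t)
    s_tt = sum(ti * ti for ti in t)
    s_x = sum(x)
    s_tx = sum(ti * xi for ti, xi in zip(t, x))
    det = n * s_tt - s_t * s_t
    if det == 0:
        raise ValueError("all t values are identical; the slope is undefined")
    theta1 = (n * s_tx - s_t * s_x) / det
    theta0 = (s_tt * s_x - s_t * s_tx) / det
    return theta0, theta1

def residuals(t, x, theta0, theta1):
    """Fitting errors x(i) - (theta0 + theta1 * t(i))."""
    return [xi - (theta0 + theta1 * ti) for ti, xi in zip(t, x)]
```

For data lying exactly on a line, such as `t = [1, 2, 3, 4]` and `x = [3, 5, 7, 9]`, the fit recovers intercept 1 and slope 2 with zero residuals. With noisy data, the size and any visible pattern in the residuals give a first indication of the quality of the model.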
4.2 * Use the Bayesian linear minimum mean-square estimator for predicting a scalar measurement from other measurements.
4.2.1. Choose first a suitable data set in which the components of the data vectors are correlated (see the previous computer assignment for finding data).
4.2.2. Compute the linear minimum mean-square estimator.
4.2.3. Compute the variance of the measurement that you have predicted and compare it with your minimum mean-square estimation (prediction) error.
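A possible skeleton for this assignment is sketched below, assuming that a scalar $y$ is predicted from exactly two other measurements $x_1$ and $x_2$, and that the means and covariances are replaced by sample averages. It uses the standard form of the linear minimum mean-square estimator, $\hat{y} = m_y + \mathbf{c}_{yx}^T \mathbf{C}_x^{-1}(\mathbf{x} - \mathbf{m}_x)$, whose error is $\sigma_y^2 - \mathbf{c}_{yx}^T \mathbf{C}_x^{-1} \mathbf{c}_{yx}$. The function names are hypothetical, and the restriction to two predictors is made only to keep the matrix inversion explicit.

```python
def mean(values):
    """Sample mean."""
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance with normalization 1/T."""
    m_a, m_b = mean(a), mean(b)
    return sum((ai - m_a) * (bi - m_b) for ai, bi in zip(a, b)) / len(a)

def linear_mmse(y, x1, x2):
    """Linear minimum mean-square predictor of y from x1 and x2.

    Returns (predict, var_y, mse): the predictor function, the sample
    variance of y, and the estimated minimum mean-square error.
    """
    c11, c12, c22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    cy1, cy2 = cov(y, x1), cov(y, x2)
    det = c11 * c22 - c12 * c12
    if det == 0:
        raise ValueError("covariance matrix of the predictors is singular")
    # Weights w = C_x^{-1} c_yx, using the explicit 2x2 inverse
    w1 = (c22 * cy1 - c12 * cy2) / det
    w2 = (c11 * cy2 - c12 * cy1) / det
    m_y, m_1, m_2 = mean(y), mean(x1), mean(x2)

    def predict(a, b):
        return m_y + w1 * (a - m_1) + w2 * (b - m_2)

    var_y = cov(y, y)
    # Minimum mean-square error: var(y) - c_yx^T C_x^{-1} c_yx
    mse = var_y - (w1 * cy1 + w2 * cy2)
    return predict, var_y, mse
```

The returned `mse` never exceeds `var_y`, and the gap between the two shows how much of the variance of the predicted measurement is explained by the other measurements. For strongly correlated components the error is small compared with the variance; for nearly uncorrelated components the two values are close.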
5 Information Theory
Estimation theory gives one approach to characterizing random variables. This was based on building parametric models and describing the data by the parameters.
An alternative approach is given by information theory. Here the emphasis is on coding. We want to code the observations. The observations can then be stored in the memory of a computer, or transmitted by a communications channel, for example. Finding a suitable code depends on the statistical properties of the data.
In independent component analysis (ICA), estimation theory and information theory offer the two principal theoretical approaches.
In this chapter, the basic concepts of information theory are introduced. The latter half of the chapter deals with a more specialized topic: approximation of entropy.
These concepts are needed in the ICA methods of Part II.