MATHEMATICAL PRELIMINARIES
2.9 CONCLUDING REMARKS AND REFERENCES
where $v(n)$ is a white noise process, and $a_1, \ldots, a_M$ are constant coefficients (parameters) of the AR model. The model order $M$ gives the number of previous samples on which the current value $x(n)$ of the AR process depends. The noise term $v(n)$ introduces randomness into the model; without it, the AR model would be completely deterministic. The coefficients $a_1, \ldots, a_M$ of the AR model can be computed using linear techniques from autocorrelation values estimated from the available data [419, 241, 169]. Since AR models describe many natural stochastic processes, for example speech signals, fairly well, they are used in many applications. In ICA and BSS, they can be used to model the time correlations in each source process $s_i(t)$, which sometimes greatly improves the performance of the algorithms.

Autoregressive processes are a special case of autoregressive moving average (ARMA) processes described by the difference equation
$$x(n) + \sum_{i=1}^{M} a_i x(n-i) = v(n) + \sum_{i=1}^{N} b_i v(n-i) \qquad (2.127)$$

Clearly, the AR model (2.126) is obtained from the ARMA model (2.127) when the moving average (MA) coefficients $b_1, \ldots, b_N$ are all zero. On the other hand, if the AR coefficients $a_i$ are all zero, the ARMA process (2.127) reduces to an MA process of order $N$. The ARMA and MA models can also be used to describe stochastic processes. However, they are applied less frequently, because estimating their parameters requires nonlinear techniques [241, 419, 411]. See the Appendix of Chapter 19 for a discussion of the stability of the ARMA model and its use in digital filtering.

Problems
2.1 Derive a rule for computing the values of the cdf of the single variable gaussian (2.4) from the known tabulated values of the error function (2.5).
2.2 Let $x_1, x_2, \ldots, x_K$ be independent, identically distributed samples from a distribution having the cumulative distribution function $F_x(x)$. Denote by $y_1, y_2, \ldots, y_K$ the sample set $x_1, x_2, \ldots, x_K$ sorted in increasing order.
2.2.1. Show that the cdf and pdf of $y_K = \max\{x_1, \ldots, x_K\}$ are
$$F_{y_K}(y_K) = [F_x(y_K)]^K$$
$$p_{y_K}(y_K) = K\,[F_x(y_K)]^{K-1}\,p_x(y_K)$$
2.2.2. Derive the respective expressions for the cdf and pdf of the random variable $y_1 = \min\{x_1, \ldots, x_K\}$.
2.3 A two-dimensional random vector $\mathbf{x} = (x_1, x_2)^T$ has the probability density function
$$p_{\mathbf{x}}(\mathbf{x}) = \begin{cases} \frac{1}{2}(x_1 + 3x_2), & x_1, x_2 \in [0, 1] \\ 0, & \text{elsewhere} \end{cases}$$
2.3.1. Show that this probability density is appropriately normalized.
2.3.2. Compute the cdf of the random vector $\mathbf{x}$.
2.3.3. Compute the marginal distributions $p_{x_1}(x_1)$ and $p_{x_2}(x_2)$.
2.4 Compute the mean, second moment, and variance of a random variable distributed uniformly in the interval $[a, b]$ ($b > a$).
2.5 Prove that expectations satisfy the linearity property (2.16).
2.6 Consider $n$ scalar random variables $x_i$, $i = 1, 2, \ldots, n$, having the variances $\sigma_{x_i}^2$, respectively. Show that if the random variables $x_i$ are mutually uncorrelated, the variance $\sigma_y^2$ of their sum $y = \sum_{i=1}^{n} x_i$ equals the sum of the variances of the $x_i$:
$$\sigma_y^2 = \sum_{i=1}^{n} \sigma_{x_i}^2$$
2.7 Assume that $x_1$ and $x_2$ are zero-mean, correlated random variables. Any orthogonal transformation of $x_1$ and $x_2$ can be represented in the form
$$y_1 = \cos(\theta)\,x_1 + \sin(\theta)\,x_2$$
$$y_2 = -\sin(\theta)\,x_1 + \cos(\theta)\,x_2$$
where the parameter $\theta$ defines the rotation angle of the coordinate axes. Let $E\{x_1^2\} = \sigma_1^2$, $E\{x_2^2\} = \sigma_2^2$, and $E\{x_1 x_2\} = \sigma_{12}$. Find the angle $\theta$ for which $y_1$ and $y_2$ become uncorrelated.
2.8 Consider the joint probability density of the random vectors $\mathbf{x} = (x_1, x_2)^T$ and $\mathbf{y} = y$ discussed in Example 2.6:
$$p_{\mathbf{x},\mathbf{y}}(\mathbf{x}, \mathbf{y}) = \begin{cases} (x_1 + 3x_2)\,y, & x_1, x_2 \in [0, 1],\ y \in [0, 1] \\ 0, & \text{elsewhere} \end{cases}$$
2.8.1. Compute the marginal distributions $p_{\mathbf{x}}(\mathbf{x})$, $p_{\mathbf{y}}(\mathbf{y})$, $p_{x_1}(x_1)$, and $p_{x_2}(x_2)$.
2.8.2. Verify that the claims made on the independence of $x_1$, $x_2$, and $y$ in Example 2.6 hold.
2.9 Which conditions should the elements of the matrix
$$\mathbf{R} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
satisfy so that $\mathbf{R}$ could be a valid autocorrelation matrix of
2.9.1. A two-dimensional random vector?
2.9.2. A stationary scalar-valued stochastic process?
2.10 Show that correlation and covariance matrices satisfy the relationships (2.26) and (2.32).
2.11 Work out Example 2.5 for the covariance matrix $\mathbf{C}_{\mathbf{x}}$ of $\mathbf{x}$, showing that similar results are obtained. Are the required assumptions the same?
2.12 Assume that the inverse $\mathbf{R}_{\mathbf{x}}^{-1}$ of the correlation matrix of the $n$-dimensional column random vector $\mathbf{x}$ exists. Show that
$$E\{\mathbf{x}^T \mathbf{R}_{\mathbf{x}}^{-1} \mathbf{x}\} = n$$
2.13 Consider a two-dimensional gaussian random vector $\mathbf{x}$ with mean vector $\mathbf{m}_{\mathbf{x}} = (2, 1)^T$ and covariance matrix
$$\mathbf{C}_{\mathbf{x}} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$
2.13.1. Find the eigenvalues and eigenvectors of $\mathbf{C}_{\mathbf{x}}$.
2.13.2. Draw a contour plot of the gaussian density similar to Figure 2.7.
2.14 Repeat the previous problem for a gaussian random vector $\mathbf{x}$ that has the mean vector $\mathbf{m}_{\mathbf{x}} = (-2, 3)^T$ and covariance matrix
$$\mathbf{C}_{\mathbf{x}} = \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}$$
2.15 Assume that the random variables $x$ and $y$ are linear combinations of two uncorrelated gaussian random variables $u$ and $v$, defined by
$$x = 3u - 4v$$
$$y = 2u + v$$
Assume that the mean values and variances of both $u$ and $v$ equal 1.
2.15.1. Determine the mean values of $x$ and $y$.
2.15.2. Find the variances of $x$ and $y$.
2.15.3. Form the joint density function of $x$ and $y$.
2.15.4. Find the conditional density of $y$ given $x$.
2.16 Show that the skewness of a random variable having a symmetric pdf is zero.
2.17 Show that the kurtosis of a gaussian random variable is zero.
2.18 Show that random variables having
2.18.1. A uniform distribution in the interval $[-a, a]$, $a > 0$, are subgaussian.
2.18.2. A Laplacian distribution are supergaussian.
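As a numerical illustration of Problem 2.18 (not a proof), the Python sketch below estimates the fourth-order cumulant $\mathrm{kurt}(y) = E\{y^4\} - 3(E\{y^2\})^2$ of a zero-mean variable from large samples. It assumes NumPy's random generator; the sample size, seed, interval half-width $a = 1$, and Laplacian scale $b = 1$ are arbitrary choices made only for the illustration. For comparison, the exact values are $-2a^4/15$ for the uniform density on $[-a, a]$ and $12b^4$ for the Laplacian density $\frac{1}{2b}\exp(-|y|/b)$.

```python
import numpy as np

def kurt(y):
    """Sample estimate of kurt(y) = E{y^4} - 3 (E{y^2})^2 after centering y."""
    y = y - y.mean()                      # the definition assumes zero mean
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

rng = np.random.default_rng(0)
K = 200_000                               # assumed sample size
a = 1.0                                   # half-width of the uniform interval [-a, a]
samples = {
    "gaussian": rng.standard_normal(K),
    "uniform": rng.uniform(-a, a, K),
    "laplacian": rng.laplace(0.0, 1.0, K),   # Laplacian with scale b = 1
}
for name, y in samples.items():
    print(f"{name:9s} kurt = {kurt(y):+.3f}")
```

Up to sampling error, the gaussian estimate should be close to zero, the uniform one negative (subgaussian), and the Laplacian one positive (supergaussian).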
2.19 The exponential density has the pdf
$$p_x(x) = \begin{cases} \lambda \exp(-\lambda x), & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where $\lambda$ is a positive constant.
2.19.1. Compute the first characteristic function of the exponential distribution.
2.19.2. Using the characteristic function, determine the moments of the exponential density.
2.20 A scalar random variable $x$ has a gamma distribution if its pdf is given by
$$p_x(x) = \begin{cases} \gamma\, x^{b-1} \exp(-cx), & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where $b$ and $c$ are positive numbers and the normalizing parameter $\gamma = c^b / \Gamma(b)$ involves the gamma function, defined by
$$\Gamma(b+1) = \int_0^\infty y^b \exp(-y)\,dy, \qquad b > -1$$
The gamma function satisfies the generalized factorial condition $\Gamma(b+1) = b\,\Gamma(b)$. For integer values, this becomes $\Gamma(n+1) = n!$.
2.20.1. Show that if $b = 1$, the gamma distribution reduces to the standard exponential density.
2.20.2. Show that the first characteristic function of a gamma-distributed random variable is
$$\varphi(\omega) = \frac{c^b}{(c - j\omega)^b}$$
2.20.3. Using the previous result, determine the mean, second moment, and variance of the gamma distribution.
2.21 Let $\kappa_k(x)$ and $\kappa_k(y)$ be the $k$th-order cumulants of the scalar random variables $x$ and $y$, respectively.
2.21.1. Show that if $x$ and $y$ are independent, then
$$\kappa_k(x + y) = \kappa_k(x) + \kappa_k(y)$$
2.21.2. Show that $\kappa_k(\alpha x) = \alpha^k \kappa_k(x)$, where $\alpha$ is a constant.
2.22 * Show that the power spectrum $S_x(\omega)$ is a real-valued, even, and periodic function of the angular frequency $\omega$.
2.23 * Consider the stochastic process
$$y(n) = x(n+k) - x(n-k)$$
where $k$ is a constant integer and $x(n)$ is a zero-mean, wide-sense stationary stochastic process. Let the power spectrum of $x(n)$ be $S_x(\omega)$ and its autocorrelation sequence $r_x(0), r_x(1), \ldots$.
2.23.1. Determine the autocorrelation sequence $r_y(m)$ of the process $y(n)$.
2.23.2. Show that the power spectrum of $y(n)$ is
$$S_y(\omega) = 4 S_x(\omega) \sin^2(k\omega)$$
2.24 * Consider the autoregressive process (2.126).
2.24.1. Show that the autocorrelation function of the AR process satisfies the difference equation
$$r_x(l) = -\sum_{i=1}^{M} a_i r_x(l-i), \qquad l > 0$$
2.24.2. Using this result, show that the AR coefficients $a_i$ can be determined from the Yule-Walker equations
$$\mathbf{R}_x \mathbf{a} = -\mathbf{r}_x$$
Here $\mathbf{R}_x$ is the autocorrelation matrix defined in (2.119) with $m = M - 1$, the vector $\mathbf{r}_x = [r_x(1), r_x(2), \ldots, r_x(M)]^T$, and the coefficient vector $\mathbf{a} = [a_1, a_2, \ldots, a_M]^T$.
2.24.3. Show that the variance of the white noise process $v(n)$ in (2.126) is related to the autocorrelation values by the formula
$$\sigma_v^2 = r_x(0) + \sum_{i=1}^{M} a_i r_x(i)$$
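The Yule-Walker equations of Problem 2.24 also give a practical estimation procedure: this is the kind of linear technique, based on estimated autocorrelation values, mentioned for computing AR coefficients. The Python sketch below is one way to carry it out, assuming NumPy. It simulates an AR(2) process with the coefficients of Computer assignment 2.3.1 and unit noise variance (the noise variance, sample size, and seed are assumptions), estimates $r_x(0), \ldots, r_x(M)$ with the biased sample autocorrelation, solves $\mathbf{R}_x \mathbf{a} = -\mathbf{r}_x$, and evaluates the noise-variance formula of 2.24.3. The helper names `simulate_ar`, `autocorr`, and `yule_walker` are hypothetical, not taken from any library.

```python
import numpy as np

def simulate_ar(a, n_samples, sigma_v=1.0, rng=None):
    """Simulate x(n) + sum_i a_i x(n-i) = v(n) starting from zero initial values."""
    rng = np.random.default_rng() if rng is None else rng
    M = len(a)
    x = np.zeros(n_samples + M)            # first M entries hold the zero initial values
    v = sigma_v * rng.standard_normal(n_samples + M)
    for n in range(M, n_samples + M):
        # x(n) = v(n) - a_1 x(n-1) - ... - a_M x(n-M)
        x[n] = v[n] - np.dot(a, x[n - M:n][::-1])
    return x[M:]

def autocorr(x, max_lag):
    """Biased sample autocorrelation r_x(l) = (1/K) sum_n x(n) x(n-l), l = 0..max_lag."""
    K = len(x)
    return np.array([np.dot(x[l:], x[:K - l]) / K for l in range(max_lag + 1)])

def yule_walker(x, M):
    """Estimate AR coefficients and noise variance from the Yule-Walker equations."""
    r = autocorr(x, M)
    # Toeplitz matrix with entries r_x(|l - i|), l, i = 1..M
    R = np.array([[r[abs(l - i)] for i in range(M)] for l in range(M)])
    a_hat = np.linalg.solve(R, -r[1:M + 1])          # R_x a = -r_x
    sigma2_v = r[0] + np.dot(a_hat, r[1:M + 1])       # sigma_v^2 = r_x(0) + sum_i a_i r_x(i)
    return a_hat, sigma2_v

rng = np.random.default_rng(1)
a_true = np.array([-0.1, -0.8])            # coefficients of Computer assignment 2.3.1
x = simulate_ar(a_true, 50_000, sigma_v=1.0, rng=rng)
a_hat, sigma2_v = yule_walker(x, M=2)
print("estimated a:", a_hat, " estimated sigma_v^2:", sigma2_v)
```

With this many samples the estimates should lie close to the true values; with short records, such as the 200 samples of Computer assignment 2.3, they are considerably less accurate.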
Computer assignments
2.1 Generate samples of a two-dimensional gaussian random vector $\mathbf{x}$ having zero mean vector and the covariance matrix
$$\mathbf{C}_{\mathbf{x}} = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$$
Estimate the covariance matrix and compare it with the theoretical one for the following numbers of samples, plotting the sample vectors in each case.
2.1.1. $K = 20$.
2.1.2. $K = 200$.
2.1.3. $K = 2000$.
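One possible starting point for Computer assignment 2.1 is sketched below in Python, assuming NumPy. It draws zero-mean gaussian vectors by transforming white gaussian samples with a Cholesky factor $\mathbf{L}$ of $\mathbf{C}_{\mathbf{x}}$, so that $\mathbf{L}\mathbf{L}^T = \mathbf{C}_{\mathbf{x}}$. This is one choice among several; the eigenvector decomposition suggested in Computer assignment 2.2 works equally well. The seed and the relative-error measure are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
C_x = np.array([[4.0, 1.0],
                [1.0, 2.0]])
L = np.linalg.cholesky(C_x)               # lower-triangular L with C_x = L L^T

def sample_gaussian(K):
    """K zero-mean gaussian vectors with covariance C_x, stored as columns."""
    return L @ rng.standard_normal((2, K))

for K in (20, 200, 2000):
    X = sample_gaussian(K)
    C_hat = np.cov(X)                     # unbiased sample covariance; rows are variables
    err = np.linalg.norm(C_hat - C_x) / np.linalg.norm(C_x)   # relative Frobenius error
    print(f"K = {K:5d}  relative error = {err:.3f}")
```

The estimation error typically shrinks roughly as $1/\sqrt{K}$. The sample vectors can be plotted, for example, with matplotlib's `scatter` applied to the two rows of `X`.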
2.2 Consider generating Laplacian random variables with desired properties for simulation purposes.
2.2.1. Using the probability integral transformation, give a formula for generating samples of a scalar random variable with a desired Laplacian distribution from uniformly distributed samples.
2.2.2. Extend the preceding procedure so that you get samples of two Laplacian random variables with a desired mean vector and joint covariance matrix. (Hint: Use the eigenvector decomposition of the covariance matrix for generating the desired covariance matrix.)
2.2.3. Use your procedure to generate 200 samples of a two-dimensional Laplacian random variable $\mathbf{x}$ with mean vector $\mathbf{m}_{\mathbf{x}} = (2, -1)^T$ and covariance matrix
$$\mathbf{C}_{\mathbf{x}} = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$$
Plot the generated samples.
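A sketch of one approach to Computer assignment 2.2, in Python with NumPy. The scalar step inverts the Laplacian cdf: for $u$ uniform on $(0, 1)$ and scale $b$, $x = -b\,\mathrm{sign}(u - 1/2)\ln(1 - 2|u - 1/2|)$ has a zero-mean Laplacian density with variance $2b^2$, so $b = 1/\sqrt{2}$ gives unit variance. Following the hint, the covariance is then imposed through the eigenvector decomposition $\mathbf{C}_{\mathbf{x}} = \mathbf{E}\mathbf{D}\mathbf{E}^T$ as $\mathbf{x} = \mathbf{m}_{\mathbf{x}} + \mathbf{E}\mathbf{D}^{1/2}\mathbf{s}$, where $\mathbf{s}$ has independent unit-variance Laplacian components. Note that the components of $\mathbf{x}$ are linear combinations of Laplacian variables, so this construction matches the desired mean and covariance exactly, but the marginals of $\mathbf{x}$ are in general not exactly Laplacian. The function names and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def laplacian_from_uniform(u):
    """Inverse-cdf map from u ~ U(0, 1) to zero-mean, unit-variance Laplacian samples."""
    b = 1.0 / np.sqrt(2.0)                # scale b gives variance 2 b^2 = 1
    # NumPy draws u from [0, 1); the endpoint u = 0 has negligible probability
    return -b * np.sign(u - 0.5) * np.log(1.0 - 2.0 * np.abs(u - 0.5))

def laplacian_vectors(m, C, K):
    """K samples (columns) with mean m and covariance C, using C = E D E^T."""
    d, E = np.linalg.eigh(C)              # eigenvalues d, orthonormal eigenvectors in E
    S = laplacian_from_uniform(rng.uniform(size=(len(m), K)))
    return m[:, None] + E @ (np.sqrt(d)[:, None] * S)

m_x = np.array([2.0, -1.0])
C_x = np.array([[4.0, 1.0],
                [1.0, 2.0]])
X = laplacian_vectors(m_x, C_x, 200)
print("sample mean:", X.mean(axis=1))
print("sample covariance:\n", np.cov(X))
```

With only 200 samples the sample mean and covariance will deviate noticeably from $\mathbf{m}_{\mathbf{x}}$ and $\mathbf{C}_{\mathbf{x}}$; the columns of `X` can be plotted as a scatter diagram.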
2.3 * Consider the second-order autoregressive model described by the difference equation
$$x(n) + a_1 x(n-1) + a_2 x(n-2) = v(n)$$
Here $x(n)$ is the value of the process at time $n$, and $v(n)$ is zero-mean white gaussian noise with variance $\sigma_v^2$ that “drives” the AR process. Generate 200 samples of the process using the initial values $x(0) = x(-1) = 0$ and the following coefficient values. Plot the resulting AR process in each case.
2.3.1. $a_1 = -0.1$ and $a_2 = -0.8$.
2.3.2. $a_1 = 0.1$ and $a_2 = -0.8$.
2.3.3. $a_1 = -0.975$ and $a_2 = 0.95$.
2.3.4. $a_1 = 0.1$ and $a_2 = -1.0$.
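For Computer assignment 2.3, the recursion can be run directly; a minimal Python sketch with NumPy follows. The assignment leaves $\sigma_v^2$ open, so unit noise variance is assumed here, and the seed is arbitrary. The sketch also reports the largest magnitude among the roots of $z^2 + a_1 z + a_2$, since the AR(2) process is stable only when both roots lie strictly inside the unit circle.

```python
import numpy as np

def generate_ar2(a1, a2, n_samples=200, sigma_v=1.0, seed=0):
    """x(n) = -a1 x(n-1) - a2 x(n-2) + v(n), n = 1..n_samples, with x(0) = x(-1) = 0."""
    rng = np.random.default_rng(seed)
    v = sigma_v * rng.standard_normal(n_samples)
    x_prev, x_prev2 = 0.0, 0.0            # x(n-1) and x(n-2), starting from x(0) = x(-1) = 0
    x = np.empty(n_samples)
    for n in range(n_samples):
        x[n] = -a1 * x_prev - a2 * x_prev2 + v[n]
        x_prev2, x_prev = x_prev, x[n]
    return x

cases = {"2.3.1": (-0.1, -0.8), "2.3.2": (0.1, -0.8),
         "2.3.3": (-0.975, 0.95), "2.3.4": (0.1, -1.0)}
for label, (a1, a2) in cases.items():
    x = generate_ar2(a1, a2)
    # roots of z^2 + a1 z + a2; the process is stable iff all |roots| < 1
    max_root = np.max(np.abs(np.roots([1.0, a1, a2])))
    print(f"{label}: max |root| = {max_root:.3f}, max |x(n)| = {np.max(np.abs(x)):.1f}")
```

In cases 2.3.1 to 2.3.3 both roots lie inside the unit circle; case 2.3.3 has a complex pair of magnitude $\sqrt{0.95} \approx 0.975$, which yields a narrowband, nearly periodic signal. In case 2.3.4 one root has magnitude about 1.05, so the amplitude of the generated samples grows roughly exponentially. Each sequence can be plotted against $n$, for example with matplotlib.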
3
Gradients and Optimization Methods
The main task in the independent component analysis (ICA) problem, formulated in Chapter 1, is to estimate a separating matrix $\mathbf{W}$ that will give us the independent components. It also became clear that $\mathbf{W}$ cannot generally be solved in closed form; that is, we cannot write it as a function of the sample or training set whose value could be evaluated directly. Instead, the solution method is based on cost functions, also called objective functions or contrast functions. Solutions $\mathbf{W}$ to ICA are found at the minima or maxima of these functions. Several possible ICA cost functions will be given and discussed in detail in Parts II and III of this book. In general, statistical estimation is largely based on the optimization of cost or objective functions, as will be seen in Chapter 4.

Minimization of multivariate functions, possibly under some constraints on the solutions, is the subject of optimization theory. In this chapter, we discuss some typical iterative optimization algorithms and their properties. Most of the algorithms are based on the gradients of the cost functions. Therefore, vector and matrix gradients are reviewed first, followed by the most typical ways to solve unconstrained and constrained optimization problems with gradient-type learning algorithms.
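To make the idea of a gradient-type iterative algorithm concrete, here is a minimal Python sketch of plain gradient descent, $\mathbf{w} \leftarrow \mathbf{w} - \mu \nabla J(\mathbf{w})$, assuming NumPy. The quadratic cost $J(\mathbf{w}) = \frac{1}{2}\mathbf{w}^T\mathbf{A}\mathbf{w} - \mathbf{b}^T\mathbf{w}$, its matrix $\mathbf{A}$, vector $\mathbf{b}$, and the step size $\mu$ are hypothetical choices for illustration, not an ICA cost function. For a symmetric positive definite $\mathbf{A}$ the minimum is at $\mathbf{w} = \mathbf{A}^{-1}\mathbf{b}$, which gives a simple check. Maximization would instead step along $+\nabla J$.

```python
import numpy as np

def gradient_descent(grad, w0, step=0.1, n_iter=500, tol=1e-10):
    """Iterate w <- w - step * grad(w) until the update becomes negligible."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        update = step * grad(w)
        w = w - update
        if np.linalg.norm(update) < tol:
            break
    return w

# Hypothetical quadratic cost J(w) = 0.5 w^T A w - b^T w, with gradient A w - b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
w_min = gradient_descent(lambda w: A @ w - b, w0=np.zeros(2))
print("gradient descent:", w_min, " exact A^{-1} b:", np.linalg.solve(A, b))
```

The step size matters: for this quadratic cost the iteration converges only if $0 < \mu < 2/\lambda_{\max}(\mathbf{A})$, where $\lambda_{\max}(\mathbf{A})$ is the largest eigenvalue of $\mathbf{A}$.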
3.1 VECTOR AND MATRIX GRADIENTS