MATHEMATICAL PRELIMINARIES
2.5 THE MULTIVARIATE GAUSSIAN DENSITY
Fig. 2.5 The conditional probability density $p_{y|x}(y \mid x = 1.27)$. (Plot omitted; horizontal axis from −2 to 2, vertical axis from 0 to 0.7.)
Fig. 2.6 The conditional probability density $p_{x|y}(x \mid y = -0.37)$. (Plot omitted; same axis ranges as Fig. 2.5.)

Formula (2.64) (together with (2.65)) is called Bayes’ rule. This rule is especially important in statistical estimation theory. There, $p_{x|y}(\mathbf{x} \mid \mathbf{y})$ is typically the conditional density of the measurement vector $\mathbf{x}$, with $\mathbf{y}$ denoting the vector of unknown random parameters. Bayes’ rule (2.64) allows the computation of the posterior density $p_{y|x}(\mathbf{y} \mid \mathbf{x})$ of the parameters $\mathbf{y}$, given a specific measurement (observation) vector $\mathbf{x}$, and assuming or knowing the prior distribution $p_y(\mathbf{y})$ of the random parameters $\mathbf{y}$. These matters will be discussed in more detail in Chapter 4.
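As a numerical illustration, the following sketch applies Bayes’ rule on a grid for a scalar toy problem. It assumes that (2.64) has the usual form $p_{y|x}(y \mid x) = p_{x|y}(x \mid y)\, p_y(y) / p_x(x)$, with the denominator $p_x(x)$ obtained by integrating the numerator over $y$ as in (2.65). The model (prior $y \sim N(0, 1)$, measurement $x = y + \text{noise}$ with noise variance 0.25), the grid, and all function names are assumptions made for this example only.

```python
import math

def normal_pdf(v, mean, var):
    """One-dimensional gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_on_grid(x_obs, grid, noise_var=0.25):
    """Posterior density p(y | x_obs) evaluated on a uniform grid of y values.

    Assumed toy model (not from the text): prior y ~ N(0, 1) and
    measurement x = y + noise, with noise ~ N(0, noise_var).
    """
    step = grid[1] - grid[0]
    # Numerator of Bayes' rule: likelihood p(x_obs | y) times prior p(y).
    joint = [normal_pdf(x_obs, y, noise_var) * normal_pdf(y, 0.0, 1.0) for y in grid]
    # Denominator p_x(x_obs): integral of the numerator over y (Riemann sum).
    evidence = sum(joint) * step
    return [j / evidence for j in joint]

grid = [-6.0 + 0.01 * i for i in range(1201)]   # y values from -6 to 6
step = grid[1] - grid[0]
post = posterior_on_grid(1.0, grid)

# Mean and variance of the posterior, again by Riemann sums.
post_mean = sum(y * p for y, p in zip(grid, post)) * step
post_var = sum((y - post_mean) ** 2 * p for y, p in zip(grid, post)) * step
print(post_mean, post_var)
```

For this particular toy model the posterior is known in closed form to be gaussian with mean $0.8\,x$ and variance $0.2$, so the printed values should be close to 0.8 and 0.2.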
Conditional expectations are defined similarly to the expectations defined earlier, but the pdf appearing in the integral is now the appropriate conditional density. Hence, for example,
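$$\mathrm{E}\{ g(\mathbf{x}, \mathbf{y}) \mid \mathbf{y} \} = \int_{-\infty}^{\infty} g(\boldsymbol{\xi}, \mathbf{y})\, p_{x|y}(\boldsymbol{\xi} \mid \mathbf{y})\, d\boldsymbol{\xi} \tag{2.66}$$

This is still a function of the random vector $\mathbf{y}$, which is thought to be nonrandom while computing the above expectation. The complete expectation with respect to both $\mathbf{x}$ and $\mathbf{y}$ can be obtained by taking the expectation of (2.66) with respect to $\mathbf{y}$:

$$\mathrm{E}\{ g(\mathbf{x}, \mathbf{y}) \} = \mathrm{E}\{ \mathrm{E}\{ g(\mathbf{x}, \mathbf{y}) \mid \mathbf{y} \} \} \tag{2.67}$$

Actually, this is just an alternative two-stage procedure for computing the expectation (2.28), following easily from Bayes’ rule.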
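The two-stage procedure (2.67) can be checked on a small discrete example, where the integrals become sums. This is only a sketch of the discrete analogue: the joint probabilities, the test function `g`, and all helper names below are hypothetical and chosen for illustration.

```python
# Hypothetical joint probabilities P(x, y) of a discrete pair; they sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

def g(x, y):
    """An arbitrary test function of both variables."""
    return (x + 1) ** 2 * (y + 2)

def marginal_y(joint):
    """P(y), obtained by summing the joint probabilities over x."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return p_y

def conditional_expectation(joint, g, y_value):
    """Discrete analogue of (2.66): E{g(x, y) | y = y_value}."""
    p_y = marginal_y(joint)[y_value]
    return sum(g(x, y) * p / p_y for (x, y), p in joint.items() if y == y_value)

def two_stage_expectation(joint, g):
    """Discrete analogue of (2.67): average the conditional expectations over y."""
    return sum(p * conditional_expectation(joint, g, y)
               for y, p in marginal_y(joint).items())

def direct_expectation(joint, g):
    """E{g(x, y)} computed directly from the joint probabilities."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

direct = direct_expectation(joint, g)
two_stage = two_stage_expectation(joint, g)
print(direct, two_stage)
```

Both routes give the same number up to rounding, because dividing by $P(y)$ in the inner step and multiplying by it in the outer step cancel.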
Consider an $n$-dimensional random vector $\mathbf{x}$. It is said to be gaussian if the probability density function of $\mathbf{x}$ has the form

$$p_x(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} (\det \mathbf{C}_x)^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \mathbf{m}_x)^T \mathbf{C}_x^{-1} (\mathbf{x} - \mathbf{m}_x) \right) \tag{2.68}$$

Recall that $n$ is the dimension of $\mathbf{x}$, $\mathbf{m}_x$ its mean, and $\mathbf{C}_x$ the covariance matrix of $\mathbf{x}$. The notation $\det \mathbf{A}$ is used for the determinant of a matrix $\mathbf{A}$, in this case $\mathbf{C}_x$. It is easy to see that for a single random variable $x$ ($n = 1$), the density (2.68) reduces to the one-dimensional gaussian pdf (2.4) discussed briefly in Example 2.1. Note also that the covariance matrix $\mathbf{C}_x$ is assumed strictly positive definite, which also implies that its inverse exists.

It can be shown that for the density (2.68)
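$$\mathrm{E}\{\mathbf{x}\} = \mathbf{m}_x, \qquad \mathrm{E}\{ (\mathbf{x} - \mathbf{m}_x)(\mathbf{x} - \mathbf{m}_x)^T \} = \mathbf{C}_x \tag{2.69}$$

Hence calling $\mathbf{m}_x$ the mean vector and $\mathbf{C}_x$ the covariance matrix of the multivariate gaussian density is justified.

2.5.1 Properties of the gaussian density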
In the following, we list the most important properties of the multivariate gaussian density, omitting proofs. The proofs can be found in many books; see, for example, [353, 419, 407].
Only first- and second-order statistics are needed. Knowledge of the mean vector $\mathbf{m}_x$ and the covariance matrix $\mathbf{C}_x$ of $\mathbf{x}$ is sufficient for defining the multivariate gaussian density (2.68) completely. Therefore, all the higher-order moments must also depend only on $\mathbf{m}_x$ and $\mathbf{C}_x$. This implies that these moments do not carry any novel information about the gaussian distribution. An important consequence of this fact and the form of the gaussian pdf is that linear processing methods based on first- and second-order statistical information are usually optimal for gaussian data. For example, independent component analysis does not bring out anything new compared with standard principal component analysis (to be discussed later) for gaussian data. Similarly, linear time-invariant discrete-time filters used in classic statistical signal processing are optimal for filtering gaussian data.
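Because the density (2.68) depends on the distribution only through $\mathbf{m}_x$ and $\mathbf{C}_x$, it can be evaluated from these two quantities alone. The sketch below is one possible way to do this with NumPy; the function name `gaussian_pdf` and the example values of the mean and covariance are assumptions for illustration. It works in the log domain and solves a linear system instead of forming $\mathbf{C}_x^{-1}$ explicitly, which is a numerical design choice rather than something prescribed by the text.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Evaluate the multivariate gaussian density (2.68) at the point x.

    cov is assumed symmetric and strictly positive definite.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = mean.size
    diff = x - mean
    # Quadratic form (x - m)^T C^{-1} (x - m), via a linear solve.
    quad = diff @ np.linalg.solve(cov, diff)
    # slogdet returns (sign, log|det C|); the sign is +1 for positive definite C.
    _, logdet = np.linalg.slogdet(cov)
    log_density = -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
    return float(np.exp(log_density))

# Hypothetical two-dimensional example.
m = np.array([1.0, -2.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])
peak = gaussian_pdf(m, m, C)            # density at the mean vector

# For n = 1 the same function gives the one-dimensional gaussian pdf.
scalar = gaussian_pdf(0.5, 0.0, 4.0)    # mean 0, variance 4, evaluated at 0.5
print(peak, scalar)
```

At the mean vector the exponent vanishes, so `peak` equals $1 / (2\pi \sqrt{\det \mathbf{C}_x})$, which gives a quick sanity check of the normalization constant.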
Linear transformations are gaussian. If $\mathbf{x}$ is a gaussian random vector and $\mathbf{y} = \mathbf{A}\mathbf{x}$ its linear transformation, then $\mathbf{y}$ is also gaussian with mean vector $\mathbf{m}_y = \mathbf{A}\mathbf{m}_x$ and covariance matrix $\mathbf{C}_y = \mathbf{A}\mathbf{C}_x\mathbf{A}^T$. A special case of this result says that any linear combination of gaussian random variables is itself gaussian. This result again has implications in standard independent component analysis: it is impossible to estimate the ICA model for gaussian data, that is, one cannot blindly separate gaussian sources from their mixtures without extra knowledge of the sources, as will be discussed in Chapter 7.²
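The moment formulas $\mathbf{m}_y = \mathbf{A}\mathbf{m}_x$ and $\mathbf{C}_y = \mathbf{A}\mathbf{C}_x\mathbf{A}^T$ are easy to check by simulation. The following sketch uses NumPy; the matrix `A`, the moments of $\mathbf{x}$, the sample size, and the helper name `transform_moments` are all hypothetical. Note that it compares only the first- and second-order moments and does not by itself test that $\mathbf{y}$ is gaussian.

```python
import numpy as np

def transform_moments(A, mean_x, cov_x):
    """Mean and covariance of y = A x, using m_y = A m_x and C_y = A C_x A^T."""
    A = np.asarray(A, dtype=float)
    return A @ mean_x, A @ cov_x @ A.T

# Hypothetical moments of a three-dimensional gaussian vector x.
m_x = np.array([1.0, -1.0, 0.5])
C_x = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, -0.4],
                [0.0, -0.4, 1.5]])
# Hypothetical 2 x 3 transformation matrix.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 1.0]])

m_y, C_y = transform_moments(A, m_x, C_x)

# Monte Carlo check: draw samples of x, transform them, estimate the moments.
rng = np.random.default_rng(0)
x = rng.multivariate_normal(m_x, C_x, size=200_000)   # one sample per row
y = x @ A.T                                           # applies y = A x to each row
sample_mean = y.mean(axis=0)
sample_cov = np.cov(y, rowvar=False)

print(m_y, sample_mean)
print(C_y, sample_cov)
```

The sample estimates should agree with the theoretical values up to the usual Monte Carlo error, which shrinks as the sample size grows.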
Marginal and conditional densities are gaussian. Consider now two random vectors $\mathbf{x}$ and $\mathbf{y}$ having dimensions $n$ and $m$, respectively. Let us collect them in a single random vector $\mathbf{z}^T = (\mathbf{x}^T, \mathbf{y}^T)$ of dimension $n + m$. Its mean vector $\mathbf{m}_z$ and covariance matrix $\mathbf{C}_z$ are

$$\mathbf{m}_z = \begin{pmatrix} \mathbf{m}_x \\ \mathbf{m}_y \end{pmatrix}, \qquad \mathbf{C}_z = \begin{pmatrix} \mathbf{C}_x & \mathbf{C}_{xy} \\ \mathbf{C}_{yx} & \mathbf{C}_y \end{pmatrix} \tag{2.70}$$

Recall that the cross-covariance matrices are transposes of each other: $\mathbf{C}_{xy} = \mathbf{C}_{yx}^T$.

Assume now that $\mathbf{z}$ has a jointly gaussian distribution. It can be shown that the marginal densities $p_x(\mathbf{x})$ and $p_y(\mathbf{y})$ of the joint gaussian density $p_z(\mathbf{z})$ are gaussian. Also the conditional densities $p_{x|y}$ and $p_{y|x}$ are $n$- and $m$-dimensional gaussian densities, respectively. The mean and covariance matrix of the conditional density $p_{y|x}$ are

$$\mathbf{m}_{y|x} = \mathbf{m}_y + \mathbf{C}_{yx} \mathbf{C}_x^{-1} (\mathbf{x} - \mathbf{m}_x) \tag{2.71}$$

$$\mathbf{C}_{y|x} = \mathbf{C}_y - \mathbf{C}_{yx} \mathbf{C}_x^{-1} \mathbf{C}_{xy} \tag{2.72}$$

Similar expressions are obtained for the mean $\mathbf{m}_{x|y}$ and covariance matrix $\mathbf{C}_{x|y}$ of the conditional density $p_{x|y}$.

Uncorrelatedness and geometrical structure. We mentioned earlier that uncorrelated gaussian random variables are also independent, a property which is not shared by other distributions in general. Derivation of this important result is left to the reader as an exercise. If the covariance matrix
$\mathbf{C}_x$ of the multivariate gaussian density (2.68) is not diagonal, the components of $\mathbf{x}$ are correlated. Since $\mathbf{C}_x$ is a symmetric and positive definite matrix, it can always be represented in the form

$$\mathbf{C}_x = \mathbf{E}\mathbf{D}\mathbf{E}^T = \sum_{i=1}^{n} \lambda_i \mathbf{e}_i \mathbf{e}_i^T \tag{2.73}$$

Here $\mathbf{E}$ is an orthogonal matrix (that is, a rotation) having as its columns $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ the $n$ eigenvectors of $\mathbf{C}_x$, and $\mathbf{D} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is the diagonal matrix containing the respective eigenvalues $\lambda_i$ of $\mathbf{C}_x$. Now it can readily be verified that applying the rotation

$$\mathbf{u} = \mathbf{E}^T (\mathbf{x} - \mathbf{m}_x) \tag{2.74}$$

to $\mathbf{x}$ makes the components of the gaussian distribution of $\mathbf{u}$ uncorrelated, and hence also independent.

² It is possible, however, to separate temporally correlated (nonwhite) gaussian sources using their second-order temporal statistics on certain conditions. Such techniques are quite different from standard independent component analysis. They will be discussed in Chapter 18.

Moreover, the eigenvalues $\lambda_i$ and eigenvectors $\mathbf{e}_i$ of the covariance matrix $\mathbf{C}_x$ reveal the geometrical structure of the multivariate gaussian distribution. The contours of any pdf are defined by curves of constant values of the density, given by the equation $p_x(\mathbf{x}) = \text{constant}$. For the multivariate gaussian density, this is equivalent to requiring that the exponent is a constant $c$:

$$(\mathbf{x} - \mathbf{m}_x)^T \mathbf{C}_x^{-1} (\mathbf{x} - \mathbf{m}_x) = c \tag{2.75}$$

Using (2.73), it is easy to see [419] that the contours of the multivariate gaussian are hyperellipsoids centered at the mean vector $\mathbf{m}_x$. The principal axes of the hyperellipsoids are parallel to the eigenvectors $\mathbf{e}_i$, and the eigenvalues $\lambda_i$ are the respective variances. See Fig. 2.7 for an illustration.

Fig. 2.7 Illustration of a multivariate gaussian probability density. (Plot omitted; it shows an elliptical contour in the $(x_1, x_2)$ plane centered at $\mathbf{m}$, with principal axes along $\mathbf{e}_1$ and $\mathbf{e}_2$ labeled $\lambda_1^{1/2}$ and $\lambda_2^{1/2}$.)
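The last two properties can be tried out numerically. The sketch below, written with NumPy, computes the conditional mean and covariance of (2.71) and (2.72) and the eigendecomposition of (2.73) used in the rotation (2.74). All numerical values and the function names `conditional_gaussian` and `decorrelating_rotation` are assumptions made for this example; `numpy.linalg.eigh` is used because $\mathbf{C}_x$ is symmetric.

```python
import numpy as np

def conditional_gaussian(x, m_x, m_y, C_x, C_xy, C_y):
    """Mean (2.71) and covariance (2.72) of y given x for jointly gaussian (x, y)."""
    # gain = C_yx C_x^{-1}; since C_x is symmetric this equals (C_x^{-1} C_xy)^T.
    gain = np.linalg.solve(C_x, C_xy).T
    cond_mean = m_y + gain @ (x - m_x)
    cond_cov = C_y - gain @ C_xy
    return cond_mean, cond_cov

def decorrelating_rotation(C_x):
    """Eigenvalues and orthogonal matrix E of (2.73), so that C_x = E D E^T."""
    eigvals, E = np.linalg.eigh(C_x)   # columns of E are orthonormal eigenvectors
    return eigvals, E

# Hypothetical blocks of the joint covariance matrix (2.70): n = 2, m = 1.
m_x = np.array([1.0, -2.0])
m_y = np.array([0.5])
C_x = np.array([[2.0, 0.6],
                [0.6, 1.0]])
C_xy = np.array([[0.5],
                 [0.2]])
C_y = np.array([[1.0]])

# Conditional density of y after observing x.
x_obs = m_x + np.array([1.0, 0.0])
cond_mean, cond_cov = conditional_gaussian(x_obs, m_x, m_y, C_x, C_xy, C_y)

# Rotation (2.74) applied to the same observation.
eigvals, E = decorrelating_rotation(C_x)
u = E.T @ (x_obs - m_x)

# Covariance of u = E^T (x - m_x) is E^T C_x E, which should be diagonal.
C_u = E.T @ C_x @ E

print(cond_mean, cond_cov)
print(eigvals, u)
print(C_u)
```

The conditional covariance does not depend on the observed value `x_obs`, as (2.72) shows, and the diagonal entries of `C_u` are the eigenvalues $\lambda_i$, that is, the variances along the principal axes of the contour ellipsoid.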
2.5.2 Central limit theorem
Still another argument underlining the significance of the gaussian distribution is provided by the central limit theorem. Let
$$x_k = \sum_{i=1}^{k} z_i \tag{2.76}$$

be a partial sum of a sequence $\{z_i\}$ of independent and identically distributed random variables $z_i$. Since the mean and variance of $x_k$ can grow without bound as $k \to \infty$, consider instead of $x_k$ the standardized variables

$$y_k = \frac{x_k - m_{x_k}}{\sigma_{x_k}} \tag{2.77}$$

where $m_{x_k}$ and $\sigma_{x_k}$ are the mean and standard deviation of $x_k$.

It can be shown that the distribution of $y_k$ converges to a gaussian distribution with zero mean and unit variance when $k \to \infty$. This result is known as the central limit theorem. Several different forms of the theorem exist, where assumptions on independence and identical distributions have been weakened. The central limit theorem is a primary reason that justifies modeling of many random phenomena as gaussian random variables. For example, additive noise can often be considered to arise as a sum of a large number of small elementary effects, and is therefore naturally modeled as a gaussian random variable.

The central limit theorem generalizes readily to independent and identically distributed random vectors
$\mathbf{z}_i$ having a common mean $\mathbf{m}_z$ and covariance matrix $\mathbf{C}_z$. The limiting distribution of the random vector

$$\mathbf{y}_k = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} (\mathbf{z}_i - \mathbf{m}_z) \tag{2.78}$$

is multivariate gaussian with zero mean and covariance matrix $\mathbf{C}_z$.

The central limit theorem has important consequences in independent component analysis and blind source separation. A typical mixture, or component of the data vector $\mathbf{x}$, is of the form

$$x_i = \sum_{j=1}^{m} a_{ij} s_j \tag{2.79}$$

where $a_{ij}$, $j = 1, \ldots, m$, are constant mixing coefficients and $s_j$, $j = 1, \ldots, m$, are the