THE MULTIVARIATE GAUSSIAN DENSITY

MATHEMATICAL PRELIMINARIES

2.5 THE MULTIVARIATE GAUSSIAN DENSITY


[Fig. 2.5 The conditional probability density $p_{y|x}(y \mid x = 1.27)$.]

[Fig. 2.6 The conditional probability density $p_{x|y}(x \mid y = -0.37)$.]

Formula (2.64) (together with (2.65)) is called Bayes’ rule. This rule is especially important in statistical estimation theory. There, $p_{\mathbf{x}|\mathbf{y}}(\mathbf{x} \mid \mathbf{y})$ is typically the conditional density of the measurement vector $\mathbf{x}$, with $\mathbf{y}$ denoting the vector of unknown random parameters. Bayes’ rule (2.64) allows the computation of the posterior density $p_{\mathbf{y}|\mathbf{x}}(\mathbf{y} \mid \mathbf{x})$ of the parameters $\mathbf{y}$, given a specific measurement (observation) vector $\mathbf{x}$ and assuming or knowing the prior distribution $p_{\mathbf{y}}(\mathbf{y})$ of the random parameters $\mathbf{y}$. These matters will be discussed in more detail in Chapter 4.
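As a numerical illustration of Bayes’ rule, the following Python sketch computes a posterior density on a grid for scalar $x$ and $y$. The model is an assumption chosen only for illustration: a prior $y \sim N(0, 1)$ and a measurement $x = y + n$ with gaussian noise of variance 0.5. The function names `gauss_pdf` and `posterior_on_grid` are hypothetical, and the normalizing integral $p_x(x)$ is approximated by a simple Riemann sum over the grid.

```python
import numpy as np

def gauss_pdf(t, mean, var):
    """Scalar gaussian density N(mean, var) at t; broadcasts over NumPy arrays."""
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def posterior_on_grid(x_obs, grid, prior_var=1.0, noise_var=0.5):
    """Posterior p_{y|x}(eta | x_obs) on an evenly spaced grid of parameter values eta.

    Hypothetical model: prior y ~ N(0, prior_var) and measurement x = y + noise,
    noise ~ N(0, noise_var), so that p_{x|y}(x | eta) = N(x; eta, noise_var).
    """
    step = grid[1] - grid[0]
    prior = gauss_pdf(grid, 0.0, prior_var)          # p_y(eta)
    likelihood = gauss_pdf(x_obs, grid, noise_var)   # p_{x|y}(x_obs | eta)
    joint = likelihood * prior                       # numerator of Bayes' rule
    evidence = joint.sum() * step                    # p_x(x_obs), Riemann-sum approximation
    return joint / evidence

grid = np.linspace(-8.0, 8.0, 4001)
step = grid[1] - grid[0]
post = posterior_on_grid(1.2, grid)
post_mean = np.sum(grid * post) * step
post_var = np.sum((grid - post_mean) ** 2 * post) * step
```

For this gaussian prior and gaussian likelihood the posterior is itself gaussian, with mean $x/1.5$ and variance $1/3$, so the grid result for $x = 1.2$ can be checked against the closed-form values 0.8 and $1/3$.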

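Conditional expectations are defined similarly to the expectations defined earlier, but the pdf appearing in the integral is now the appropriate conditional density. Hence, for example,

$$ E\{g(\mathbf{x}, \mathbf{y}) \mid \mathbf{y}\} = \int_{-\infty}^{\infty} g(\boldsymbol{\xi}, \mathbf{y})\, p_{\mathbf{x}|\mathbf{y}}(\boldsymbol{\xi} \mid \mathbf{y})\, d\boldsymbol{\xi} \tag{2.66} $$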

This is still a function of the random vector $\mathbf{y}$, which is treated as nonrandom while computing the above expectation. The complete expectation with respect to both $\mathbf{x}$ and $\mathbf{y}$ can be obtained by taking the expectation of (2.66) with respect to $\mathbf{y}$:

$$ E\{g(\mathbf{x}, \mathbf{y})\} = E\{E\{g(\mathbf{x}, \mathbf{y}) \mid \mathbf{y}\}\} \tag{2.67} $$

Actually, this is just an alternative two-stage procedure for computing the expectation (2.28), following easily from Bayes’ rule.
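The two-stage computation (2.67) can be checked by Monte Carlo. The sketch below assumes a hypothetical joint model, $y \sim N(0, 1)$ and $x \mid y \sim N(y, 0.5)$, and takes $g(x, y) = xy$. For this model the inner conditional expectation has the closed form $E\{xy \mid y\} = y\,E\{x \mid y\} = y^2$, so both routes should estimate $E\{xy\} = E\{y^2\} = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
noise_var = 0.5

# Hypothetical joint model: y ~ N(0, 1) and, given y, x ~ N(y, noise_var).
y = rng.standard_normal(n_samples)
x = y + np.sqrt(noise_var) * rng.standard_normal(n_samples)

# Direct estimate of E{g(x, y)} with g(x, y) = x * y.
direct = np.mean(x * y)

# Two-stage estimate (2.67): the inner expectation E{x*y | y} = y * E{x | y} = y**2
# is known in closed form for this model; it is then averaged over the samples of y.
inner = y * y
two_stage = np.mean(inner)
```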

Consider an $n$-dimensional random vector $\mathbf{x}$. It is said to be gaussian if the probability density function of $\mathbf{x}$ has the form

$$ p_{\mathbf{x}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} (\det \mathbf{C}_{\mathbf{x}})^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T \mathbf{C}_{\mathbf{x}}^{-1} (\mathbf{x} - \mathbf{m}_{\mathbf{x}}) \right) \tag{2.68} $$

Recall that $n$ is the dimension of $\mathbf{x}$, $\mathbf{m}_{\mathbf{x}}$ its mean, and $\mathbf{C}_{\mathbf{x}}$ the covariance matrix of $\mathbf{x}$. The notation $\det \mathbf{A}$ is used for the determinant of a matrix $\mathbf{A}$, in this case $\mathbf{C}_{\mathbf{x}}$. It is easy to see that for a single random variable $x$ ($n = 1$), the density (2.68) reduces to the one-dimensional gaussian pdf (2.4) discussed briefly in Example 2.1. Note also that the covariance matrix $\mathbf{C}_{\mathbf{x}}$ is assumed strictly positive definite, which also implies that its inverse exists.
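A minimal sketch of evaluating (2.68) numerically, assuming NumPy. Rather than forming $\mathbf{C}_{\mathbf{x}}^{-1}$ and $\det \mathbf{C}_{\mathbf{x}}$ directly, it uses a Cholesky factorization $\mathbf{C}_{\mathbf{x}} = \mathbf{L}\mathbf{L}^T$, which exists exactly when $\mathbf{C}_{\mathbf{x}}$ is strictly positive definite; this is one common design choice, not the only one. The name `gaussian_pdf` is hypothetical.

```python
import numpy as np

def gaussian_pdf(x, m, C):
    """Evaluate the multivariate gaussian density (2.68) at a single point x.

    Assumes C is symmetric and strictly positive definite, as the text requires.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    C = np.asarray(C, dtype=float)
    n = x.shape[0]
    L = np.linalg.cholesky(C)                    # C = L L^T; raises if C is not positive definite
    w = np.linalg.solve(L, x - m)                # w = L^{-1}(x - m), so w.w = (x-m)^T C^{-1} (x-m)
    quad = w @ w
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log det C from the Cholesky factor
    log_p = -0.5 * quad - 0.5 * n * np.log(2.0 * np.pi) - 0.5 * log_det
    return float(np.exp(log_p))
```

For $n = 1$ the function reduces to the scalar gaussian pdf: for instance, `gaussian_pdf([0.5], [0.0], [[2.0]])` equals `np.exp(-0.0625) / np.sqrt(4 * np.pi)`, the one-dimensional density with mean 0 and variance 2 evaluated at 0.5.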

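It can be shown that for the density (2.68)

$$ E\{\mathbf{x}\} = \mathbf{m}_{\mathbf{x}}, \qquad E\{(\mathbf{x} - \mathbf{m}_{\mathbf{x}})(\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T\} = \mathbf{C}_{\mathbf{x}} \tag{2.69} $$

Hence calling $\mathbf{m}_{\mathbf{x}}$ the mean vector and $\mathbf{C}_{\mathbf{x}}$ the covariance matrix of the multivariate gaussian density is justified.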
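The identities (2.69) can be checked empirically: draw many samples from a gaussian with chosen parameters and compare the sample mean and sample covariance with $\mathbf{m}_{\mathbf{x}}$ and $\mathbf{C}_{\mathbf{x}}$. The parameter values below are arbitrary illustrative assumptions; `Generator.multivariate_normal` is NumPy's sampler for this density.

```python
import numpy as np

rng = np.random.default_rng(1)
m_x = np.array([1.0, -2.0])          # illustrative mean vector
C_x = np.array([[2.0, 0.6],
                [0.6, 1.0]])         # illustrative positive definite covariance

# Samples from the gaussian density (2.68); each row is one realization of x.
X = rng.multivariate_normal(m_x, C_x, size=100_000)

sample_mean = X.mean(axis=0)                              # estimates E{x}
centered = X - sample_mean
sample_cov = centered.T @ centered / (X.shape[0] - 1)     # estimates E{(x - m)(x - m)^T}
```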

2.5.1 Properties of the gaussian density

In the following, we list the most important properties of the multivariate gaussian density, omitting proofs. The proofs can be found in many books; see, for example, [353, 419, 407].

Only first- and second-order statistics are needed. Knowledge of the mean vector $\mathbf{m}_{\mathbf{x}}$ and the covariance matrix $\mathbf{C}_{\mathbf{x}}$ of $\mathbf{x}$ is sufficient for defining the multivariate gaussian density (2.68) completely. Therefore, all the higher-order moments must also depend only on $\mathbf{m}_{\mathbf{x}}$ and $\mathbf{C}_{\mathbf{x}}$. This implies that these moments do not carry any novel information about the gaussian distribution. An important consequence of this fact and the form of the gaussian pdf is that linear processing methods based on first- and second-order statistical information are usually optimal for gaussian data.

For example, independent component analysis does not bring out anything new compared with standard principal component analysis (to be discussed later) for gaussian data. Similarly, linear time-invariant discrete-time filters used in classic statistical signal processing are optimal for filtering gaussian data.
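One concrete instance of higher-order moments being fixed by second-order statistics: for a zero-mean gaussian variable with variance $\sigma^2$, the fourth moment is $E\{x^4\} = 3\sigma^4$, so the excess kurtosis $E\{x^4\}/(E\{x^2\})^2 - 3$ is zero. The sketch below estimates both quantities from samples; the value $\sigma^2 = 4$ and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0                                     # illustrative variance
x = rng.normal(0.0, np.sqrt(sigma2), size=1_000_000)

fourth_moment = np.mean(x ** 4)                  # sample estimate of E{x^4}
predicted = 3.0 * sigma2 ** 2                    # value implied by the variance alone
excess_kurtosis = fourth_moment / np.mean(x ** 2) ** 2 - 3.0
```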

Linear transformations are gaussian. If $\mathbf{x}$ is a gaussian random vector and $\mathbf{y} = \mathbf{A}\mathbf{x}$ its linear transformation, then $\mathbf{y}$ is also gaussian with mean vector $\mathbf{m}_{\mathbf{y}} = \mathbf{A}\mathbf{m}_{\mathbf{x}}$ and covariance matrix $\mathbf{C}_{\mathbf{y}} = \mathbf{A}\mathbf{C}_{\mathbf{x}}\mathbf{A}^T$. A special case of this result says that any linear combination of jointly gaussian random variables is itself gaussian. This result again has implications in standard independent component analysis: it is impossible to estimate the ICA model for gaussian data, that is, one cannot blindly separate gaussian sources from their mixtures without extra knowledge of the sources, as will be discussed in Chapter 7.²
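A quick empirical check of the linear-transformation property, under illustrative assumptions: a 3-dimensional gaussian $\mathbf{x}$ with diagonal covariance and an arbitrarily chosen $2 \times 3$ matrix $\mathbf{A}$. The predicted $\mathbf{m}_{\mathbf{y}} = \mathbf{A}\mathbf{m}_{\mathbf{x}}$ and $\mathbf{C}_{\mathbf{y}} = \mathbf{A}\mathbf{C}_{\mathbf{x}}\mathbf{A}^T$ are compared with sample statistics of $\mathbf{y} = \mathbf{A}\mathbf{x}$.

```python
import numpy as np

rng = np.random.default_rng(3)
m_x = np.array([0.0, 1.0, 2.0])
C_x = np.diag([1.0, 2.0, 0.5])
A = np.array([[1.0, 1.0, 0.0],
              [0.5, -1.0, 2.0]])     # hypothetical 2x3 transformation

m_y = A @ m_x                        # predicted mean of y = A x
C_y = A @ C_x @ A.T                  # predicted covariance of y = A x

X = rng.multivariate_normal(m_x, C_x, size=200_000)
Y = X @ A.T                          # row i is A x_i for sample x_i
```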

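Marginal and conditional densities are gaussian. Consider now two random vectors $\mathbf{x}$ and $\mathbf{y}$ having dimensions $n$ and $m$, respectively. Let us collect them in a single random vector $\mathbf{z}^T = (\mathbf{x}^T, \mathbf{y}^T)$ of dimension $n + m$. Its mean vector $\mathbf{m}_{\mathbf{z}}$ and covariance matrix $\mathbf{C}_{\mathbf{z}}$ are

$$ \mathbf{m}_{\mathbf{z}} = \begin{bmatrix} \mathbf{m}_{\mathbf{x}} \\ \mathbf{m}_{\mathbf{y}} \end{bmatrix}, \qquad \mathbf{C}_{\mathbf{z}} = \begin{bmatrix} \mathbf{C}_{\mathbf{x}} & \mathbf{C}_{\mathbf{xy}} \\ \mathbf{C}_{\mathbf{yx}} & \mathbf{C}_{\mathbf{y}} \end{bmatrix} \tag{2.70} $$

Recall that the cross-covariance matrices are transposes of each other: $\mathbf{C}_{\mathbf{xy}} = \mathbf{C}_{\mathbf{yx}}^T$.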

Assume now that $\mathbf{z}$ has a jointly gaussian distribution. It can be shown that the marginal densities $p_{\mathbf{x}}(\mathbf{x})$ and $p_{\mathbf{y}}(\mathbf{y})$ of the joint gaussian density $p_{\mathbf{z}}(\mathbf{z})$ are gaussian.

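Also the conditional densities $p_{\mathbf{x}|\mathbf{y}}$ and $p_{\mathbf{y}|\mathbf{x}}$ are $n$- and $m$-dimensional gaussian densities, respectively. The mean and covariance matrix of the conditional density $p_{\mathbf{y}|\mathbf{x}}$ are

$$ \mathbf{m}_{\mathbf{y}|\mathbf{x}} = \mathbf{m}_{\mathbf{y}} + \mathbf{C}_{\mathbf{yx}} \mathbf{C}_{\mathbf{x}}^{-1} (\mathbf{x} - \mathbf{m}_{\mathbf{x}}) \tag{2.71} $$

$$ \mathbf{C}_{\mathbf{y}|\mathbf{x}} = \mathbf{C}_{\mathbf{y}} - \mathbf{C}_{\mathbf{yx}} \mathbf{C}_{\mathbf{x}}^{-1} \mathbf{C}_{\mathbf{xy}} \tag{2.72} $$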

Similar expressions are obtained for the mean $\mathbf{m}_{\mathbf{x}|\mathbf{y}}$ and covariance matrix $\mathbf{C}_{\mathbf{x}|\mathbf{y}}$ of the conditional density $p_{\mathbf{x}|\mathbf{y}}$.
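Equations (2.71) and (2.72) translate directly into a small function. This is a sketch assuming NumPy arrays, with $\mathbf{x}$ of dimension $n$ and $\mathbf{y}$ of dimension $m$; the name `conditional_gaussian` is hypothetical, and the product $\mathbf{C}_{\mathbf{yx}}\mathbf{C}_{\mathbf{x}}^{-1}$ is obtained by solving a linear system instead of inverting $\mathbf{C}_{\mathbf{x}}$. The marginal densities need no computation at all: for a jointly gaussian $\mathbf{z}$, the marginal of $\mathbf{x}$ is the gaussian with the corresponding blocks $\mathbf{m}_{\mathbf{x}}$ and $\mathbf{C}_{\mathbf{x}}$ of (2.70).

```python
import numpy as np

def conditional_gaussian(m_x, m_y, C_x, C_xy, C_y, x):
    """Mean and covariance of p_{y|x} for jointly gaussian (x, y), eqs. (2.71)-(2.72).

    Shapes: m_x (n,), m_y (m,), C_x (n, n), C_xy (n, m), C_y (m, m), x (n,).
    """
    # K = C_yx C_x^{-1}, computed as (C_x^{-1} C_xy)^T using C_yx = C_xy^T and C_x symmetric.
    K = np.linalg.solve(C_x, C_xy).T
    m_cond = m_y + K @ (x - m_x)       # (2.71)
    C_cond = C_y - K @ C_xy            # (2.72)
    return m_cond, C_cond

# Example: unit-variance scalars with correlation rho = 0.8, conditioning on x = 1.5.
rho = 0.8
m_c, C_c = conditional_gaussian(np.zeros(1), np.zeros(1), np.eye(1),
                                np.array([[rho]]), np.eye(1), np.array([1.5]))
```

In the scalar example the conditional mean is $0.8 \cdot 1.5 = 1.2$ and the conditional variance is $1 - 0.8^2 = 0.36$: observing $x$ shifts the mean of $y$ and always reduces its variance.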

Uncorrelatedness and geometrical structure. We mentioned earlier that uncorrelated gaussian random variables are also independent, a property that is not shared by other distributions in general. Derivation of this important result is left to the reader as an exercise. If the covariance matrix $\mathbf{C}_{\mathbf{x}}$ of the multivariate gaussian density (2.68) is not diagonal, the components of $\mathbf{x}$ are correlated. Since $\mathbf{C}_{\mathbf{x}}$ is a symmetric and positive definite matrix, it can always be represented in the form

$$ \mathbf{C}_{\mathbf{x}} = \mathbf{E}\mathbf{D}\mathbf{E}^T = \sum_{i=1}^{n} \lambda_i \mathbf{e}_i \mathbf{e}_i^T \tag{2.73} $$

Here $\mathbf{E}$ is an orthogonal matrix (that is, a rotation) having as its columns $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ the $n$ eigenvectors of $\mathbf{C}_{\mathbf{x}}$, and $\mathbf{D} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is the diagonal matrix containing the respective eigenvalues $\lambda_i$ of $\mathbf{C}_{\mathbf{x}}$. Now it can readily be verified that applying the rotation

$$ \mathbf{u} = \mathbf{E}^T (\mathbf{x} - \mathbf{m}_{\mathbf{x}}) \tag{2.74} $$

to $\mathbf{x}$ makes the components of the gaussian random vector $\mathbf{u}$ uncorrelated, and hence also independent: the covariance matrix of $\mathbf{u}$ is $\mathbf{E}^T \mathbf{C}_{\mathbf{x}} \mathbf{E} = \mathbf{D}$, which is diagonal.

² It is possible, however, to separate temporally correlated (nonwhite) gaussian sources using their second-order temporal statistics under certain conditions. Such techniques are quite different from standard independent component analysis. They will be discussed in Chapter 18.
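The decomposition (2.73) and the rotation (2.74) can be reproduced with `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors as the columns of a matrix. The mean and covariance below are illustrative assumptions. The rotated samples should have an approximately diagonal covariance matrix equal to $\mathbf{D}$.

```python
import numpy as np

rng = np.random.default_rng(4)
m_x = np.array([1.0, -1.0])              # illustrative mean
C_x = np.array([[3.0, 1.2],
                [1.2, 1.0]])             # illustrative correlated covariance

eigvals, E = np.linalg.eigh(C_x)         # columns of E are the eigenvectors e_i
D = np.diag(eigvals)

# (2.73): C_x = E D E^T = sum_i lambda_i e_i e_i^T
recon_matrix = E @ D @ E.T
recon_sum = sum(lam * np.outer(e, e) for lam, e in zip(eigvals, E.T))

# (2.74) applied row-wise: each row of U is (E^T (x - m_x))^T for one sample x.
X = rng.multivariate_normal(m_x, C_x, size=200_000)
U = (X - m_x) @ E
cov_U = np.cov(U, rowvar=False)          # approximately D: off-diagonal entries near zero
```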

Moreover, the eigenvalues $\lambda_i$ and eigenvectors $\mathbf{e}_i$ of the covariance matrix $\mathbf{C}_{\mathbf{x}}$ reveal the geometrical structure of the multivariate gaussian distribution. The contours of any pdf are defined by curves of constant values of the density, given by the equation $p_{\mathbf{x}}(\mathbf{x}) = \text{constant}$. For the multivariate gaussian density, this is equivalent to requiring that the quadratic form in the exponent is a constant $c$:

$$ (\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T \mathbf{C}_{\mathbf{x}}^{-1} (\mathbf{x} - \mathbf{m}_{\mathbf{x}}) = c \tag{2.75} $$

Using (2.73), it is easy to see [419] that the contours of the multivariate gaussian are hyperellipsoids centered at the mean vector $\mathbf{m}_{\mathbf{x}}$. The principal axes of the hyperellipsoids are parallel to the eigenvectors $\mathbf{e}_i$, and the eigenvalues $\lambda_i$ are the respective variances along these axes, so that the half-length of the $i$th principal axis is $\sqrt{c\lambda_i}$. See Fig. 2.7 for an illustration.

[Fig. 2.7 Illustration of a multivariate gaussian probability density. Labels in the figure: $\lambda_1^{1/2}$, $\lambda_2^{1/2}$, $\mathbf{e}_1$.]
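The geometric statement about contours can be verified directly. Since $\mathbf{C}_{\mathbf{x}}^{-1}\mathbf{e}_i = \mathbf{e}_i/\lambda_i$, the end points $\mathbf{m}_{\mathbf{x}} \pm \sqrt{c\lambda_i}\,\mathbf{e}_i$ of each principal axis satisfy (2.75) exactly. The sketch below checks this numerically for an arbitrarily chosen 2-dimensional example; the helper name `quadratic_form` is hypothetical.

```python
import numpy as np

def quadratic_form(x, m, C):
    """The exponent term (x - m)^T C^{-1} (x - m) appearing in (2.75)."""
    d = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    return float(d @ np.linalg.solve(C, d))

m_x = np.array([0.5, 0.0])               # illustrative mean
C_x = np.array([[3.0, 1.2],
                [1.2, 1.0]])             # illustrative covariance
c = 2.0                                  # contour level
eigvals, E = np.linalg.eigh(C_x)

# Both end points of every principal axis, m_x +/- sqrt(c * lambda_i) e_i,
# should give the value c when inserted into the quadratic form.
levels = [quadratic_form(m_x + s * np.sqrt(c * lam) * e, m_x, C_x)
          for lam, e in zip(eigvals, E.T) for s in (1.0, -1.0)]
```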

2.5.2 Central limit theorem

Still another argument underlining the significance of the gaussian distribution is provided by the central limit theorem. Let

$$ x_k = \sum_{i=1}^{k} z_i \tag{2.76} $$

be a partial sum of a sequence $\{z_i\}$ of independent and identically distributed random variables $z_i$ with finite mean and variance. Since the mean and variance of $x_k$ can grow without bound as $k \to \infty$, consider instead of $x_k$ the standardized variables

$$ y_k = \frac{x_k - m_{x_k}}{\sigma_{x_k}} \tag{2.77} $$

where $m_{x_k}$ and $\sigma_{x_k}$ are the mean and standard deviation of $x_k$.

It can be shown that the distribution of $y_k$ converges to a gaussian distribution with zero mean and unit variance when $k \to \infty$. This result is known as the central limit theorem. Several different forms of the theorem exist, in which the assumptions of independence and identical distribution have been weakened. The central limit theorem is a primary reason for modeling many random phenomena as gaussian random variables. For example, additive noise can often be considered to arise as a sum of a large number of small elementary effects, and is therefore naturally modeled as a gaussian random variable.
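A small simulation of the scalar central limit theorem, under illustrative assumptions: the $z_i$ are uniform on $(0, 1)$, a clearly nongaussian choice with mean $1/2$ and variance $1/12$, and $k = 50$. The standardized sums (2.77) should have mean close to 0 and variance close to 1. As a simple measure of departure from gaussianity, the sketch uses the excess kurtosis, which is 0 for a gaussian; for a sum of $k$ i.i.d. uniform terms it equals exactly $-1.2/k$, so it shrinks toward the gaussian value as $k$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 50                  # number of terms in each partial sum x_k
n_trials = 100_000      # number of independent realizations of x_k

# z_i ~ Uniform(0, 1): mean 1/2, variance 1/12.
z = rng.uniform(0.0, 1.0, size=(n_trials, k))
x_k = z.sum(axis=1)                      # (2.76)

m_xk = k * 0.5                           # exact mean of x_k
sigma_xk = np.sqrt(k / 12.0)             # exact standard deviation of x_k
y_k = (x_k - m_xk) / sigma_xk            # (2.77)

# Sample excess kurtosis: 0 for a gaussian, exactly -1.2 / k for this sum.
excess_kurtosis = np.mean(y_k ** 4) / np.mean(y_k ** 2) ** 2 - 3.0
```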

The central limit theorem generalizes readily to independent and identically distributed random vectors $\mathbf{z}_i$ having a common mean $\mathbf{m}_{\mathbf{z}}$ and covariance matrix $\mathbf{C}_{\mathbf{z}}$. The limiting distribution of the random vector

$$ \mathbf{y}_k = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} (\mathbf{z}_i - \mathbf{m}_{\mathbf{z}}) \tag{2.78} $$

is multivariate gaussian with zero mean and covariance matrix $\mathbf{C}_{\mathbf{z}}$.
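The vector form (2.78) can be simulated in the same way. The sketch below assumes $\mathbf{z}_i = \mathbf{m}_{\mathbf{z}} + \mathbf{A}\mathbf{s}_i$ with a fixed, arbitrarily chosen $2 \times 2$ matrix $\mathbf{A}$ and independent uniform entries in $\mathbf{s}_i$ scaled to unit variance, so that $\mathbf{C}_{\mathbf{z}} = \mathbf{A}\mathbf{A}^T$. With the $1/\sqrt{k}$ scaling, the covariance of $\mathbf{y}_k$ equals $\mathbf{C}_{\mathbf{z}}$ for every $k$; what changes with $k$ is the shape of the distribution, which approaches gaussian.

```python
import numpy as np

rng = np.random.default_rng(6)
k, n_trials = 40, 40_000
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])               # hypothetical fixed matrix
m_z = np.array([2.0, -1.0])              # hypothetical common mean
C_z = A @ A.T                            # covariance of each z_i

# s_i has independent Uniform(-sqrt(3), sqrt(3)) entries: zero mean, unit variance, nongaussian.
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_trials, k, 2))
z = m_z + s @ A.T                        # z_i = m_z + A s_i, shape (n_trials, k, 2)

# (2.78): one scaled vector sum per trial.
y_k = (z - m_z).sum(axis=1) / np.sqrt(k)
```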

The central limit theorem has important consequences in independent component analysis and blind source separation. A typical mixture, or component of the data vector $\mathbf{x}$, is of the form

$$ x_i = \sum_{j=1}^{m} a_{ij} s_j \tag{2.79} $$

where $a_{ij}$, $j = 1, \ldots, m$, are constant mixing coefficients and $s_j$, $j = 1, \ldots, m$, are the $m$ unknown source signals. Even for a fairly small number of sources (say, $m = 10$), the distribution of the mixture $x_i$ is usually close to gaussian. This seems to hold in practice even though the densities of the different sources are far from each other and far from gaussianity. Examples of this property can be found in Chapter 8, as well as in [149].
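The near-gaussianity of mixtures can be quantified with a measure not defined in this section: the excess kurtosis, which is zero for a gaussian and is used here only as an illustrative yardstick. Because fourth-order cumulants of independent variables add, a mixture $x_i = \sum_j a_{ij} s_j$ of independent zero-mean, unit-variance sources has excess kurtosis $\sum_j a_{ij}^4 \kappa_j / (\sum_j a_{ij}^2)^2$, where $\kappa_j$ is the excess kurtosis of $s_j$. The sketch computes this formula and a sample estimate for $m = 10$ uniform sources ($\kappa_j = -1.2$) with randomly drawn coefficients; the function name and the coefficient range are hypothetical choices.

```python
import numpy as np

def mixture_excess_kurtosis(a, source_kurtoses):
    """Excess kurtosis of x = sum_j a_j s_j for independent, zero-mean, unit-variance s_j.

    Fourth-order cumulants of independent variables add, which gives
    kurt(x) = sum_j a_j**4 * kurt(s_j) / (sum_j a_j**2)**2.
    """
    a = np.asarray(a, dtype=float)
    kappa = np.asarray(source_kurtoses, dtype=float)
    return float(np.sum(a ** 4 * kappa) / np.sum(a ** 2) ** 2)

rng = np.random.default_rng(7)
m = 10                                             # number of sources
a = rng.uniform(0.5, 1.5, size=m)                  # hypothetical coefficients a_ij for one mixture x_i
# Uniform(-sqrt(3), sqrt(3)) sources: zero mean, unit variance, excess kurtosis -1.2.
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(200_000, m))
x = S @ a                                          # samples of the mixture (2.79)

predicted = mixture_excess_kurtosis(a, np.full(m, -1.2))
sample = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2 - 3.0
```

With equal coefficients the formula gives $-1.2/10 = -0.12$, an order of magnitude closer to the gaussian value 0 than the $-1.2$ of each individual source.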

In the document Independent Component Analysis (pages 53-57)
