• Nenhum resultado encontrado

ICA IS STRONGER THAT WHITENING

No documento Independent Component Analysis (páginas 180-183)

BASIC INDEPENDENT COMPONENT ANALYSIS

7.4 ICA IS STRONGER THAT WHITENING

Fig. 7.9 The joint distribution of the observed mixtures

x

1and

x

2, obtained from super-gaussian independent components. Horizontal axis:

x

1, vertical axis:

x

2.

What we need is a method that works for any distributions of the independent components, and works fast and reliably. Such methods are the main subject of this book, and will be presented in Chapters 8–12. In the rest of this chapter, however, we discuss the connection between ICA and whitening.

ICA IS STRONGER THAT WHITENING 159

In this book, all random variables are assumed to have zero mean, unless otherwise mentioned. Thus, covariance is equal to correlation corr

( y

1

;y

2

) = E

f

y

1

y

2g, and

uncorrelatedness is the same thing as zero correlation (see Section 2.2).1

If random variables are independent, they are uncorrelated. This is because if the

y

1and

y

2are independent, then for any two functions,

h

1and

h

2, we have

E

f

h

1

( y

1

) h

2

( y

2

)

g

= E

f

h

1

( y

1

)

g

E

f

h

2

( y

2

)

g (7.15)

see Section 2.3. Taking

h

1

( y

1

) = y

1 and

h

2

( y

2

) = y

2, we see that this implies uncorrelatedness.

On the other hand, uncorrelatedness does not imply independence. For example, assume that

( y

1

;y

2

)

are discrete valued and follow such a distribution that the pair are with probability

1 = 4

equal to any of the following values:

(0 ; 1) ; (0 ;

?

1) ; (1 ; 0) ;

and

(

?

1 ; 0)

. Then

y

1and

y

2are uncorrelated, as can be simply calculated. On the other hand,

E

f

y

12

y

22g

= 0

6

= 14 = E

f

y

21g

E

f

y

22g (7.16)

so the condition in Eq. (7.15) is violated, and the variables cannot be independent.

A slightly stronger property than uncorrelatedness is whiteness. Whiteness of a zero-mean random vector, say

y

, means that its components are uncorrelated and their variances equal unity. In other words, the covariance matrix (as well as the correlation matrix) of

y

equals the identity matrix:

E

f

yy

Tg

= I

(7.17)

Consequently, whitening means that we linearly transform the observed data vector

x

by linearly multiplying it with some matrix

V

z = Vx

(7.18)

so that we obtain a new vector

z

that is white. Whitening is sometimes called sphering.

A whitening transformation is always possible. Some methods were reviewed in Chapter 6. One popular method for whitening is to use the eigenvalue decomposition (EVD) of the covariance matrix

E

f

xx

Tg

= EDE

T (7.19)

where

E

is the orthogonal matrix of eigenvectors of

E

f

xx

Tgand

D

is the diagonal matrix of its eigenvalues,

D =

diag

( d

1

;:::;d

n

)

. Whitening can now be done by the whitening matrix

V = ED

?1=2

E

T (7.20)

1In statistical literature, correlation is often defined as a normalized version of covariance. Here, we use this simpler definition that is more widely spread in signal processing. In any case, the concept of uncorrelatedness is the same.

where the matrix

D

?1=2 is computed by a simple componentwise operation as

D

?1=2

=

diag

( d

?11 =2

;:::;d

?1n =2

)

. A whitening matrix computed this way is denoted by

E

f

xx

Tg?1=2or

C

?1=2. Alternatively, whitening can be performed in connection with principal component analysis, which gives a related whitening matrix. For details, see Chapter 6.

7.4.2 Whitening is only half ICA

Now, suppose that the data in the ICA model is whitened, for example, by the matrix given in (7.20). Whitening transforms the mixing matrix into a new one,

A ~

. We have

from (7.5) and (7.18)

z = VAs = ~ As

(7.21)

One could hope that whitening solves the ICA problem, since whiteness or uncor-relatedness is related to independence. This is, however, not so. Uncoruncor-relatedness is weaker than independence, and is not in itself sufficient for estimation of the ICA model. To see this, consider an orthogonal transformation

U

of

z

:

y = Uz

(7.22)

Due to the orthogonality of

U

, we have

E

f

yy

Tg

= E

f

Uyy

T

U

Tg

= UIU

T

= I

(7.23)

In other words,

y

is white as well. Thus, we cannot tell if the independent components are given by

z

or

y

using the whiteness property alone. Since

y

could be any orthogonal transformation of

z

, whitening gives the ICs only up to an orthogonal transformation. This is not sufficient in most applications.

On the other hand, whitening is useful as a preprocessing step in ICA. The utility of whitening resides in the fact that the new mixing matrix

A ~ = VA

is orthogonal.

This can be seen from

E

f

zz

Tg

= ~ A E

f

ss

Tg

A ~

T

= ~ A A ~

T

= I

(7.24)

This means that we can restrict our search for the mixing matrix to the space of orthogonal matrices. Instead of having to estimate the

n

2 parameters that are the elements of the original matrix

A

, we only need to estimate an orthogonal mixing matrix

A ~

. An orthogonal matrix contains

n ( n

?

1) = 2

degrees of freedom. For example, in two dimensions, an orthogonal transformation is determined by a single angle parameter. In larger dimensions, an orthogonal matrix contains only about half of the number of parameters of an arbitrary matrix.

Thus one can say that whitening solves half of the problem of ICA. Because whitening is a very simple and standard procedure, much simpler than any ICA algorithms, it is a good idea to reduce the complexity of the problem this way. The remaining half of the parameters has to be estimated by some other method; several will be introduced in the next chapters.

WHY GAUSSIAN VARIABLES ARE FORBIDDEN 161

Fig. 7.10 The joint distribution of the whitened mixtures of uniformly distributed indepen-dent components.

A graphical illustration of the effect of whitening can be seen in Fig. 7.10, in which the data in Fig. 7.6 has been whitened. The square defining the distribution is now clearly a rotated version of the original square in Fig. 7.10. All that is left is the estimation of a single angle that gives the rotation.

In many chapters of this book, we assume that the data has been preprocessed by whitening, in which case we denote the data by

z

. Even in cases where whitening is not explicitly required, it is recommended, since it reduces the number of free parameters and considerably increases the performance of the methods, especially with high-dimensional data.

No documento Independent Component Analysis (páginas 180-183)