MATHEMATICAL PRELIMINARIES
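a "supervector" $\mathbf{z}^T = (\mathbf{x}^T, \mathbf{y}^T)$, and the preceding formulas used directly. The cdf that arises is called the joint distribution function of $\mathbf{x}$ and $\mathbf{y}$, and is given by

$$F_{\mathbf{x},\mathbf{y}}(\mathbf{x}_0, \mathbf{y}_0) = P(\mathbf{x} \le \mathbf{x}_0,\ \mathbf{y} \le \mathbf{y}_0) \tag{2.11}$$

Here $\mathbf{x}_0$ and $\mathbf{y}_0$ are some constant vectors having the same dimensions as $\mathbf{x}$ and $\mathbf{y}$, respectively, and Eq. (2.11) defines the joint probability of the event $\mathbf{x} \le \mathbf{x}_0$ and $\mathbf{y} \le \mathbf{y}_0$.

The joint density function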
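$p_{\mathbf{x},\mathbf{y}}(\mathbf{x}, \mathbf{y})$ of $\mathbf{x}$ and $\mathbf{y}$ is again defined formally by differentiating the joint distribution function $F_{\mathbf{x},\mathbf{y}}(\mathbf{x}, \mathbf{y})$ with respect to all components of the random vectors $\mathbf{x}$ and $\mathbf{y}$. Hence, the relationship

$$F_{\mathbf{x},\mathbf{y}}(\mathbf{x}_0, \mathbf{y}_0) = \int_{-\infty}^{\mathbf{x}_0} \int_{-\infty}^{\mathbf{y}_0} p_{\mathbf{x},\mathbf{y}}(\boldsymbol{\xi}, \boldsymbol{\eta})\, d\boldsymbol{\eta}\, d\boldsymbol{\xi} \tag{2.12}$$

holds, and the value of this integral equals unity when both $\mathbf{x}_0 \to \infty$ and $\mathbf{y}_0 \to \infty$.

The marginal densities $p_{\mathbf{x}}(\mathbf{x})$ of $\mathbf{x}$ and $p_{\mathbf{y}}(\mathbf{y})$ of $\mathbf{y}$ are obtained by integrating over the other random vector in their joint density $p_{\mathbf{x},\mathbf{y}}(\mathbf{x}, \mathbf{y})$:

$$p_{\mathbf{x}}(\mathbf{x}) = \int_{-\infty}^{\infty} p_{\mathbf{x},\mathbf{y}}(\mathbf{x}, \boldsymbol{\eta})\, d\boldsymbol{\eta} \tag{2.13}$$

$$p_{\mathbf{y}}(\mathbf{y}) = \int_{-\infty}^{\infty} p_{\mathbf{x},\mathbf{y}}(\boldsymbol{\xi}, \mathbf{y})\, d\boldsymbol{\xi} \tag{2.14}$$

Example 2.3 Consider the joint density given in Example 2.2. The marginal densities of the random variables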
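$x$ and $y$ are

$$p_x(x) = \int_0^1 \frac{3}{7}(2 - x)(x + y)\, dy, \quad x \in [0, 2]$$

$$\phantom{p_x(x)} = \begin{cases} \frac{3}{7}\left(1 + \frac{3}{2}x - x^2\right), & x \in [0, 2] \\ 0, & \text{elsewhere} \end{cases}$$

$$p_y(y) = \int_0^2 \frac{3}{7}(2 - x)(x + y)\, dx, \quad y \in [0, 1]$$

$$\phantom{p_y(y)} = \begin{cases} \frac{2}{7}(2 + 3y), & y \in [0, 1] \\ 0, & \text{elsewhere} \end{cases}$$

2.2 EXPECTATIONS AND MOMENTS

2.2.1 Definition and general properties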
functions of that random variable for performing useful analyses and processing. A great advantage of expectations is that they can be estimated directly from the data, even though they are formally defined in terms of the density function.
Let $\mathbf{g}(\mathbf{x})$ denote any quantity derived from the random vector $\mathbf{x}$. The quantity $\mathbf{g}(\mathbf{x})$ may be either a scalar, vector, or even a matrix. The expectation of $\mathbf{g}(\mathbf{x})$ is denoted by $\mathrm{E}\{\mathbf{g}(\mathbf{x})\}$, and is defined by

$$\mathrm{E}\{\mathbf{g}(\mathbf{x})\} = \int_{-\infty}^{\infty} \mathbf{g}(\mathbf{x})\, p_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x} \tag{2.15}$$

Here the integral is computed over all the components of $\mathbf{x}$. The integration operation is applied separately to every component of the vector or element of the matrix, yielding as a result another vector or matrix of the same size. If $\mathbf{g}(\mathbf{x}) = \mathbf{x}$, we get the expectation $\mathrm{E}\{\mathbf{x}\}$ of $\mathbf{x}$; this is discussed in more detail in the next subsection.

Expectations have some important fundamental properties.
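1. Linearity. Let $\mathbf{x}_i$, $i = 1, \ldots, m$, be a set of different random vectors, and $a_i$, $i = 1, \ldots, m$, some nonrandom scalar coefficients. Then

$$\mathrm{E}\Big\{\sum_{i=1}^{m} a_i \mathbf{x}_i\Big\} = \sum_{i=1}^{m} a_i\, \mathrm{E}\{\mathbf{x}_i\} \tag{2.16}$$

2. Linear transformation. Let $\mathbf{x}$ be an $m$-dimensional random vector, and $\mathbf{A}$ and $\mathbf{B}$ some nonrandom $k \times m$ and $m \times l$ matrices, respectively. Then

$$\mathrm{E}\{\mathbf{A}\mathbf{x}\} = \mathbf{A}\,\mathrm{E}\{\mathbf{x}\}, \qquad \mathrm{E}\{\mathbf{x}\mathbf{B}\} = \mathrm{E}\{\mathbf{x}\}\,\mathbf{B} \tag{2.17}$$

3. Transformation invariance. Let $\mathbf{y} = \mathbf{g}(\mathbf{x})$ be a vector-valued function of the random vector $\mathbf{x}$. Then

$$\int_{-\infty}^{\infty} \mathbf{y}\, p_{\mathbf{y}}(\mathbf{y})\, d\mathbf{y} = \int_{-\infty}^{\infty} \mathbf{g}(\mathbf{x})\, p_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x} \tag{2.18}$$

Thus $\mathrm{E}\{\mathbf{y}\} = \mathrm{E}\{\mathbf{g}(\mathbf{x})\}$, even though the integrations are carried out over different probability density functions.

These properties can be proved using the definition of the expectation operator and properties of probability density functions. They are important and very helpful in practice, allowing expressions containing expectations to be simplified without actually needing to compute any integrals (except possibly in the last phase).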
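
As a quick numerical illustration of properties 1 and 2 (a sketch, not part of the original text), the following pure-Python snippet replaces each expectation by an average over K samples, which is how expectations are estimated from data. The sample data, the matrix `A`, and the coefficients `a1`, `a2` are hypothetical values chosen only for illustration. Because averaging is itself a linear operation, both identities hold up to rounding error for any data set.

```python
import random

def sample_mean(vectors):
    """Average a list of equal-length vectors component by component."""
    K = len(vectors)
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / K for i in range(n)]

def mat_vec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

random.seed(0)
K = 1000

# Hypothetical data: K samples of two different 3-dimensional random vectors.
xs = [[random.gauss(1.0, 2.0), random.uniform(-1.0, 1.0), random.expovariate(0.5)]
      for _ in range(K)]
ys = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(K)]

A = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0]]   # nonrandom 2 x 3 matrix (assumed for illustration)
a1, a2 = 2.0, -0.5       # nonrandom scalar coefficients (assumed)

# Linear transformation: the average of A x_j versus A times the average of x_j.
lhs_transform = sample_mean([mat_vec(A, x) for x in xs])
rhs_transform = mat_vec(A, sample_mean(xs))

# Linearity: the average of a1*x_j + a2*y_j versus a1*mean(x) + a2*mean(y).
lhs_linear = sample_mean([[a1 * xi + a2 * yi for xi, yi in zip(x, y)]
                          for x, y in zip(xs, ys)])
mx, my = sample_mean(xs), sample_mean(ys)
rhs_linear = [a1 * u + a2 * v for u, v in zip(mx, my)]

# Largest componentwise discrepancy between the two sides of both identities.
max_gap = max(abs(p - q) for p, q in zip(lhs_transform + lhs_linear,
                                         rhs_transform + rhs_linear))
print(max_gap)  # a value at rounding-error level
```

The two sides agree regardless of the distributions the samples come from, which mirrors the fact that the properties above make no assumption about the density.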
2.2.2 Mean vector and correlation matrix
Moments of a random vector $\mathbf{x}$ are typical expectations used to characterize it. They are obtained when $\mathbf{g}(\mathbf{x})$ consists of products of components of $\mathbf{x}$. In particular, the first moment of a random vector $\mathbf{x}$ is called the mean vector $\mathbf{m}_{\mathbf{x}}$ of $\mathbf{x}$. It is defined as the expectation of $\mathbf{x}$:

$$\mathbf{m}_{\mathbf{x}} = \mathrm{E}\{\mathbf{x}\} = \int_{-\infty}^{\infty} \mathbf{x}\, p_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x} \tag{2.19}$$

Each component $m_{x_i}$ of the $n$-vector $\mathbf{m}_{\mathbf{x}}$ is given by

$$m_{x_i} = \mathrm{E}\{x_i\} = \int_{-\infty}^{\infty} x_i\, p_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x} = \int_{-\infty}^{\infty} x_i\, p_{x_i}(x_i)\, dx_i \tag{2.20}$$

where $p_{x_i}(x_i)$ is the marginal density of the $i$th component $x_i$ of $\mathbf{x}$. This is because integrals over all the other components of $\mathbf{x}$ reduce to unity due to the definition of the marginal density.

Another important set of moments consists of correlations between pairs of components of
$\mathbf{x}$. The correlation $r_{ij}$ between the $i$th and $j$th component of $\mathbf{x}$ is given by the second moment

$$r_{ij} = \mathrm{E}\{x_i x_j\} = \int_{-\infty}^{\infty} x_i x_j\, p_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_i x_j\, p_{x_i,x_j}(x_i, x_j)\, dx_j\, dx_i \tag{2.21}$$

Note that correlation can be negative or positive.

The $n \times n$ correlation matrix

$$\mathbf{R}_{\mathbf{x}} = \mathrm{E}\{\mathbf{x}\mathbf{x}^T\} \tag{2.22}$$

of the vector $\mathbf{x}$ represents in a convenient form all its correlations, $r_{ij}$ being the element in row $i$ and column $j$ of $\mathbf{R}_{\mathbf{x}}$.

The correlation matrix $\mathbf{R}_{\mathbf{x}}$ has some important properties:

1. It is a symmetric matrix: $\mathbf{R}_{\mathbf{x}} = \mathbf{R}_{\mathbf{x}}^T$.

2. It is positive semidefinite:

$$\mathbf{a}^T \mathbf{R}_{\mathbf{x}} \mathbf{a} \ge 0 \tag{2.23}$$

for all $n$-vectors $\mathbf{a}$. Usually in practice $\mathbf{R}_{\mathbf{x}}$ is positive definite, meaning that for any vector $\mathbf{a} \ne \mathbf{0}$, (2.23) holds as a strict inequality.

3. All the eigenvalues of $\mathbf{R}_{\mathbf{x}}$ are real and nonnegative (positive if $\mathbf{R}_{\mathbf{x}}$ is positive definite). Furthermore, all the eigenvectors of $\mathbf{R}_{\mathbf{x}}$ are real, and can always be chosen so that they are mutually orthonormal.

Higher-order moments can be defined analogously, but their discussion is postponed to Section 2.7. Instead, we shall first consider the corresponding central and second-order moments for two different random vectors.
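
The first two properties of the correlation matrix follow in a few steps from the definition (2.22) and the linear transformation property (2.17). The derivation below is the standard argument, given here as a sketch rather than quoted from the text. Symmetry holds because transposition can be moved inside the expectation:

$$\mathbf{R}_{\mathbf{x}}^T = \mathrm{E}\{(\mathbf{x}\mathbf{x}^T)^T\} = \mathrm{E}\{\mathbf{x}\mathbf{x}^T\} = \mathbf{R}_{\mathbf{x}}$$

Positive semidefiniteness holds because, for any nonrandom $n$-vector $\mathbf{a}$, the quadratic form is the expectation of a squared scalar:

$$\mathbf{a}^T \mathbf{R}_{\mathbf{x}} \mathbf{a} = \mathbf{a}^T\, \mathrm{E}\{\mathbf{x}\mathbf{x}^T\}\, \mathbf{a} = \mathrm{E}\{\mathbf{a}^T \mathbf{x}\, \mathbf{x}^T \mathbf{a}\} = \mathrm{E}\{(\mathbf{a}^T \mathbf{x})^2\} \ge 0$$

Equality for some $\mathbf{a} \ne \mathbf{0}$ requires $\mathbf{a}^T \mathbf{x} = 0$ with probability one, that is, an exact linear dependence among the components of $\mathbf{x}$. This is why $\mathbf{R}_{\mathbf{x}}$ is positive definite whenever no such dependence exists.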
2.2.3 Covariances and joint moments
Central moments are defined in a similar fashion to usual moments, but the mean vectors of the random vectors involved are subtracted prior to computing the expectation. Clearly, central moments are only meaningful above the first order. The quantity corresponding to the correlation matrix $\mathbf{R}_{\mathbf{x}}$ is called the covariance matrix $\mathbf{C}_{\mathbf{x}}$ of $\mathbf{x}$, and is given by

$$\mathbf{C}_{\mathbf{x}} = \mathrm{E}\{(\mathbf{x} - \mathbf{m}_{\mathbf{x}})(\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T\} \tag{2.24}$$

The elements

$$c_{ij} = \mathrm{E}\{(x_i - m_i)(x_j - m_j)\} \tag{2.25}$$

of the $n \times n$ matrix $\mathbf{C}_{\mathbf{x}}$ are called covariances, and they are the central moments corresponding to the correlations¹ $r_{ij}$ defined in Eq. (2.21).

The covariance matrix $\mathbf{C}_{\mathbf{x}}$ satisfies the same properties as the correlation matrix $\mathbf{R}_{\mathbf{x}}$. Using the properties of the expectation operator, it is easy to see that

$$\mathbf{R}_{\mathbf{x}} = \mathbf{C}_{\mathbf{x}} + \mathbf{m}_{\mathbf{x}} \mathbf{m}_{\mathbf{x}}^T \tag{2.26}$$

If the mean vector $\mathbf{m}_{\mathbf{x}} = \mathbf{0}$, the correlation and covariance matrices become the same. If necessary, the data can easily be made zero mean by subtracting the (estimated) mean vector from the data vectors as a preprocessing step. This is a usual practice in independent component analysis, and thus in later chapters, we simply denote by $\mathbf{C}_{\mathbf{x}}$ the correlation/covariance matrix, often even dropping the subscript $\mathbf{x}$ for simplicity.
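
The relationship (2.26) and the centering step can be checked numerically. The following is a minimal pure-Python sketch under stated assumptions: expectations are replaced by averages over K samples with a 1/K normalization, and the two-dimensional data set is hypothetical, generated only for illustration. With these sample moments, (2.26) holds up to rounding error.

```python
import random

def outer(u, v):
    """Outer product u v^T of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def mean_matrix(mats):
    """Elementwise average of a list of equally sized matrices."""
    K = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(M[r][c] for M in mats) / K for c in range(cols)]
            for r in range(rows)]

random.seed(1)
K, n = 500, 2

# Hypothetical data: correlated 2-dimensional samples with nonzero mean.
xs = []
for _ in range(K):
    s = random.gauss(0.0, 1.0)
    xs.append([3.0 + s, -1.0 + 0.5 * s + random.gauss(0.0, 1.0)])

# Sample mean vector, cf. the definition (2.19).
m = [sum(x[i] for x in xs) / K for i in range(n)]

# Sample correlation matrix, cf. (2.22): average of x x^T.
R = mean_matrix([outer(x, x) for x in xs])

# Centering as a preprocessing step: subtract the estimated mean.
centered = [[x[i] - m[i] for i in range(n)] for x in xs]

# Sample covariance matrix, cf. (2.24). It is also the correlation
# matrix of the centered (zero-mean) data.
C = mean_matrix([outer(z, z) for z in centered])

# Check R = C + m m^T elementwise, cf. (2.26).
mmT = outer(m, m)
gap = max(abs(R[i][j] - C[i][j] - mmT[i][j])
          for i in range(n) for j in range(n))
print(gap)  # a value at rounding-error level
```

Note that statistical libraries often normalize the covariance by K − 1 instead of K; with that choice the identity would hold only approximately for finite K.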
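For a single random variable $x$, the mean vector reduces to its mean value $m_x = \mathrm{E}\{x\}$, the correlation matrix to the second moment $\mathrm{E}\{x^2\}$, and the covariance matrix to the variance of $x$

$$\sigma_x^2 = \mathrm{E}\{(x - m_x)^2\} \tag{2.27}$$

The relationship (2.26) then takes the simple form $\mathrm{E}\{x^2\} = \sigma_x^2 + m_x^2$.

The expectation operation can be extended for functions $\mathbf{g}(\mathbf{x}, \mathbf{y})$ of two different random vectors $\mathbf{x}$ and $\mathbf{y}$ in terms of their joint density:

$$\mathrm{E}\{\mathbf{g}(\mathbf{x}, \mathbf{y})\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathbf{g}(\mathbf{x}, \mathbf{y})\, p_{\mathbf{x},\mathbf{y}}(\mathbf{x}, \mathbf{y})\, d\mathbf{y}\, d\mathbf{x} \tag{2.28}$$

The integrals are computed over all the components of $\mathbf{x}$ and $\mathbf{y}$.

Of the joint expectations, the most widely used are the cross-correlation matrix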
$$\mathbf{R}_{\mathbf{x}\mathbf{y}} = \mathrm{E}\{\mathbf{x}\mathbf{y}^T\} \tag{2.29}$$

and the cross-covariance matrix

$$\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathrm{E}\{(\mathbf{x} - \mathbf{m}_{\mathbf{x}})(\mathbf{y} - \mathbf{m}_{\mathbf{y}})^T\} \tag{2.30}$$

Note that the dimensions of the vectors $\mathbf{x}$ and $\mathbf{y}$ can be different. Hence, the cross-correlation and -covariance matrices are not necessarily square matrices, and they are not symmetric in general. However, from their definitions it follows easily that

$$\mathbf{R}_{\mathbf{x}\mathbf{y}} = \mathbf{R}_{\mathbf{y}\mathbf{x}}^T, \qquad \mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbf{C}_{\mathbf{y}\mathbf{x}}^T \tag{2.31}$$

If the mean vectors of $\mathbf{x}$ and $\mathbf{y}$ are zero, the cross-correlation and cross-covariance matrices become the same. The covariance matrix $\mathbf{C}_{\mathbf{x}+\mathbf{y}}$ of the sum of two random vectors $\mathbf{x}$ and $\mathbf{y}$ having the same dimension is often needed in practice. It is easy to see that

$$\mathbf{C}_{\mathbf{x}+\mathbf{y}} = \mathbf{C}_{\mathbf{x}} + \mathbf{C}_{\mathbf{x}\mathbf{y}} + \mathbf{C}_{\mathbf{y}\mathbf{x}} + \mathbf{C}_{\mathbf{y}} \tag{2.32}$$

¹ In classic statistics, the correlation coefficients $\rho_{ij} = c_{ij} / (c_{ii} c_{jj})^{1/2}$ are used, and the matrix consisting of them is called the correlation matrix. In this book, the correlation matrix is defined by the formula (2.22), which is a common practice in signal processing, neural networks, and engineering.

[Fig. 2.2 An example of negative covariance between the random variables $x$ and $y$. Axes $x$ and $y$, each ranging from −5 to 5.]

[Fig. 2.3 An example of zero covariance between the random variables $x$ and $y$. Axes $x$ and $y$, each ranging from −5 to 5.]

Correlations and covariances measure the dependence between the random variables using their second-order statistics. This is illustrated by the following example.
Example 2.4 Consider the two different joint distributions $p_{x,y}(x, y)$ of the zero mean scalar random variables $x$ and $y$ shown in Figs. 2.2 and 2.3. In Fig. 2.2, $x$ and $y$ have a clear negative covariance (or correlation). A positive value of $x$ mostly implies that $y$ is negative, and vice versa. On the other hand, in the case of Fig. 2.3, it is not possible to infer anything about the value of $y$ by observing $x$. Hence, their covariance $c_{xy} \approx 0$.

2.2.4 Estimation of expectations
Usually the probability density of a random vector $\mathbf{x}$ is not known, but there is often available a set of $K$ samples $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K$ from $\mathbf{x}$. Using them, the expectation (2.15) can be estimated by averaging over the sample using the formula [419]

$$\mathrm{E}\{\mathbf{g}(\mathbf{x})\} \approx \frac{1}{K} \sum_{j=1}^{K} \mathbf{g}(\mathbf{x}_j) \tag{2.33}$$

For example, applying (2.33), we get for the mean vector $\mathbf{m}_{\mathbf{x}}$ of $\mathbf{x}$ its standard estimator, the sample mean

$$\hat{\mathbf{m}}_{\mathbf{x}} = \frac{1}{K} \sum_{j=1}^{K} \mathbf{x}_j \tag{2.34}$$

where the hat over $\mathbf{m}$ is a standard notation for an estimator of a quantity.

Similarly, if instead of the joint density
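$p_{\mathbf{x},\mathbf{y}}(\mathbf{x}, \mathbf{y})$ of the random vectors $\mathbf{x}$ and $\mathbf{y}$, we know $K$ sample pairs $(\mathbf{x}_1, \mathbf{y}_1), (\mathbf{x}_2, \mathbf{y}_2), \ldots, (\mathbf{x}_K, \mathbf{y}_K)$, we can estimate the expectation (2.28) by

$$\mathrm{E}\{\mathbf{g}(\mathbf{x}, \mathbf{y})\} \approx \frac{1}{K} \sum_{j=1}^{K} \mathbf{g}(\mathbf{x}_j, \mathbf{y}_j) \tag{2.35}$$

For example, for the cross-correlation matrix, this yields the estimation formula

$$\hat{\mathbf{R}}_{\mathbf{x}\mathbf{y}} = \frac{1}{K} \sum_{j=1}^{K} \mathbf{x}_j \mathbf{y}_j^T \tag{2.36}$$

Similar formulas are readily obtained for the other correlation-type matrices $\mathbf{R}_{\mathbf{x}\mathbf{x}}$, $\mathbf{C}_{\mathbf{x}\mathbf{x}}$, and $\mathbf{C}_{\mathbf{x}\mathbf{y}}$.

2.3 UNCORRELATEDNESS AND INDEPENDENCE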