

In the document Independent Component Analysis (pages 65-73)

MATHEMATICAL PRELIMINARIES


Thus higher-order cumulants measure the departure of a random vector from a gaussian random vector with an identical mean vector and covariance matrix. This property is highly useful, making it possible to use cumulants for extracting the nongaussian part of a signal. For example, cumulants make it possible to ignore additive gaussian noise corrupting a nongaussian signal.

Moments, cumulants, and characteristic functions have several other properties which are not discussed here. See, for example, the books [149, 319, 386] for more information. However, it is worth mentioning that both moments and cumulants have symmetry properties that can be exploited to reduce the computational load in estimating them [319].

For estimating moments and cumulants, one can apply the procedure introduced in Section 2.2.4. However, fourth-order cumulants cannot be estimated directly; one must first estimate the necessary moments, as is evident from (2.106). Practical estimation formulas can be found in [319, 315].

A drawback in utilizing higher-order statistics is that reliable estimation of higher-order moments and cumulants requires many more samples than for second-order statistics [318]. Another drawback is that higher-order statistics can be very sensitive to outliers in the data (see Section 8.3.1). For example, a few data samples having the highest absolute values may largely determine the value of kurtosis. Higher-order statistics can be taken into account in a more robust way by using the nonlinear hyperbolic tangent function $\tanh(u)$, whose values always lie in the interval $(-1, 1)$, or some other nonlinearity that grows slower than linearly with its argument value.
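The outlier sensitivity described above can be illustrated with a minimal sketch. It assumes zero-mean data and uses two hypothetical helpers, `kurtosis` and `mean_tanh`, which are not taken from the text; the outlier value 50 is arbitrary. A single extreme sample shifts the fourth-power statistic enormously, whereas it can shift an average of $\tanh$ values only slightly, because each sample contributes at most 1 in absolute value.

```python
import math
import random

def kurtosis(samples):
    """Sample kurtosis E{x^4} - 3 (E{x^2})^2, assuming zero-mean data."""
    n = len(samples)
    m2 = sum(x ** 2 for x in samples) / n
    m4 = sum(x ** 4 for x in samples) / n
    return m4 - 3.0 * m2 ** 2

def mean_tanh(samples):
    """Sample average of tanh(x); each term is bounded by 1 in absolute value."""
    return sum(math.tanh(x) for x in samples) / len(samples)

rng = random.Random(0)                        # fixed seed, for illustration only
clean = [rng.gauss(0.0, 1.0) for _ in range(1000)]
dirty = clean + [50.0]                        # one gross outlier (arbitrary value)

print("kurtosis shift :", kurtosis(dirty) - kurtosis(clean))
print("mean tanh shift:", mean_tanh(dirty) - mean_tanh(clean))
```

The average of $\tanh$ is used here only to show the bounded influence of one sample; it is not one of the estimators discussed in the text.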

2.8 STOCHASTIC PROCESSES *

2.8.1 Introduction and definition

Fig. 2.10 Sample functions of a stochastic process. (Panels: first sample function $x_1(t)$, second sample function $x_2(t)$, ..., $n$th sample function $x_n(t)$, each plotted against time $t$ with the instant $t_1$ marked.)

Figure 2.10 shows an example of a scalar stochastic process represented by the set of sample functions $\{x_j(t)\}$, $j = 1, 2, \ldots, n$. Assume that the probability of occurrence of the $i$th sample function $x_i(t)$ is $P_i$, and similarly for the other sample functions. Suppose then we observe the set of waveforms $\{x_j(t)\}$, $j = 1, 2, \ldots, n$, simultaneously at some time instant $t = t_1$, as shown in Figure 2.10. Clearly, the values $\{x_j(t_1)\}$, $j = 1, 2, \ldots, n$ of the $n$ waveforms at time $t_1$ form a discrete random variable with $n$ possible values, each having the respective probability of occurrence $P_j$. Consider then another time instant $t = t_2$. We obtain again a random variable $\{x_j(t_2)\}$, which may have a different distribution than $\{x_j(t_1)\}$.

Usually the number of possible waveforms arising from an experiment is infinitely large due to additive noise. At each time instant a continuous random variable having some distribution arises instead of the discrete one discussed above. However, the time instants $t_1, t_2, \ldots$ on which the stochastic process is observed are discrete due to sampling. Usually the observation intervals are equispaced, and the resulting samples are represented using integer indices $x_j(1) = x_j(t_1), x_j(2) = x_j(t_2), \ldots$ for notational simplicity. As a result, a typical representation for a stochastic process consists of continuous random variables at discrete (integer) time instants.

2.8.2 Stationarity, mean, and autocorrelation function

Consider a stochastic process $\{x_j(t)\}$ defined at discrete times $t_1, t_2, \ldots, t_k$. For characterizing the process $\{x_j(t)\}$ completely, we should know the joint probability density of all the random variables $\{x_j(t_1)\}$, $\{x_j(t_2)\}$, $\ldots$, $\{x_j(t_k)\}$. The stochastic process is said to be stationary in the strict sense if its joint density is invariant under time shifts of origin. That is, the joint pdf of the process depends only on the differences $t_i - t_j$ between the time instants $t_1, t_2, \ldots, t_k$ but not directly on them.

In practice, the joint probability density is not known, and its estimation from samples would be too tedious and require an excessive number of samples even if they were available. Therefore, stochastic processes are usually characterized in terms of their first two moments, namely the mean and autocorrelation or autocovariance functions. They give a coarse but useful description of the distribution. Using these statistics is sufficient for linear processing (for example filtering) of stochastic processes, and the number of samples needed for estimating them remains reasonable.

The mean function of the stochastic process $\{x(t)\}$ is defined

$$
m_x(t) = E\{x(t)\} = \int_{-\infty}^{\infty} x(t)\, p_{x(t)}(x(t))\, dx(t) \tag{2.107}
$$

Generally, this is a function of time $t$. However, when the process $\{x(t)\}$ is stationary, the probability density functions of all the random variables corresponding to different time instants become the same. This common pdf is denoted by $p_x(x)$. In such a case, the mean function $m_x(t)$ reduces to a constant mean $m_x$ independent of time.

Similarly, the variance function of the stochastic process $\{x(t)\}$

$$
\sigma_x^2(t) = E\{[x(t) - m_x(t)]^2\} = \int_{-\infty}^{\infty} [x(t) - m_x(t)]^2\, p_{x(t)}(x(t))\, dx(t) \tag{2.108}
$$

becomes a time-invariant constant $\sigma_x^2$ for a stationary process.

Other second-order statistics of a random process $\{x(t)\}$ are defined in a similar manner. In particular, the autocovariance function of the process $\{x(t)\}$ is given by

$$
c_x(t, \tau) = \mathrm{cov}[x(t), x(t - \tau)] = E\{[x(t) - m_x(t)][x(t - \tau) - m_x(t - \tau)]\} \tag{2.109}
$$

The expectation here is computed over the joint probability density of the random variables $x(t)$ and $x(t - \tau)$, where $\tau$ is the constant time lag between the observation times $t$ and $t - \tau$. For the zero lag $\tau = 0$, the autocovariance reduces to the variance function (2.108). For stationary processes, the autocovariance function (2.109) is independent of the time $t$, but depends on the lag $\tau$: $c_x(t, \tau) = c_x(\tau)$.

Analogously, the autocorrelation function of the process $\{x(t)\}$ is defined by

$$
r_x(t, \tau) = E\{x(t)\, x(t - \tau)\} \tag{2.110}
$$

If $\{x(t)\}$ is stationary, this again depends on the time lag $\tau$ only: $r_x(t, \tau) = r_x(\tau)$.

Generally, if the mean function $m_x(t)$ of the process is zero, the autocovariance and autocorrelation functions become the same. If the lag $\tau = 0$, the autocorrelation function reduces to the mean-square function $r_x(t, 0) = E\{x^2(t)\}$ of the process, which becomes a constant $r_x(0)$ for a stationary process $\{x(t)\}$.

These concepts can be extended for two different stochastic processes $\{x(t)\}$ and $\{y(t)\}$ in an obvious manner (cf. Section 2.2.3). More specifically, the cross-correlation function $r_{xy}(t, \tau)$ and the cross-covariance function $c_{xy}(t, \tau)$ of the processes $\{x(t)\}$ and $\{y(t)\}$ are, respectively, defined by

$$
r_{xy}(t, \tau) = E\{x(t)\, y(t - \tau)\} \tag{2.111}
$$

$$
c_{xy}(t, \tau) = E\{[x(t) - m_x(t)][y(t - \tau) - m_y(t - \tau)]\} \tag{2.112}
$$

Several blind source separation methods are based on the use of cross-covariance functions (second-order temporal statistics). These methods will be discussed in Chapter 18.

2.8.3 Wide-sense stationary processes

A very important subclass of stochastic processes consists of wide-sense stationary (WSS) processes, which are required to satisfy the following properties:

1. The mean function $m_x(t)$ of the process is a constant $m_x$ for all $t$.

2. The autocorrelation function is independent of a time shift: $E\{x(t)\, x(t - \tau)\} = r_x(\tau)$ for all $t$.

3. The variance, or the mean-square value $r_x(0) = E\{x^2(t)\}$ of the process is finite.

The importance of wide-sense stationary stochastic processes stems from two facts. First, they can often adequately describe the physical situation. Many practical stochastic processes are actually at least mildly nonstationary, meaning that their statistical properties vary slowly with time. However, such processes are usually roughly WSS over short time intervals. Second, it is relatively easy to develop useful mathematical algorithms for WSS processes. This in turn follows from limiting their characterization to first- and second-order statistics.

Example 2.8 Consider the stochastic process

$$
x(t) = a \cos(\omega t) + b \sin(\omega t) \tag{2.113}
$$

where $a$ and $b$ are scalar random variables and $\omega$ a constant parameter (angular frequency). The mean of the process $x(t)$ is

$$
m_x(t) = E\{x(t)\} = E\{a\} \cos(\omega t) + E\{b\} \sin(\omega t) \tag{2.114}
$$

and its autocorrelation function can be written

$$
\begin{aligned}
r_x(t, \tau) &= E\{x(t)\, x(t - \tau)\} \\
&= \tfrac{1}{2} E\{a^2\} [\cos(\omega(2t - \tau)) + \cos(-\omega\tau)] \\
&\quad + \tfrac{1}{2} E\{b^2\} [-\cos(\omega(2t - \tau)) + \cos(-\omega\tau)] \\
&\quad + E\{ab\} \sin(\omega(2t - \tau))
\end{aligned} \tag{2.115}
$$

where we have used well-known trigonometric identities. Clearly, the process $x(t)$ is generally nonstationary, since both its mean and autocorrelation functions depend on the time $t$.

However, if the random variables $a$ and $b$ are zero mean and uncorrelated with equal variances, so that

$$
E\{a\} = E\{b\} = E\{ab\} = 0, \qquad E\{a^2\} = E\{b^2\}
$$

the mean (2.114) of the process becomes zero, and its autocorrelation function (2.115) simplifies to

$$
r_x(\tau) = E\{a^2\} \cos(\omega\tau)
$$

which depends only on the time lag $\tau$. Hence, the process is WSS in this special case (assuming that $E\{a^2\}$ is finite).
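The claim of Example 2.8 can be checked numerically with a small sketch. It assumes, purely for illustration, that $a$ and $b$ each take the values $-1$ and $+1$ with equal probability, so that the ensemble average is an exact finite sum; the names `x`, `ensemble_autocorr`, `balanced`, `correlated`, and the value of `OMEGA` are hypothetical choices, not part of the example.

```python
import math
from itertools import product

OMEGA = 2.0  # assumed angular frequency (arbitrary choice)

def x(t, a, b, omega=OMEGA):
    """One sample function of the process (2.113) for fixed a and b."""
    return a * math.cos(omega * t) + b * math.sin(omega * t)

def ensemble_autocorr(t, tau, outcomes):
    """Exact E{x(t) x(t - tau)} over equally likely (a, b) outcomes."""
    total = sum(x(t, a, b) * x(t - tau, a, b) for a, b in outcomes)
    return total / len(outcomes)

# Independent a, b in {-1, +1}: E{a} = E{b} = E{ab} = 0 and E{a^2} = E{b^2} = 1.
balanced = list(product([-1.0, 1.0], repeat=2))
# Counter-example: b always equals a, so E{ab} = 1 and the t-dependence remains.
correlated = [(-1.0, -1.0), (1.0, 1.0)]

tau = 0.5
for t in (0.0, 0.3, 1.7):
    print(t, ensemble_autocorr(t, tau, balanced), ensemble_autocorr(t, tau, correlated))
print("E{a^2} cos(omega tau) =", math.cos(OMEGA * tau))
```

For the balanced outcomes the printed autocorrelation should agree with $E\{a^2\}\cos(\omega\tau)$ at every $t$, while for the correlated outcomes it varies with $t$ through the $E\{ab\}\sin(\omega(2t - \tau))$ term of (2.115).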

Assume now that $\{x(t)\}$ is a zero-mean WSS process. If necessary, the process can easily be made zero mean by first subtracting its mean $m_x$. It is sufficient to consider the autocorrelation function $r_x(\tau)$ of $\{x(t)\}$ only, since the autocovariance function $c_x(\tau)$ coincides with it. The autocorrelation function has certain properties that are worth noting. First, it is an even function of the time lag $\tau$:

$$
r_x(-\tau) = r_x(\tau) \tag{2.116}
$$

Another property is that the autocorrelation function achieves its maximum absolute value for zero lag:

$$
-r_x(0) \leq r_x(\tau) \leq r_x(0) \tag{2.117}
$$
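The bound (2.117) is stated without proof. One standard way to see it, sketched here for a real-valued WSS process, is to expand a nonnegative mean square:

$$
0 \leq E\{[x(t) \pm x(t - \tau)]^2\} = E\{x^2(t)\} \pm 2 E\{x(t)\, x(t - \tau)\} + E\{x^2(t - \tau)\} = 2 r_x(0) \pm 2 r_x(\tau)
$$

Taking the plus sign gives $r_x(\tau) \geq -r_x(0)$, and taking the minus sign gives $r_x(\tau) \leq r_x(0)$. The step $E\{x^2(t - \tau)\} = r_x(0)$ relies on the WSS property.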

The autocorrelation function $r_x(\tau)$ measures the correlation of random variables $x(t)$ and $x(t - \tau)$ that are $\tau$ units apart in time, and thus provides a simple measure for the dependence of these variables which is independent of the time $t$ due to the WSS property. Roughly speaking, the faster the stochastic process fluctuates with time around its mean, the more rapidly the values of the autocorrelation function $r_x(\tau)$ decrease from their maximum $r_x(0)$ as $\tau$ increases.

Using the integer notation for the samples $x(i)$ of the stochastic process, we can represent the last $m + 1$ samples of the stochastic process at time $n$ using the random vector

$$
\mathbf{x}(n) = [x(n), x(n - 1), \ldots, x(n - m)]^T \tag{2.118}
$$

Assuming that the values of the autocorrelation function $r_x(0), r_x(1), \ldots, r_x(m)$ are known up to a lag of $m$ samples, the $(m + 1) \times (m + 1)$ correlation (or covariance) matrix of the process $\{x(n)\}$ is defined by

$$
\mathbf{R}_x =
\begin{bmatrix}
r_x(0) & r_x(1) & r_x(2) & \cdots & r_x(m) \\
r_x(1) & r_x(0) & r_x(1) & \cdots & r_x(m - 1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_x(m) & r_x(m - 1) & r_x(m - 2) & \cdots & r_x(0)
\end{bmatrix} \tag{2.119}
$$

The matrix $\mathbf{R}_x$ satisfies all the properties of correlation matrices listed in Section 2.2.2. Furthermore, it is a Toeplitz matrix. This is generally defined so that on each subdiagonal and on the diagonal, all the elements of a Toeplitz matrix are the same.

The Toeplitz property is helpful, for example, in solving linear equations, enabling use of faster algorithms than for more general matrices.
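The structure of the correlation matrix (2.119) can be made concrete with a short sketch: entry $(i, j)$ depends only on $|i - j|$, which is the Toeplitz property combined with the evenness of $r_x$. The function name `correlation_matrix` and the sample values are assumptions made for illustration.

```python
def correlation_matrix(r):
    """Build the (m+1) x (m+1) matrix (2.119) from r = [r_x(0), ..., r_x(m)].

    Entry (i, j) is r_x(|i - j|), so every diagonal is constant (Toeplitz)
    and the matrix is symmetric because r_x is an even function of the lag.
    """
    size = len(r)
    return [[r[abs(i - j)] for j in range(size)] for i in range(size)]

# Hypothetical autocorrelation values for lags 0, 1, 2.
for row in correlation_matrix([1.0, 0.5, 0.2]):
    print(row)
# [1.0, 0.5, 0.2]
# [0.5, 1.0, 0.5]
# [0.2, 0.5, 1.0]
```

Only the $m + 1$ values in the first row need to be stored, which is one reason faster solvers exist for Toeplitz systems.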

Higher-order statistics of a stationary stochastic process $x(n)$ can be defined in an analogous manner. In particular, the cumulants of $x(n)$ have the form [315]

$$
\begin{aligned}
\mathrm{cum}_{xx}(j) &= E\{x(i)\, x(i + j)\} \\
\mathrm{cum}_{xxx}(j, k) &= E\{x(i)\, x(i + j)\, x(i + k)\} \\
\mathrm{cum}_{xxxx}(j, k, l) &= E\{x(i)\, x(i + j)\, x(i + k)\, x(i + l)\} \\
&\quad - E\{x(i)\, x(i + j)\}\, E\{x(i + k)\, x(i + l)\} \\
&\quad - E\{x(i)\, x(i + k)\}\, E\{x(i + j)\, x(i + l)\} \\
&\quad - E\{x(i)\, x(i + l)\}\, E\{x(i + j)\, x(i + k)\}
\end{aligned} \tag{2.120}
$$

These definitions correspond to the formulas (2.106) given earlier for a general random vector $\mathbf{x}$, applied to the zero-mean variables $x(i)$, $x(i + j)$, $x(i + k)$, and $x(i + l)$. Again, the second and third cumulant are the same as the respective moments, but the fourth cumulant differs from the fourth moment $E\{x(i)\, x(i + j)\, x(i + k)\, x(i + l)\}$. The second cumulant $\mathrm{cum}_{xx}(j)$ is equal to the autocorrelation $r_x(j)$ and autocovariance $c_x(j)$.
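As a consistency check, offered as a sketch rather than a result quoted from the text, consider the fourth-order cumulant of the four zero-mean variables $x(i)$, $x(i + j)$, $x(i + k)$, $x(i + l)$ when all lags are zero. Each of the three product terms then equals $[E\{x^2(i)\}]^2 = r_x^2(0)$, so that

$$
\mathrm{cum}_{xxxx}(0, 0, 0) = E\{x^4(i)\} - 3 [E\{x^2(i)\}]^2
$$

This is the kurtosis of the zero-mean variable $x(i)$, which vanishes when $x(i)$ is gaussian.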

2.8.4 Time averages and ergodicity

In defining the concept of a stochastic process, we noted that at each fixed time instant $t = t_0$ the possible values $x(t_0)$ of the process constitute a random variable having some probability distribution. An important practical problem is that these distributions (which are different at different times if the process is nonstationary) are not known, at least not exactly. In fact, often all that we have is just one sample of the process corresponding to each discrete time index (since time cannot be stopped to acquire more samples). Such a sample sequence is called a realization of the stochastic process. In handling WSS processes, we need to know in most cases only the mean and autocorrelation values of the process, but even they are often unknown.

A practical way to circumvent this difficulty is to replace the usual expectations of the random variables, called ensemble averages, by long-term sample averages or time averages computed from the available single realization. Assume that this realization contains $K$ samples $x(1), x(2), \ldots, x(K)$. Applying the preceding principle, the mean of the process can be estimated using its time average

$$
\hat{m}_x(K) = \frac{1}{K} \sum_{k=1}^{K} x(k) \tag{2.121}
$$

and the autocorrelation function for the lag value $l$ using

$$
\hat{r}_x(l, K) = \frac{1}{K - l} \sum_{k=1}^{K - l} x(k + l)\, x(k) \tag{2.122}
$$

The accuracy of these estimates depends on the number $K$ of samples. Note also that the latter estimate is computed over the $K - l$ possible sample pairs having the lag $l$ that can be found from the sample set. The estimates (2.122) are unbiased, but if the number of pairs $K - l$ available for estimation is small, their variance can be high. Therefore, the scaling factor $K - l$ of the sum in (2.122) is often replaced by $K$ in order to reduce the variance of the estimated autocorrelation values $\hat{r}_x(l, K)$, even though the estimates then become biased [169]. As $K \to \infty$, both estimates tend toward the same value.
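The time-average estimators (2.121) and (2.122) translate almost directly into code. The following is a minimal sketch with hypothetical function names; the `biased` flag switches the scaling factor from $K - l$ to $K$ as described above. Python lists are indexed from 0, so `x[k]` holds the sample $x(k + 1)$.

```python
def time_average_mean(x):
    """Estimate (2.121): the sample mean of one realization."""
    return sum(x) / len(x)

def time_average_autocorr(x, lag, biased=False):
    """Estimate (2.122) for a given lag from one realization.

    With biased=False the sum over the K - lag available pairs is divided
    by K - lag (unbiased); with biased=True it is divided by K instead.
    """
    K = len(x)
    if not 0 <= lag < K:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    total = sum(x[k + lag] * x[k] for k in range(K - lag))
    return total / (K if biased else K - lag)

samples = [1.0, 2.0, 3.0, 4.0]                       # toy realization
print(time_average_mean(samples))                    # 2.5
print(time_average_autocorr(samples, 1))             # 20 / 3
print(time_average_autocorr(samples, 1, biased=True))  # 5.0
```

For lags close to $K$ the unbiased version averages only a handful of products, which is where the variance problem noted in the text shows up.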

The stochastic process is called ergodic if the ensemble averages can be equated to the respective time averages. Roughly speaking, a random process is ergodic with respect to its mean and autocorrelation function if it is stationary. A more rigorous treatment of the topic can be found for example in [169, 353, 141].

For mildly nonstationary processes, one can apply the estimation formulas (2.121) and (2.122) by computing the time averages over a shorter time interval during which the process can be regarded as roughly WSS. It is important to keep this in mind. Sometimes formula (2.122) is applied in estimating the autocorrelation values without taking into account the stationarity of the process. The consequences can be drastic, for example, rendering eigenvectors of the correlation matrix (2.119) useless for practical purposes if ergodicity of the process is in reality a grossly invalid assumption.

2.8.5 Power spectrum

A lot of insight into a WSS stochastic process is often gained by representing it in the frequency domain. The power spectrum or spectral density of the process $x(n)$ provides such a representation. It is defined as the discrete-time Fourier transform of the autocorrelation sequence $r_x(0), r_x(1), \ldots$:

$$
S_x(\omega) = \sum_{k=-\infty}^{\infty} r_x(k) \exp(-\jmath k \omega) \tag{2.123}
$$

where $\jmath = \sqrt{-1}$ is the imaginary unit and $\omega$ the angular frequency. The time domain representation given by the autocorrelation sequence of the process can be obtained from the power spectrum $S_x(\omega)$ by applying the inverse discrete-time Fourier transform

$$
r_x(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_x(\omega) \exp(\jmath k \omega)\, d\omega, \qquad k = 1, 2, \ldots \tag{2.124}
$$

It is easy to see that the power spectrum (2.123) is always real-valued, even, and a periodic function of the angular frequency $\omega$. Note also that the power spectrum is a continuous function of $\omega$, while the autocorrelation sequence is discrete. In practice, the power spectrum must be estimated from a finite number of autocorrelation values. If the autocorrelation values $r_x(k) \to 0$ sufficiently quickly as the lag $k$ grows large, this provides an adequate approximation.
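A truncated version of (2.123) can be sketched as follows, assuming only the values $r_x(0), \ldots, r_x(m)$ are available and that $r_x(-k) = r_x(k)$ as in (2.116). Pairing the terms for $+k$ and $-k$ turns the complex exponentials into cosines, which makes it visible that the result is real, even, and $2\pi$-periodic. The function name `power_spectrum` and the sample values are hypothetical.

```python
import math

def power_spectrum(r, omega):
    """Truncated power spectrum from r = [r_x(0), ..., r_x(m)].

    Uses r_x(-k) = r_x(k), so that
    r_x(k) exp(-j k w) + r_x(-k) exp(+j k w) = 2 r_x(k) cos(k w).
    """
    return r[0] + 2.0 * sum(r[k] * math.cos(k * omega) for k in range(1, len(r)))

r = [1.0, 0.5]                                   # hypothetical autocorrelation values
for omega in (0.0, math.pi / 2, math.pi):
    print(omega, power_spectrum(r, omega))
```

With these particular values the spectrum is largest at $\omega = 0$ and falls to zero at $\omega = \pi$, the behaviour expected of a slowly fluctuating process.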

The power spectrum describes the frequency contents of the stochastic process, showing which frequencies are present in the process and how much power they possess. For a sinusoidal signal, the power spectrum shows a sharp peak at its oscillating frequency. Various methods for estimating power spectra are discussed thoroughly in the books [294, 241, 411].

Higher-order spectra can be defined in a similar manner to the power spectrum as Fourier transforms of higher-order statistics [319, 318]. Contrary to the power spectra, they retain information about the phase of signals, and have found many applications in describing nongaussian, nonlinear, and nonminimum-phase signals [318, 319, 315].

2.8.6 Stochastic signal models

A stochastic process whose power spectrum is constant for all frequencies $\omega$ is called white noise. Alternatively, white noise $v(n)$ can be defined as a process for which any two different samples are uncorrelated:

$$
r_v(k) = E\{v(n)\, v(n - k)\} =
\begin{cases}
\sigma_v^2, & k = 0 \\
0, & k = \pm 1, \pm 2, \ldots
\end{cases} \tag{2.125}
$$

Here $\sigma_v^2$ is the variance of the white noise. It is easy to see that the power spectrum of the white noise is $S_v(\omega) = \sigma_v^2$ for all $\omega$, and that the formula (2.125) follows from the inverse transform (2.124). The distribution of the random variable $v(n)$ forming the white noise can be any reasonable one, provided that the samples are uncorrelated at different time indices. Usually this distribution is assumed to be gaussian. The reason is that white gaussian noise is maximally random because any two uncorrelated samples are also independent. Furthermore, such a noise process cannot be modeled to yield an even simpler random process.
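The two statements about the white noise spectrum can be verified in a couple of lines; this is a sketch of the omitted steps, writing $\jmath$ for the imaginary unit and $\sigma_v^2$ for the noise variance. Inserting (2.125) into the definition (2.123) leaves only the $k = 0$ term:

$$
S_v(\omega) = \sum_{k=-\infty}^{\infty} r_v(k) \exp(-\jmath k \omega) = \sigma_v^2 \exp(0) = \sigma_v^2
$$

Conversely, inserting the constant spectrum into (2.124) gives, for any nonzero integer $k$,

$$
r_v(k) = \frac{\sigma_v^2}{2\pi} \int_{-\pi}^{\pi} \exp(\jmath k \omega)\, d\omega = \sigma_v^2\, \frac{\sin(k\pi)}{k\pi} = 0
$$

while for $k = 0$ the integrand equals 1 and the integral yields $r_v(0) = \sigma_v^2$.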

Stochastic processes or time series are frequently modeled in terms of autoregressive (AR) processes. They are defined by the difference equation

$$
x(n) = -\sum_{i=1}^{M} a_i\, x(n - i) + v(n) \tag{2.126}
$$

where $v(n)$ is a white noise process, and $a_1, \ldots, a_M$ are constant coefficients (parameters) of the AR model. The model order $M$ gives the number of previous samples on which the current value $x(n)$ of the AR process depends. The noise term $v(n)$ introduces randomness into the model; without it the AR model would be completely deterministic. The coefficients $a_1, \ldots, a_M$ of the AR model can be computed using linear techniques from autocorrelation values estimated from the available data [419, 241, 169]. Since the AR models describe fairly well many natural stochastic processes, for example, speech signals, they are used in many applications. In ICA and BSS, they can be used to model the time correlations in each source process $s_i(t)$. This sometimes greatly improves the performance of the algorithms.
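The AR model (2.126) and the idea of computing its coefficients from autocorrelation values can be sketched for the simplest case $M = 1$. Multiplying $x(n) = -a_1 x(n - 1) + v(n)$ by $x(n - 1)$ and taking expectations gives $r_x(1) = -a_1 r_x(0)$, since $v(n)$ is uncorrelated with past samples; hence $a_1 = -r_x(1) / r_x(0)$. The function names, the gaussian noise, the zero initial conditions, and the chosen coefficient are all assumptions made for illustration.

```python
import random

def simulate_ar(a, num_samples, noise_std=1.0, seed=0):
    """Generate samples of the AR process (2.126) with a = [a_1, ..., a_M].

    Assumes gaussian white noise and zero initial conditions.
    """
    rng = random.Random(seed)
    M = len(a)
    x = [0.0] * M                                   # x(n) = 0 before the start
    for _ in range(num_samples):
        past = sum(a[i] * x[-1 - i] for i in range(M))   # sum of a_i x(n - i)
        x.append(-past + rng.gauss(0.0, noise_std))
    return x[M:]

def estimate_ar1(x):
    """Estimate a_1 of an AR(1) model as -r(1) / r(0) from time averages."""
    K = len(x)
    r0 = sum(v * v for v in x) / K
    r1 = sum(x[k + 1] * x[k] for k in range(K - 1)) / K  # biased scaling by K
    return -r1 / r0

x = simulate_ar([-0.8], 20000, seed=1)   # x(n) = 0.8 x(n-1) + v(n)
print("estimated a_1:", estimate_ar1(x))
```

The estimate should come out close to the true value $-0.8$; for higher model orders the same reasoning leads to a set of linear equations in the autocorrelation values.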

Autoregressive processes are a special case of autoregressive moving average (ARMA) processes described by the difference equation

$$
x(n) + \sum_{i=1}^{M} a_i\, x(n - i) = v(n) + \sum_{i=1}^{N} b_i\, v(n - i) \tag{2.127}
$$

Clearly, the AR model (2.126) is obtained from the ARMA model (2.127) when the moving average (MA) coefficients $b_1, \ldots, b_N$ are all zero. On the other hand, if the AR coefficients $a_i$ are all zero, the ARMA process (2.127) reduces to an MA process of order $N$. The ARMA and MA models can also be used to describe stochastic processes. However, they are applied less frequently, because estimation of their parameters requires nonlinear techniques [241, 419, 411]. See the Appendix of Chapter 19 for a discussion of the stability of the ARMA model and its utilization in digital filtering.
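The difference equation (2.127) can be run directly as a recursion. The sketch below, with the hypothetical name `arma_filter` and zero initial conditions as an assumption, also shows the two special cases mentioned above: an empty list of MA coefficients gives the AR model, and an empty list of AR coefficients gives the MA model.

```python
def arma_filter(a, b, v):
    """Run the ARMA recursion (2.127) on an input sequence v.

    a = [a_1, ..., a_M] and b = [b_1, ..., b_N]; samples before the start
    of v are taken to be zero.
    """
    x = []
    for n in range(len(v)):
        ar_part = sum(a[i - 1] * x[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        ma_part = sum(b[i - 1] * v[n - i] for i in range(1, len(b) + 1) if n - i >= 0)
        x.append(-ar_part + v[n] + ma_part)
    return x

impulse = [1.0, 0.0, 0.0, 0.0]
print(arma_filter([-0.5], [], impulse))        # AR(1): [1.0, 0.5, 0.25, 0.125]
print(arma_filter([], [0.5, 0.25], impulse))   # MA(2): [1.0, 0.5, 0.25, 0.0]
```

Feeding in a unit impulse instead of white noise makes the difference visible: the AR response decays indefinitely, whereas the MA response is exactly zero after $N$ samples.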
