Large covariance matrices I

(1)

Spiked Models in Large Random Matrices and two statistical applications

Jamal Najim najim@univ-mlv.fr

CNRS & Université Paris Est

Linstat, Linköping, Sweden, August 2014


(2)

Introduction

Large Random Matrices

Objectives

Basic technical means

Large covariance matrices

Spiked models

Statistical Test for Single-Source Detection

Direction of Arrival Estimation

Conclusion

(3)

Large covariance matrices I

The model

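- Consider an $N \times n$ matrix $X_N$ with i.i.d. entries such that $\mathbb{E} X_{ij} = 0$ and $\mathbb{E}|X_{ij}|^2 = 1$.

- Let $R_N$ be a deterministic $N \times N$ nonnegative definite Hermitian matrix.

- Consider

$$Y_N = R_N^{1/2} X_N .$$

The matrix $Y_N$ is an $n$-sample of $N$-dimensional vectors:

$$Y_N = [Y_{\cdot 1} \cdots Y_{\cdot n}] \quad \text{with} \quad Y_{\cdot 1} = R_N^{1/2} X_{\cdot 1} \quad \text{and} \quad \mathbb{E}\, Y_{\cdot 1} Y_{\cdot 1}^* = R_N .$$

- $R_N$ is often called the population covariance matrix.

Objective

To understand the spectrum of $\frac{1}{n} Y_N Y_N^*$ as $N, n \to \infty$, meaning that

$$\frac{N}{n} \to c \in (0, \infty).$$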
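The model can be simulated in a few lines. The following is a minimal sketch, assuming Gaussian entries for $X_N$ and a hypothetical diagonal $R_N$ (half of its eigenvalues equal to 1, half equal to 3); the sizes `N` and `n` are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, for reproducibility only

N, n = 200, 400                   # illustrative sizes (assumption), N/n = 0.5
X = rng.standard_normal((N, n))   # i.i.d. entries with mean 0 and variance 1

# Hypothetical population covariance: diagonal, half ones and half threes.
r_diag = np.where(np.arange(N) < N // 2, 1.0, 3.0)
R_sqrt = np.diag(np.sqrt(r_diag))  # R_N^{1/2} for a diagonal R_N

Y = R_sqrt @ X                    # Y_N = R_N^{1/2} X_N: n samples of dimension N
S = (Y @ Y.T) / n                 # (1/n) Y_N Y_N^*

eigs = np.linalg.eigvalsh(S)      # real eigenvalues in increasing order
print(eigs.min(), eigs.max())
```

The array `eigs` is the spectrum whose behaviour the rest of the talk describes.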


(6)

Large covariance matrices II

Remarks

1. The asymptotic regime $N, n \to \infty$ with $\frac{N}{n} \to c \in (0, \infty)$ corresponds to the case where the data dimension is of the same order as the number of available samples.

2. If $N$ is fixed and $n \to \infty$ (small data, large samples), then

$$\frac{1}{n} Y_N Y_N^* \longrightarrow R_N .$$

The spectral measure of a matrix A

... also called the empirical measure of the eigenvalues.

If $A$ is an $N \times N$ Hermitian matrix with eigenvalues $\lambda_1, \cdots, \lambda_N$, then its spectral measure is:

$$L_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i(A)} \quad \Rightarrow \quad L_N([a, b]) = \frac{\#\{\lambda_i(A) \in [a, b]\}}{N} .$$

Otherwise stated, $L_N([a, b])$ is the proportion of eigenvalues of $A$ in $[a, b]$.
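As an illustration of this definition, here is a small sketch that evaluates $L_N([a, b])$ numerically. The helper name `spectral_measure` and the test matrix are hypothetical, chosen only for the example.

```python
import numpy as np

def spectral_measure(A, a, b):
    """Return L_N([a, b]): the proportion of eigenvalues of Hermitian A in [a, b]."""
    lam = np.linalg.eigvalsh(A)
    return np.count_nonzero((lam >= a) & (lam <= b)) / lam.size

# Toy Hermitian matrix with eigenvalues 1, 2, 2, 5.
A = np.diag([1.0, 2.0, 2.0, 5.0])
print(spectral_measure(A, 1.5, 3.0))   # two of the four eigenvalues lie in [1.5, 3]
```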


(12)

Introduction

Basic technical means

Large covariance matrices

Spiked models

Conclusion

(13)

Objectives of this talk

1. to describe the limiting spectral properties of the large covariance matrix

$$\frac{1}{n} Y_N Y_N^* = \frac{1}{n} R_N^{1/2} X_N X_N^* R_N^{1/2}$$

2. to study a particular class of covariance matrix models: spiked models, for which one or several eigenvalues are clearly separated from the mass of the other eigenvalues.

3. to present two applications of these results in statistical signal processing: signal detection and direction of arrival estimation.


(16)

Introduction

Basic technical means

Large covariance matrices

Spiked models

Conclusion

(17)

Spectrum and eigenvectors analysis

The resolvent

- The resolvent of $A$ is $Q(z) = (A - zI)^{-1}$.

- Its singularities are exactly the eigenvalues of $A$.

- Problem: if the size of $A$ is big, then the size of $Q$ is big as well.

The normalized trace of the resolvent

- The function

$$g_n(z) = \frac{1}{N} \mathrm{Trace}\, (A - zI)^{-1}$$

provides information on the spectrum of $A$.

- It is the Stieltjes transform of the spectral measure of $A$ (cf. supra).


(21)

Spectrum analysis: The Stieltjes Transform

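Given a probability $P$, its Stieltjes transform is defined by

$$g(z) = \int_{\mathbb{R}} \frac{P(d\lambda)}{\lambda - z} , \quad z \in \mathbb{C}^+ ,$$

with inverse formula

$$\int f \, dP = \frac{1}{\pi} \lim_{y \downarrow 0} \operatorname{Im} \int_{\mathbb{R}} f(x)\, g(x + iy)\, dx ,$$

for $f$ bounded continuous.

Properties

1. Convergence in distribution is characterized by pointwise convergence of Stieltjes transforms:

$$P_n \xrightarrow[n \to \infty]{\mathcal{D}} P \quad \Leftrightarrow \quad \forall z \in \mathbb{C}^+ , \quad g_n(z) = \int \frac{P_n(d\lambda)}{\lambda - z} \xrightarrow[n \to \infty]{} g(z) = \int \frac{P(d\lambda)}{\lambda - z}$$

2. Spectral measure:

$$P = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i(A)} \quad \Rightarrow \quad g_n(z) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\lambda_i(A) - z} = \frac{1}{N} \mathrm{Trace}\, (A - zI)^{-1}$$

The Stieltjes transform $g_n$ is the normalized trace of the resolvent $(A - zI)^{-1}$.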
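The identity between the Stieltjes transform of the spectral measure and the normalized trace of the resolvent can be checked numerically. This is a minimal sketch on an arbitrary real symmetric test matrix; the matrix, its size and the point `z` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
G = rng.standard_normal((N, N))
A = (G + G.T) / 2                  # arbitrary real symmetric test matrix (assumption)

z = 0.3 + 0.5j                     # any point of the upper half-plane C^+

# Normalized trace of the resolvent (A - zI)^{-1}.
g_trace = np.trace(np.linalg.inv(A - z * np.eye(N))) / N

# Stieltjes transform of the spectral measure: average of 1 / (lambda_i - z).
lam = np.linalg.eigvalsh(A)
g_eig = np.mean(1.0 / (lam - z))

print(abs(g_trace - g_eig))        # the two quantities agree up to rounding error
print(g_eig.imag > 0)              # each term 1/(lambda_i - z) has positive imaginary part
```

In practice the eigenvalue form is cheaper when many values of `z` are needed, since the eigenvalues are computed once.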


(26)

Introduction

Large covariance matrices

Wishart matrices and Marčenko-Pastur's theorem

The general covariance model

Spiked models

Conclusion

(27)

Wishart Matrices

The model

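- We first focus on covariance matrices in the case where $R_N = \sigma^2 I_N$.

- Hence $Y_N$ is an $N \times n$ matrix with i.i.d. entries such that $\mathbb{E} Y_{ij} = 0$ and $\mathbb{E}|Y_{ij}|^2 = \sigma^2$.

- The matrix $\frac{1}{n} Y_N Y_N^*$ is a Wishart matrix.

The standard case N << n

Assume $N$ fixed and $n \to \infty$ (small data, large sample). Since $\mathbb{E}\, Y_{\cdot 1} Y_{\cdot 1}^* = \sigma^2 I_N$, the law of large numbers (L.L.N.) implies

$$\frac{1}{n} Y_N Y_N^* = \frac{1}{n} \sum_{i=1}^{n} Y_{\cdot i} Y_{\cdot i}^* \xrightarrow[n \to \infty]{a.s.} \sigma^2 I_N$$

In particular,

- all the eigenvalues of $\frac{1}{n} Y_N Y_N^*$ converge to $\sigma^2$,

- equivalently, the spectral measure of $\frac{1}{n} Y_N Y_N^*$ converges to $\delta_{\sigma^2}$.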
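The fixed-$N$ regime is easy to observe numerically. The sketch below assumes Gaussian entries and illustrative values of `sigma2`, `N` and `n`; it prints the smallest and largest eigenvalues of $\frac{1}{n} Y_N Y_N^*$ for increasing sample sizes.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 2.0      # illustrative variance (assumption)
N = 5             # fixed dimension

def extreme_eigs(n):
    """Smallest and largest eigenvalue of (1/n) Y Y^* for an N x n Gaussian Y."""
    Y = np.sqrt(sigma2) * rng.standard_normal((N, n))
    lam = np.linalg.eigvalsh(Y @ Y.T / n)
    return lam[0], lam[-1]

# As n grows with N fixed, both extremes should approach sigma2.
for n in (50, 5_000, 500_000):
    print(n, extreme_eigs(n))
```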


(34)

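Marčenko-Pastur theorem

Theorem

- Consider the spectral measure $L_N$:

$$L_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i} , \quad \lambda_i = \lambda_i\left( \frac{1}{n} Y_N Y_N^* \right)$$

- Then almost surely (= for almost every realization)

$$L_N \xrightarrow[N, n \to \infty]{} \mathbb{P}_{\check{M}P} \quad \text{in distribution as} \quad \frac{N}{n} \xrightarrow[n \to \infty]{} c \in (0, \infty)$$

where $\mathbb{P}_{\check{M}P}$ is the Marčenko-Pastur distribution:

$$\mathbb{P}_{\check{M}P}(dx) = \left( 1 - \frac{1}{c} \right)^+ \delta_0(dx) + \frac{\sqrt{(b - x)(x - a)}}{2 \pi \sigma^2 x c} \, 1_{[a, b]}(x) \, dx$$

with

$$a = \sigma^2 (1 - \sqrt{c})^2 , \quad b = \sigma^2 (1 + \sqrt{c})^2 .$$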
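To make the formula concrete, here is a sketch that codes the density of the continuous part and compares it with a simulated Wishart spectrum. The function names `mp_density` and `mp_mass`, the Gaussian entries, the sizes `N`, `n` and the threshold 1.0 are assumptions of the example; only the density formula and the value $c = 0.7$ come from the slides.

```python
import numpy as np

def mp_density(x, c, sigma2=1.0):
    """Density of the continuous part of the Marcenko-Pastur law on [a, b]."""
    a = sigma2 * (1 - np.sqrt(c)) ** 2
    b = sigma2 * (1 + np.sqrt(c)) ** 2
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2 * np.pi * sigma2 * c * xi)
    return dens

c, sigma2 = 0.7, 1.0
a = sigma2 * (1 - np.sqrt(c)) ** 2
b = sigma2 * (1 + np.sqrt(c)) ** 2

def mp_mass(lo, hi, num=200_001):
    """Riemann-sum approximation of the Marcenko-Pastur mass of [lo, hi]."""
    x = np.linspace(lo, hi, num)
    return np.sum(mp_density(x, c, sigma2)) * (x[1] - x[0])

# Simulated Wishart matrix with N/n = 0.7.
rng = np.random.default_rng(3)
N, n = 700, 1000
Y = np.sqrt(sigma2) * rng.standard_normal((N, n))
lam = np.linalg.eigvalsh(Y @ Y.T / n)

total_mass = mp_mass(a, b)            # close to 1 here: no atom at 0 since c < 1
mp_below_one = mp_mass(a, 1.0)        # limiting proportion of eigenvalues below 1
emp_below_one = np.mean(lam <= 1.0)   # observed proportion L_N([0, 1])
print(total_mass, mp_below_one, emp_below_one)
```

Comparing `mp_below_one` with `emp_below_one` is one way to read the theorem: proportions of eigenvalues in an interval approach the Marčenko-Pastur mass of that interval.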


(39)

Histogram for Wishart matrices: Marčenko-Pastur's theorem

Matrix model: Wishart matrix

Consider the spectrum of $\frac{1}{n} Y_N Y_N^*$ in the regime where

$$N, n \to \infty \quad \text{and} \quad \frac{N}{n} \to c \in (0, \infty)$$

Plot the histogram of its eigenvalues.

Marčenko-Pastur's theorem (1967)

"The histogram of a Large Covariance Matrix converges to the Marčenko-Pastur distribution with given parameter (here 0.7)"

[Figure: Spectrum's histogram, $\frac{N}{n} = 0.7$. Panel title: "Wishart Matrix, N= 4 ,n= 10". Horizontal axis: spectrum; vertical axis: density.]

[Figure: Marčenko-Pastur's distribution (in red), overlaid on the histogram.]

(46)

Elements of proof

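1. Convergence of the Stieltjes transform.

Since

$$L_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i} \xrightarrow[N, n \to \infty]{} \mathbb{P}_{\check{M}P} \quad \Longleftrightarrow \quad g_n(z) \xrightarrow[N, n \to \infty]{} \mathrm{ST}\left( \mathbb{P}_{\check{M}P} \right)(z)$$

we prove the convergence of $g_n$.

2. After algebraic manipulations and probabilistic arguments, we prove that

$$g_n(z) \approx \frac{1}{\sigma^2 (1 - c_n) - z - z \sigma^2 c_n g_n(z)} , \quad \text{where } c_n = \frac{N}{n} .$$

3. Necessarily,

$$g_n \xrightarrow[N, n \to \infty]{} g_{\check{M}P}$$

which satisfies the fixed point equation:

$$g_{\check{M}P}(z) = \frac{1}{\sigma^2 (1 - c) - z - z \sigma^2 c \, g_{\check{M}P}(z)}$$

4. Solving explicitly the previous equation, we identify

$$\mathbb{P}_{\check{M}P} = (\text{Stieltjes Transform})^{-1}\left( g_{\check{M}P} \right) .$$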
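Step 4 can be sketched numerically: multiplying the fixed point equation out gives the quadratic $z \sigma^2 c \, g^2 + (z - \sigma^2 (1 - c))\, g + 1 = 0$, and the relevant root is the one with positive imaginary part for $z \in \mathbb{C}^+$. The code below assumes illustrative values of `c`, `sigma2`, `z` and Gaussian entries; the helper name `g_mp` is hypothetical.

```python
import numpy as np

def g_mp(z, c, sigma2=1.0):
    """Root of z*sigma2*c*g^2 + (z - sigma2*(1-c))*g + 1 = 0 with Im g > 0."""
    roots = np.roots([z * sigma2 * c, z - sigma2 * (1 - c), 1.0])
    return roots[np.argmax(roots.imag)]

c, sigma2, z = 0.5, 1.0, 1.0 + 0.5j

g = g_mp(z, c, sigma2)
# Residual of the fixed point equation g = 1 / (sigma2(1-c) - z - z sigma2 c g).
residual = g - 1.0 / (sigma2 * (1 - c) - z - z * sigma2 * c * g)

# Empirical Stieltjes transform g_n of a simulated Wishart matrix with N/n = c.
rng = np.random.default_rng(4)
N, n = 400, 800
Y = np.sqrt(sigma2) * rng.standard_normal((N, n))
lam = np.linalg.eigvalsh(Y @ Y.T / n)
g_n = np.mean(1.0 / (lam - z))

print(abs(residual))   # the selected root solves the fixed point equation
print(abs(g_n - g))    # g_n is close to the limit for large N, n
```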


(51)

Introduction

Large covariance matrices

Wishart matrices and Marčenko-Pastur's theorem

The general covariance model

Spiked models

Conclusion


(52)

Theorem

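Recall the notations

$$Y_N = R_N^{1/2} X_N \quad \text{and} \quad g_n(z) = \frac{1}{N} \mathrm{Trace} \left( \frac{1}{n} Y_N Y_N^* - z I_N \right)^{-1}$$

We are interested in the limiting behaviour of

$$L_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i} \quad \text{with} \quad \lambda_i = \lambda_i\left( \frac{1}{n} Y_N Y_N^* \right)$$

Canonical equation

- The unknown $t_N$ is a Stieltjes transform, solution of

$$t_N(z) = \frac{1}{N} \mathrm{Trace} \left[ (1 - c_n) R_N - z I_N - z c_n t_N(z) R_N \right]^{-1}$$

- Consider the associated probability $P_N$ defined by

$$P_N = (\text{Stieltjes transform})^{-1}(t_N) \quad \text{i.e.} \quad t_N(z) = \int \frac{P_N(d\lambda)}{\lambda - z}$$

Convergence

- Then $t_N$ and $P_N$ are the deterministic equivalents of $g_n$ and $L_N$:

$$g_n(z) - t_N(z) \xrightarrow[N, n \to \infty]{a.s.} 0 \quad \text{and} \quad \frac{1}{N} \sum_{i=1}^{N} f(\lambda_i) - \int f(\lambda) P_N(d\lambda) \xrightarrow[N, n \to \infty]{a.s.} 0 .$$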
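The canonical equation can be solved numerically. The sketch below uses a plain fixed-point iteration, which is an assumption of this example rather than a method stated in the talk; it is run at a point `z` far from the real axis, where the iteration is expected to behave well. It takes $c_n = N/n$ and a hypothetical diagonal $R_N$, for which the trace reduces to an average over the eigenvalues `r` of $R_N$.

```python
import numpy as np

def canonical_t(z, r, c, n_iter=200):
    """Iterate t <- (1/N) sum_i 1 / ((1-c) r_i - z - z c t r_i).

    r holds the eigenvalues of R_N (the trace only depends on them).
    Plain iteration is an assumption of this sketch.
    """
    t = -1.0 / z                       # Stieltjes transform of delta_0 as a starting point
    for _ in range(n_iter):
        t = np.mean(1.0 / ((1 - c) * r - z - z * c * t * r))
    return t

rng = np.random.default_rng(5)
N, n = 300, 600
c_n = N / n
r = np.where(np.arange(N) < N // 2, 1.0, 3.0)   # hypothetical spectrum of R_N
z = 1.0 + 3.0j                                   # illustrative point of C^+

t_N = canonical_t(z, r, c_n)
residual = t_N - np.mean(1.0 / ((1 - c_n) * r - z - z * c_n * t_N * r))

# Empirical g_n(z) for Y_N = R_N^{1/2} X_N with Gaussian X_N.
Y = np.sqrt(r)[:, None] * rng.standard_normal((N, n))
lam = np.linalg.eigvalsh(Y @ Y.T / n)
g_n = np.mean(1.0 / (lam - z))

print(abs(residual))    # t_N solves the canonical equation
print(abs(g_n - t_N))   # deterministic equivalent: small for large N, n
```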


(57)

Remark

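Assume moreover that

$$\frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i(R_N)} \xrightarrow[N, n \to \infty]{} P^R$$

Then instead of having a series of canonical equations depending on $N$:

$$t_N(z) = \frac{1}{N} \mathrm{Trace} \left[ (1 - c_n) R_N - z I_N - z c_n t_N(z) R_N \right]^{-1}$$

we can obtain a "limiting equation"

$$t(z) = \int \frac{P^R(d\lambda)}{(1 - c) \lambda - z - z c \, t(z) \lambda} \quad \text{where} \quad t(z) = \int \frac{P^\infty(d\lambda)}{\lambda - z}$$

and genuine limits

$$g_n(z) \xrightarrow[N, n \to \infty]{a.s.} t(z) , \quad \frac{1}{N} \sum_{i=1}^{N} f(\lambda_i) \xrightarrow[N, n \to \infty]{a.s.} \int f(\lambda) P^\infty(d\lambda) ,$$

where the $\lambda_i$'s are the eigenvalues of $\frac{1}{n} Y_N Y_N^*$.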
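As a quick consistency check (a worked special case, not part of the original slides), take $P^R = \delta_{\sigma^2}$, which corresponds to $R_N = \sigma^2 I_N$. The integral in the limiting equation then reduces to a single term:

```latex
t(z) = \int \frac{\delta_{\sigma^2}(d\lambda)}{(1 - c)\lambda - z - z c \, t(z) \lambda}
     = \frac{1}{\sigma^2 (1 - c) - z - z \sigma^2 c \, t(z)} ,
```

which is the fixed point equation satisfied by the Stieltjes transform of the Marčenko-Pastur distribution in the Wishart case.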


(62)

Simulations

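- Consider the distribution

$$P^R = \frac{1}{3} \delta_1 + \frac{1}{3} \delta_3 + \frac{1}{3} \delta_7$$

corresponding to a covariance matrix $R_N = \mathrm{diag}(1, 3, 7)$, each with multiplicity $\approx \frac{N}{3}$.

- We plot hereafter the limiting spectral distribution $P^\infty$ for different values of $c$.

$$t(z) = \frac{1}{3} \left( \frac{1}{(1 - c) \lambda_1 - z - z c \, t(z) \lambda_1} + \frac{1}{(1 - c) \lambda_2 - z - z c \, t(z) \lambda_2} + \frac{1}{(1 - c) \lambda_3 - z - z c \, t(z) \lambda_3} \right)$$

with $(\lambda_1, \lambda_2, \lambda_3) = (1, 3, 7)$.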
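One possible way to compute such plots, sketched under assumptions: for three atoms the equation above is a polynomial of degree 4 in $t$, so it can be solved with `np.roots`, and the density of $P^\infty$ can be approximated by $\frac{1}{\pi} \operatorname{Im} t(x + i\eta)$ for a small $\eta$, following the inverse formula of the Stieltjes transform. Keeping the root with the largest imaginary part is a selection rule assumed here, motivated by the fact that a Stieltjes transform has positive imaginary part on $\mathbb{C}^+$. The values of `c`, `eta` and the grid are illustrative.

```python
import numpy as np

def t_limit(z, c, atoms=(1.0, 3.0, 7.0)):
    """Solve t = (1/K) sum_k 1 / D_k(t), D_k(t) = (1-c) lam_k - z - z c lam_k t."""
    # Each D_k is a degree-1 polynomial in t, stored as [slope, intercept].
    D = [np.array([-z * c * lam, (1 - c) * lam - z]) for lam in atoms]
    K = len(D)
    prod_all = np.array([1.0 + 0j])
    for d in D:
        prod_all = np.polymul(prod_all, d)
    rhs = np.zeros(1, dtype=complex)
    for k in range(K):
        prod_others = np.array([1.0 + 0j])
        for j in range(K):
            if j != k:
                prod_others = np.polymul(prod_others, D[j])
        rhs = np.polyadd(rhs, prod_others)
    # K * t * prod_k D_k(t) - sum_k prod_{j != k} D_j(t) = 0
    poly = np.polysub(K * np.polymul([1.0, 0.0], prod_all), rhs)
    roots = np.roots(poly)
    return roots[np.argmax(roots.imag)]   # assumed selection rule: largest Im

c, eta, step = 0.1, 0.02, 0.005           # illustrative parameters (assumptions)
x = np.arange(0.01, 15.0, step)
t_vals = np.array([t_limit(xi + 1j * eta, c) for xi in x])
density = t_vals.imag / np.pi             # smoothed approximation of the density of P^infty
mass = density.sum() * step               # should be close to 1 when c < 1

# Residual of the limiting equation at one test point.
z0 = 2.0 + 1.0j
t0 = t_limit(z0, c)
lams = np.array([1.0, 3.0, 7.0])
residual = t0 - np.mean(1.0 / ((1 - c) * lams - z0 - z0 * c * t0 * lams))
print(mass, abs(residual))
```

Plotting `density` against `x` for several values of `c` gives curves of the kind announced on this slide; smaller `eta` sharpens the edges at the cost of a finer grid.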