(1)

Lecture 16: PCA, ICA

Andreas Wichert

Department of Computer Science and Engineering Técnico Lisboa

(2)

The Karhunen-Loève Transform

• The Karhunen-Loève transform is a linear transform that maps possibly correlated variables into a set of values of linearly uncorrelated variables

• This transformation is defined in such a way that the first principal component has the largest possible variance

(3)

The covariance

• The sample size, denoted by n, is the number of data items in a sample of a population.

• The goal is to make inferences about a population from a sample.

• Sample covariance indicates the relationship between two variables of a sample

The sample covariance has n − 1 in the denominator rather than n due to Bessel’s correction.

• For the whole population, the covariance is
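The formulas themselves appear on the slides only as images; the standard definitions (assuming the slides use the usual ones) are

$$\operatorname{cov}(X,Y) = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \bar{x})(y_k - \bar{y}) \qquad \text{(sample covariance)}$$

$$\operatorname{cov}(X,Y) = \frac{1}{N}\sum_{k=1}^{N}(x_k - \mu_X)(y_k - \mu_Y) \qquad \text{(population covariance)}$$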

(4)

• The sample covariance relies on the difference between each observation and the sample mean.

• In computer science, the sample covariance is usually used.

• In statistics (Bishop), the population covariance is used.

• In a linear relationship, either the high values of one variable are paired with the high values of another variable or the high values of one variable are paired with the low values of another variable

(5)

• For example, a list of two variables, (X, Y), represents the data set Σ. The sample covariance of the data set is 0.82456. Ordering the list by X, we notice that the ascending X values are matched by ascending Y values.
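As a quick illustration of the computation (with made-up numbers, not the data set Σ from the slides), NumPy's np.cov uses the same n − 1 denominator:

```python
import numpy as np

# Hypothetical paired observations (X, Y); ascending X values paired with ascending Y values
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.2, 1.9, 3.1, 4.2, 4.8]

# np.cov divides by n - 1 by default (Bessel's correction)
sample_cov = np.cov(X, Y)[0, 1]
print(sample_cov)   # positive, since high X values are paired with high Y values
```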

(6)

• The covariance matrix measures the tendency of two features, x_i and x_j, to vary in the same direction. The covariance between features x_i and x_j is estimated for n vectors as

$$c_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}\bigl(x_i(k) - m_i\bigr)\bigl(x_j(k) - m_j\bigr)$$

• with m_i and m_j being the arithmetic means of the two variables of the sample. Covariances are symmetric; c_ij = c_ji.
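A minimal NumPy sketch of this estimate (the data matrix and its shape are illustrative assumptions):

```python
import numpy as np

X = np.random.default_rng(2).normal(size=(50, 3))   # hypothetical n = 50 vectors with 3 features
n = X.shape[0]
M = X.mean(axis=0)                                   # arithmetic means m_i of each feature

C = (X - M).T @ (X - M) / (n - 1)                    # c_ij as in the formula above
print(np.allclose(C, C.T))                           # covariances are symmetric
print(np.allclose(C, np.cov(X, rowvar=False)))       # matches NumPy's built-in estimate
```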

(7)

Correlation

• Covariance is related to correlation

$$r_{ij} = \frac{\sum_{k=1}^{n}\bigl(x_i(k) - m_i\bigr)\bigl(x_j(k) - m_j\bigr)}{(n-1)\,s_i s_j} = \frac{c_{ij}}{s_i s_j} \in [-1, 1]$$

where s_i and s_j are the standard deviations of the two variables.

(8)

• The resulting covariance matrix C is symmetric and positive semi-definite.

(9)
(10)

(a) The data points of the set Σ. (b) The two-dimensional distribution of Σ can be described by three ellipses that divide the data points into four equal groups.

(11)

The Karhunen-Loève Transform

(12)
(13)
(14)

• The eigenvalues of the covariance matrix represent the variances along the corresponding eigenvectors. The eigenvalues corresponding to the covariance matrix of the data set Σ are

(15)
(16)

It rotates the coordinate system in such a way that the new covariance matrix will be diagonal.

(17)

It rotates the system (the points) in such a way that the new covariance matrix will be diagonal.
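In matrix form (a standard formulation, assumed here rather than copied from the slides): with the eigendecomposition C = UΛUᵀ of the covariance matrix, the transform y = Uᵀ(x − m) yields a diagonal covariance matrix,

$$C = U \Lambda U^{T}, \qquad y = U^{T}(x - m), \qquad \operatorname{cov}(y) = U^{T} C\, U = \Lambda .$$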

(18)

Principal component analysis

• Principal component analysis (PCA) is a technique that is useful for the compression of data.

• The purpose is to reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables, that nonetheless retains most of the sample’s information.

• The first principal component corresponds to the normalized eigenvector with the largest eigenvalue, i.e. the direction with the highest variance.

• In principal component analysis (PCA), the significant eigenvectors define the principal components.

(19)

• According to the Kaiser criterion, the eigenvectors whose eigenvalues are below 1 are discarded.

• Each of the s non-discarded eigenvectors forms a row of the matrix W of dimension s × m, which defines the linear mapping from the original m-dimensional space to the s-dimensional subspace (see the sketch below).

• The principal component analysis of the data set Σ is given by
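The resulting matrix for Σ appears on the slide as an image. As an illustration of the procedure only (not the lecture's own code), a minimal Python/NumPy sketch, assuming a data matrix X of shape (n, m):

```python
import numpy as np

# Hypothetical data matrix: n samples, m features (not the slides' data set Sigma)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

n, m = X.shape
Xc = X - X.mean(axis=0)                # center the data

C = Xc.T @ Xc / (n - 1)                # sample covariance matrix (Bessel's correction)
eigvals, eigvecs = np.linalg.eigh(C)   # eigh, since C is symmetric
order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 1.0                   # Kaiser criterion: discard eigenvalues below 1
W = eigvecs[:, keep].T                 # s x m matrix whose rows are the retained eigenvectors

Y = Xc @ W.T                           # project the centered data onto the principal components
print(W.shape, Y.shape)
```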

(20)
(21)
(22)

• Suppose we have a covariance matrix C.

• What is the corresponding matrix of the K-L transformation?

• First, we have to compute the eigenvalues.

• The system (C − λI) has to become linearly dependent (singular).

• Its determinant has to become zero: det(C − λI) = 0.
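For concreteness, a hypothetical 2 × 2 example (the matrix on the slide is shown only as an image; this one is made up):

$$C = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \det(C - \lambda I) = (2-\lambda)^2 - 1 = (\lambda - 1)(\lambda - 3) = 0,$$

so λ₁ = 3 and λ₂ = 1, with normalized eigenvectors (1, 1)ᵀ/√2 and (1, −1)ᵀ/√2; the rows of the K-L transformation matrix are these eigenvectors.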

(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)

Problems

• Principal components are linear transformations of the original features.

• It is difficult to attach any semantic meaning to principal components.

• For new data added to the dataset, the PCA has to be recomputed.

(31)

PCA: only the images of the digit three.

(Figures: eigenvalues; projection of the eigenvectors, blue for positive values, yellow for negative values.)

(32)

• A comparison of PCA to Fisher’s linear discriminant

(33)

Singular Value Decomposition

• We can use SVD to perform PCA

SVD is more numerically stable if the columns are close to collinear

Factorize a covariance matrix A := C.

Difference: compute the eigenvectors of C Cᵀ = Cᵀ C = C C (C is symmetric), and use U as before.
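One common way to carry this out numerically (a sketch, under the assumption that the SVD is applied to the centered data matrix rather than to C itself) is shown below; the right singular vectors are the principal directions and the squared singular values divided by n − 1 equal the eigenvalues of C:

```python
import numpy as np

# Hypothetical data matrix (not the lecture's example); center it first
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Thin SVD of the centered data: Xc = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

principal_directions = Vt              # rows of Vt are the eigenvectors of the covariance matrix
variances = S**2 / (n - 1)             # eigenvalues of the covariance matrix

# Sanity check against the eigendecomposition of C
C = Xc.T @ Xc / (n - 1)
print(np.allclose(np.sort(np.linalg.eigvalsh(C)), np.sort(variances)))
```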

(34)

SVD

(35)
(36)
(37)

Independent Component Analysis (ICA)

(38)
(39)

The Blind Source Separation Problem

(40)
(41)

• There are a number of "source signals":

• Due to some external circumstances, only linear mixtures of the source signals are observed:

• Estimate (separate) the original signals!

(42)

• Use information on statistical independence to recover:
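In standard blind-source-separation notation (assumed here; the slide shows the expressions as images): the observations are x(t) = A s(t) with an unknown mixing matrix A, and ICA estimates an unmixing matrix W such that

$$y(t) = W\,x(t) \approx s(t), \qquad W \approx A^{-1},$$

recovering the sources up to permutation and scaling.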

(43)

Motivation

Two independent sources, mixture at two microphones:

$$x_1(t) = a_{11}\, s_1(t) + a_{12}\, s_2(t)$$

$$x_2(t) = a_{21}\, s_1(t) + a_{22}\, s_2(t)$$

The coefficients a_ij depend on the distances of the microphones from the speakers.

(44)
(45)
(46)

The Cocktail Party Problem

(47)

PCA vs ICA

(48)

Assumptions

(49)

Assumptions

(50)

Example

(51)

Example

(Figure: one of the plotted directions has slope +1, the other slope −2.)

(52)
(53)

Algorithms

• ICA algorithms are rooted in the minimization of mutual information.
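As one concrete, widely used member of this family (an illustration only, not necessarily the variant derived in these slides), scikit-learn's FastICA can separate a two-microphone mixture like the one above; the sources and mixing matrix below are made up for the example:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two hypothetical source signals (a sine and a square wave)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

# Made-up mixing matrix a_ij (e.g. microphone/speaker distances)
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])
X = S @ A.T                            # observed mixtures x = A s

# Recover the sources up to permutation and scaling
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)           # estimated independent components
print(S_est.shape, ica.mixing_.shape)  # (2000, 2) and (2, 2)
```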

(54)
(55)

Cumulative Distribution Function

(56)

Natural Gradient Learning for ICA

(57)
(58)
(59)

Relative Entropy
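The definition used in the derivation appears on the slides as images; the standard relative entropy (Kullback-Leibler divergence) between densities p and q is

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x)\,\log \frac{p(x)}{q(x)}\, dx \;\ge\; 0 ,$$

with equality if and only if p = q.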

(60)
(61)
(62)
(63)
(64)

Learning Rule

(65)

Learning Rule
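The update itself is shown on the slides only as images; a commonly cited form of the natural-gradient ICA learning rule (due to Amari), with outputs y = W x and a component-wise nonlinearity such as φ(y_i) = tanh(y_i), is

$$\Delta W = \eta\,\bigl(I - \varphi(y)\,y^{T}\bigr)\,W .$$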

(66)
(67)

Kernel PCA

(68)

Other Methods for Dimension Reduction

• Unsupervised Learning

Data: no labels!

Goal: Learn the structure of the data

• Traditionally, autoencoders were used for dimensionality reduction or feature learning.

(69)
(70)

Kohonen Maps

(71)

Literature

• Christopher M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer 2006

Chapter 12

• Simon O. Haykin, Neural Networks and Learning Machines (3rd Edition), Pearson 2008

Chapter 10

(72)

Literature

• Machine Learning - A Journey to Deep Learning, A. Wichert, Luis Sa-Couto, World Scientific, 2021

Chapter 15

• Intelligent Big Multimedia Databases, A. Wichert, World Scientific, 2015

Chapter 3, Section 3.3
