Minimum Classification Error Principal Component Analysis


Tiago de Carvalho¹, Maria Sibaldo¹, I. R. Tsang², George Cavalcanti²

¹ Universidade Federal Rural de Pernambuco - UFRPE
² Universidade Federal de Pernambuco - UFPE

KDMILE

Introduction

• Principal Component Analysis (PCA)
  • Unsupervised dimensionality reduction
  • Used in supervised tasks (face recognition, text classification)
• Supervised PCA for classification (Barshan et al. 2011)
  • uses class representatives
• Supervised PCA for regression (Blair et al. 2006)
  • used as a pre-processing step for feature selection
• Bayesian approach for classification
  • depends on the covariance matrix, as in PCA
  • allows estimating the error rate from the features

Research Objective

• propose a new supervised PCA that selects the projections that minimize the Bayes error rate

Notation

The dataset matrix with n points and d features is

X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}.  (1)

The j-th point is

x_j = \begin{bmatrix} x_{j1} \\ x_{j2} \\ \vdots \\ x_{jd} \end{bmatrix},  (2)

for j = 1, ..., n.

The data mean vector is

\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j.  (3)

The centered data matrix is

X = \begin{bmatrix} (x_1 - \bar{x})^T \\ (x_2 - \bar{x})^T \\ \vdots \\ (x_n - \bar{x})^T \end{bmatrix}.  (4)

Feature Extraction with PCA

The covariance matrix of X is

\Sigma_X = \frac{1}{n} X^T X.  (5)

• ξ_i is an eigenvector of Σ_X, for i = 1, ..., k.

E_k = [ξ_1 ... ξ_k],  (6)

• k = 1, ..., d.
• k is the number of extracted features.

The i-th extracted feature is

f_i = [w_{1i} ... w_{ni}]^T = X ξ_i.  (7)

The projection of the point x_j is

w_j^T = [w_{j1} ... w_{jk}] = x_j^T E_k.  (8)

• λ_i is the eigenvalue of ξ_i
• λ_i is the variance of f_i
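As a minimal illustration (not the authors' code; the function and variable names are ours), this extraction step can be sketched in Python with NumPy: center the data, build the covariance matrix of equation (5), and keep the leading eigenvectors as E_k.

```python
import numpy as np

def pca_extract(X, k):
    """Sketch of PCA feature extraction: returns (W, E_k, eigenvalues)."""
    Xc = X - X.mean(axis=0)           # center the data, eqs. (3)-(4)
    cov = (Xc.T @ Xc) / Xc.shape[0]   # covariance matrix, eq. (5)
    lam, E = np.linalg.eigh(cov)      # eigenpairs, ascending eigenvalues
    order = np.argsort(lam)[::-1]     # reorder by decreasing variance
    lam, E = lam[order], E[:, order]
    Ek = E[:, :k]                     # projection matrix, eq. (6)
    W = Xc @ Ek                       # projected data, eqs. (8)-(9)
    return W, Ek, lam[:k]
```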


PCA Projected Data

The new data matrix is

W = X E_k.  (9)

The covariance matrix of W is \Sigma_W = \frac{1}{n} W^T W:

\Sigma_W = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix}

Meaning:

• the extracted features are uncorrelated
• this allows ignoring feature interactions during feature selection


Bayes Error Rate

The probability of classification error.

Simplified with five restrictions:

(1) The data follows a multivariate normal distribution.
(2) The problem has only two classes.
(3) Both classes have equal prior probabilities.
(4) Both classes have the same covariance matrix (as in PCA).
(5) The features are independent (as in PCA).

Then the Bayes error rate is given by

P(\text{error}) = \frac{1}{\sqrt{2\pi}} \int_{r/2}^{\infty} e^{-u^2/2} \, du.  (10)

Minimizing Bayes Error Rate

The Bayes error rate decreases as r increases, where r is the Mahalanobis distance between the mean vectors of the classes (μ_1 and μ_2):

r^2 = (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2).  (11)

For a diagonal covariance matrix,

r = \sqrt{\sum_{i=1}^{d} \left( \frac{\mu_{1i} - \mu_{2i}}{\sigma_i} \right)^2},  (12)

where σ_i is the standard deviation of the i-th feature, which is the same for both classes.
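As a hedged numeric illustration (not from the paper), equations (10)-(12) can be evaluated directly: the integral in (10) is a Gaussian tail probability, which Python's standard library exposes through the complementary error function.

```python
import math
import numpy as np

def bayes_error(mu1, mu2, sigma):
    """Bayes error for two equal-prior Gaussian classes with a shared
    diagonal covariance (restrictions (1)-(5) of the previous slide)."""
    mu1, mu2, sigma = map(np.asarray, (mu1, mu2, sigma))
    r = math.sqrt(np.sum(((mu1 - mu2) / sigma) ** 2))  # eq. (12)
    # eq. (10): (1/sqrt(2*pi)) * integral from r/2 to inf of exp(-u^2/2) du
    return 0.5 * math.erfc(r / (2.0 * math.sqrt(2.0)))

# Larger r (better-separated class means) gives a smaller error rate:
print(bayes_error([0, 0], [3, 0], [1, 1]))  # r = 3 -> about 0.067
```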


Proposed Score

Choose the projections (eigenvectors) that minimize the Bayes error rate instead of maximizing variance.

The mean of the i-th feature for the c-th class (c = 1, 2) is

\bar{w}_{ci} = \frac{\sum_{j=1}^{n} w_{ji} \delta_{jc}}{\sum_{j=1}^{n} \delta_{jc}},  (13)

where δ_{jc} = 1 if the j-th point belongs to the c-th class, and δ_{jc} = 0 otherwise.

s_i = \begin{cases} |\bar{w}_{1i} - \bar{w}_{2i}| / \lambda_i, & \text{if } \lambda_i \neq 0 \\ 0, & \text{if } \lambda_i = 0 \end{cases}  (14)


Proposed Method

PCA-projected features selected according to the proposed score minimize the Bayes error rate.

The proposed method consists of the following steps:

(1) Project the data as W = X E_d

(2) Compute the mean of each feature for each class

(3) Compute the relevance scores (s_i)

(4) Select the k features with the highest scores

(5) Define the projection matrix from the eigenvectors of the selected features:

S_k = [ξ_1 ... ξ_k]  (15)

(6) Project the data using S_k
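A minimal end-to-end sketch of these six steps in Python/NumPy, assuming binary labels y with values 1 and 2 (the function and variable names are illustrative, not the authors' implementation):

```python
import numpy as np

def mce_pca(X, y, k):
    """Sketch: select the k PCA directions with the highest scores s_i."""
    Xc = X - X.mean(axis=0)
    lam, E = np.linalg.eigh((Xc.T @ Xc) / Xc.shape[0])
    W = Xc @ E                         # step (1): project with all d eigenvectors
    m1 = W[y == 1].mean(axis=0)        # step (2): per-class means, eq. (13)
    m2 = W[y == 2].mean(axis=0)
    s = np.zeros_like(lam)             # step (3): scores, eq. (14)
    nz = lam != 0
    s[nz] = np.abs(m1 - m2)[nz] / lam[nz]
    top = np.argsort(s)[::-1][:k]      # step (4): k highest scores
    Sk = E[:, top]                     # step (5): projection matrix, eq. (15)
    return Xc @ Sk, Sk                 # step (6): project the data
```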


Experiments

Datasets (UCI Machine Learning Repository):

• Climate Model Simulation Crashes (540 points, 18 features)
• Banknote Authentication (1,372 points, 4 features)

Metric: accuracy.

Sampling: 100 holdouts (50% training / 50% testing).

Classifiers:

• 1-NN (Nearest Neighbor) with Euclidean distance
• Naive Bayes with normal kernel smoothing density estimate
• Pruned decision tree with Gini's diversity index and a minimum of 10 observations per leaf
• Linear Discriminant
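A sketch of this evaluation protocol with scikit-learn (assumed available; the `project` callable standing in for PCA or the proposed method is our own abstraction, and must be fitted on the training split only):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def holdout_accuracy(X, y, k, project, n_runs=100):
    """Mean 1-NN test accuracy over n_runs random 50/50 holdout splits.
    `project` fits on the training split and returns projected (train, test)."""
    accs = []
    for seed in range(n_runs):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.5, random_state=seed, stratify=y)
        Wtr, Wte = project(Xtr, ytr, Xte, k)  # fit on train, apply to both
        clf = KNeighborsClassifier(n_neighbors=1).fit(Wtr, ytr)
        accs.append(accuracy_score(yte, clf.predict(Wte)))
    return float(np.mean(accs))
```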


Banknote (1-NN)

Accuracy for 2 features: 0.852 (PCA) and 0.959 (Proposed).

[Plot: accuracy vs. number of extracted features (1-4)]

Banknote (Decision Tree)

Accuracy for 2 features: 0.820 (PCA) and 0.947 (Proposed).

[Plot: accuracy vs. number of extracted features (1-4)]

Banknote (Naive Bayes)

Accuracy for 1 feature: 0.695 (PCA) and 0.891 (Proposed).

[Plot: accuracy vs. number of extracted features (1-4)]

Banknote (Linear Discriminant)

Accuracy for 1 feature: 0.614 (PCA) and 0.886 (Proposed).

[Plot: accuracy vs. number of extracted features (1-4)]

Climate (1-NN)

Accuracy for 10 features: 0.873 (PCA) and 0.901 (Proposed).

[Plot: accuracy vs. number of extracted features (1-19)]

Climate (Naive Bayes)

Accuracy for 10 features: 0.916 (PCA) and 0.922 (Proposed).

[Plot: accuracy vs. number of extracted features (1-19)]

Climate (Decision Tree)

Accuracy for 4 features: 0.877 (PCA) and 0.887 (Proposed).

[Plot: accuracy vs. number of extracted features (1-19)]

Climate (Linear Discriminant)

Accuracy for 12 features: 0.923 (PCA) and 0.944 (Proposed).

[Plot: accuracy vs. number of extracted features (1-19)]

Hypothesis test

The proposed method has significantly higher accuracy than PCA:

Climate

• from 2 to 16 extracted features (1-NN)

• from 3 to 11 extracted features (Naive Bayes)

• from 2 to 4 extracted features (Decision Tree)

• from 4 to 16 extracted features (Linear Discriminant)

Banknote

Conclusion

In PCA, the selected features are the ones with the highest eigenvalues (λ_i); in the proposed method, they are the ones with the highest discriminant scores (s_i).

The proposed method achieves higher accuracy than PCA with a smaller number of features.

Future work:

• extend the method to more than two classes
