
To better assess the advantages of the ACPH, it is necessary to analyze the homogeneous and heterogeneous regions individually. Table 4.3 shows all heterogeneous regions found when running the ACPH on the synthetic datasets. In most cases, as the proportion of class A moves away from 50%, and consequently the proportion of one of the classes becomes considerably high, the accuracy obtained by the ACPH approaches the accuracy obtained by the kNN and the SVM. Conversely, as the proportion of class A approaches 50%, the accuracy of the ACPH tends to be worse than that of the other algorithms.

Table 4.3: Analysis of heterogeneous regions: characteristics and out-of-sample confidence considering only the points belonging to each region. ACPH refers to the hierarchical partitioning classification algorithm. kNN was applied with k = √(number of input observations).

Out-of-Sample Confidence Analysis in Heterogeneous Regions of the Synthetic Datasets

                     Region Characteristics                            Out-of-Sample Confidence (%)
Dataset   Class A Proportion    No. of Obs.   No. of Obs.            ACPH      SVM       kNN
          In Sample (%)         In Sample     Out of Sample
Case 1    13.3                   45            17                     82.3      88.2      88.2
          32.4                   34             2                      0.0     100.0      50.0
          35.1                   37             9                     55.6      88.9      88.9
          51.5                   33            10                     40.0      90.0     100.0
          93.1                   29             4                    100.0     100.0     100.0
Case 2    17.1                   70            25                     76.0      76.0      72.0
          70.1                  107            29                     75.9      75.9      72.4
          81.5                   92            16                     87.5      81.2      87.5
          87.1                  101            28                     82.1      78.6      82.1
          87.5                   16             8                     62.5      87.5      87.5
Case 3    12.4                   97            26                     80.8      88.5      80.8
          14.6                   48            13                     46.1      46.1      46.1
          30.3                   89            22                     54.5      59.1      59.1
          47.2                  144            35                     68.6      85.7      82.9
          92.1                  114            31                     90.3      90.3      90.3
          96.8                   63            13                    100.0      92.3     100.0
          97.6                   41             6                    100.0     100.0     100.0
Case 4     4.4                   68            19                    100.0     100.0     100.0
          22.4                   58            16                     62.5      75.0      62.5
          62.5                   24             4                     75.0      75.0      75.0
          65.4                   52            15                     66.7      66.7      66.7
          69.7                   66            17                     58.8      58.8      58.8
          70.0                   50            14                     71.4      92.9     100.0
          81.0                   58             9                    100.0     100.0     100.0
          87.8                   49            15                     66.7      66.7      66.7
          94.4                   18             4                    100.0     100.0     100.0

This result is consistent with the idea that regions whose class proportions are close to 50% are indeed regions of greater confusion, and that using only the local proportion for classification may not be sufficient to classify observations belonging to these regions. In addition, the results in Table 4.3 also show specifically the regions in which the other algorithms have more difficulty classifying, which adds interpretability not only to the proposed method but also to other classification methods that may be used in combination with it.
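To make the construction of a table like Table 4.3 concrete, the sketch below computes, for each region, the in-sample class A proportion and the out-of-sample confidence of each classifier restricted to that region. The region assignments are assumed to be given by the ACPH partitioning; the scikit-learn classifiers, the default SVM kernel and all variable names are illustrative assumptions of this sketch, not part of the original experiments.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def per_region_confidence(X_tr, y_tr, r_tr, X_te, y_te, r_te):
    """Per-region class A proportion and out-of-sample confidence (as in Table 4.3).

    r_tr / r_te hold the region index of each training / test observation,
    here assumed to come from the ACPH partitioning.
    """
    k = max(1, int(round(np.sqrt(len(X_tr)))))            # k = sqrt(no. of input observations)
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    svm = SVC().fit(X_tr, y_tr)
    rows = []
    for r in np.unique(r_tr):
        in_r, out_r = (r_tr == r), (r_te == r)
        prop_a = np.mean(y_tr[in_r] == "A") * 100          # in-sample class A proportion (%)
        if out_r.sum() == 0:
            continue
        maj = "A" if prop_a >= 50 else "B"                 # local majority class (ACPH-style rule)
        acc_acph = np.mean(y_te[out_r] == maj) * 100
        acc_svm = np.mean(svm.predict(X_te[out_r]) == y_te[out_r]) * 100
        acc_knn = np.mean(knn.predict(X_te[out_r]) == y_te[out_r]) * 100
        rows.append((r, prop_a, in_r.sum(), out_r.sum(), acc_acph, acc_svm, acc_knn))
    return rows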

Since the synthetic datasets are two-dimensional, they can be better analyzed through plots. Considering Table 4.3 and the remaining homogeneous regions not represented in it, Figures 4.1, 4.2, 4.3 and 4.4 present the class A proportion of the regions found by the ACPH in the four synthetic cases, where darker regions have a high proportion of class A and lighter regions a high proportion of class B. Thus, an observation belonging to one of the more orange regions in Figures 4.1, 4.2, 4.3 and 4.4 has similar probabilities of being classified as A or B and therefore lies in a region of confusion and difficult classification. These orange regions are the same regions described in Table 4.3, hence this table is of great importance for interpreting results on high-dimensional datasets.

Figure 4.1: Visual representation of the regions found for synthetic case 1. Here, class A is given by the black points and the proportion of this class is represented according to the color bar.

Figure 4.2: Visual representation of the regions found for synthetic case 2. Here, class A is given by the black points and the proportion of this class is represented according to the color bar.

Figure 4.3: Visual representation of the regions found for synthetic case 3. Here, class A is given by the black points and the proportion of this class is represented according to the color bar.

Figure 4.4: Visual representation of the regions found for synthetic case 4. Here, class A is given by the black points and the proportion of this class is represented according to the color bar.
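For the two-dimensional synthetic cases, a figure in the spirit of Figures 4.1 to 4.4 can be produced along the lines of the sketch below, which colors the attribute space by the class A proportion of the region of the nearest prototype. The use of matplotlib, the "hot" colormap and the grid resolution are assumptions chosen for illustration, not a description of how the original figures were generated.

import numpy as np
import matplotlib.pyplot as plt

def plot_region_proportions(X, y, prototypes, prop_a):
    """Color the 2-D attribute space by the local class A proportion of each region.

    prototypes: (k, 2) array with the region prototypes found by the ACPH (assumed given).
    prop_a:     (k,) array with the class A proportion of each region, in [0, 1].
    """
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 300)
    ys = np.linspace(X[:, 1].min(), X[:, 1].max(), 300)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.c_[gx.ravel(), gy.ravel()]
    # the nearest prototype defines the region of each grid point
    dist = np.linalg.norm(grid[:, None, :] - prototypes[None, :, :], axis=2)
    region = dist.argmin(axis=1)
    plt.pcolormesh(gx, gy, prop_a[region].reshape(gx.shape), cmap="hot", vmin=0.0, vmax=1.0)
    plt.colorbar(label="class A proportion")
    plt.scatter(*X[y == "A"].T, c="black", s=8)                      # class A as black points
    plt.scatter(*X[y == "B"].T, c="white", s=8, edgecolors="gray")   # class B as white points
    plt.show()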

Furthermore, given the regions found by the proposed algorithm, it is reasonable to try to infer which attribute values are required for an observation to belong to one region rather than another. Table 4.4 presents the regions and their attribute value intervals for case 1 of the synthetic datasets. The minimum and maximum of each attribute in each region provide additional knowledge about a new observation without it having to go through the classification process. In this case, considering the regions in Table 4.4, if an observation has attribute 1 greater than 1.57 and attribute 2 greater than −1.141, then it will certainly have a classification confidence greater than 51.5%.

Table 4.4: Description of the regions found for the synthetic dataset Case 1, considering the representative prototype, local majority class, confidence, and the minimum and maximum of each attribute.

Attribute Value Intervals of the Regions for Case 1

Region             Local Class     Confidence (%)   Attribute 1            Attribute 2
Prototype          of the Region                    Min        Max         Min        Max
(-2.599, -3.297)   A               51.5             -4.428     1.57        -4.592     -1.141
(0.112, 3.045)     B               64.9             -3.883     +∞          1.804      +∞
(-2.63, -5.783)    B               67.6             -4.428     +∞          -4.592     +∞
(-0.623, -0.818)   B               86.7             -3.883     +∞          -4.592     +∞
(-4.571, -2.323)   A               93.1             -4.428     +∞          -4.517     +∞
(-2.594, 0.968)    A               99.2             -2.042     +∞          -0.957     +∞
(3.056, -4.068)    B               100              0.435      +∞          -5.892     +∞
(-2.120, -9.349)   B               100              -3.666     +∞          -9.591     +∞
(-6.134, -6.973)   A               100              -14.322    +∞          -7.382     +∞
(-0.132, -3.858)   B               100              -1.313     +∞          -26.331    +∞
(-2.363, 4.877)    A               100              -31.433    +∞          2.080      +∞
(0.959, 0.315)     B               100              -0.237     +∞          -6.472     +∞
(-5.239, -3.999)   A               100              -4.987     +∞          -4.770     +∞
(1.469, 8.686)     B               100              -19.641    +∞          5.496      +∞
(-1.596, -4.695)   B               100              -2.669     1.308       -5.771     -3.467
(-3.045, -1.280)   A               100              -3.805     -1.533      -2.561     1.076
(-1.393, 2.694)    A               100              -2.939     +∞          0.619      +∞
(-4.238, -0.983)   A               100              -4.103     +∞          -1.713     +∞
(0.269, -1.736)    B               100              -0.661     +∞          -5.872     +∞
(-0.816, -6.899)   B               100              -1.842     +∞          -6.551     +∞
(0.579, 2.791)     B               100              -0.508     22.093      -5.301     3.490
(-0.554, -5.829)   B               100              -1.755     +∞          -4.784     +∞

It should be noted, however, that this information only allows one to verify that an observation does not belong to a region; it is not sufficient to determine to which region it specifically belongs. For example, a naive inference would be to assume that having attribute 1 within the interval [−4.428, 1.57] and attribute 2 within the interval [−4.592, −1.141] is sufficient for an observation to belong to the first region of Table 4.4. However, this would only be true if the regions were rectangular, which is not the case. In practice, the k-means clustering algorithm divides the set of observations into polygons whose number of sides depends on the number of clusters k, and, in general, these regions are not rectangular. There are algorithms capable of returning the vertices of these regions given their prototypes, but knowing the vertices does not allow an immediate verification of whether an observation belongs to the region. In this sense, knowing the minimum and maximum values of each attribute is advantageous insofar as it anticipates, in some cases, information about the classification confidence of new observations.
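The kind of anticipation mentioned above can be illustrated with a small sketch: the per-attribute minima and maxima of Table 4.4 are used to rule out regions for a new observation and to bound its classification confidence from below. The data structure and function below are illustrative assumptions; only the first two rows of Table 4.4 are reproduced.

import numpy as np

# each region: (confidence in %, list of per-attribute (min, max) intervals)
# only the first two rows of Table 4.4 are shown here, for illustration
regions = [
    (51.5, [(-4.428, 1.570), (-4.592, -1.141)]),
    (64.9, [(-3.883, np.inf), (1.804, np.inf)]),
    # ... remaining regions of Table 4.4
]

def confidence_lower_bound(x, regions):
    """Smallest confidence among the regions that cannot be ruled out for x.

    Falling inside a region's attribute intervals does not prove membership
    (regions are not rectangular), but falling outside them rules the region out.
    """
    candidates = [conf for conf, intervals in regions
                  if all(lo <= xi <= hi for xi, (lo, hi) in zip(x, intervals))]
    return min(candidates) if candidates else None

# example: attribute 2 above -1.141 rules out the 51.5% region,
# so the confidence of this observation is at least 64.9%
print(confidence_lower_bound([0.0, 3.0], regions))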

Chapter 5

Conclusions

This dissertation proposed a classification algorithm that uses supervised hierarchical partitioning for the local classification of new observations. Its main objective is to discriminate regions that are easy to classify from regions of confusion. The algorithm has intuitive steps: a region in which all points, or almost all of them, belong to the same class is intuitively easy to classify, while regions that are initially difficult to classify, when their class distributions are sufficiently distinct, clearly contain some subregion that is homogeneous for one of the classes.

From the experiments performed, it was possible to observe that the Hierarchical Partitioning Classification Algorithm (ACPH) proved to be highly competitive when compared to other algorithms commonly used for this type of application, such as the kNN and the SVM. Although the kNN achieved better classification performance in most cases, the ACPH not only achieved very similar accuracy and performance, but also allowed a more detailed analysis of the out-of-sample results. While the kNN returns only information about the neighbors of the input observations and the SVM returns only the boundary hyperplane, the ACPH provides, in addition to the local accuracy, information about all the neighborhoods found.

For high-dimensional samples, visualizing and understanding their distributions is generally complex and may require dimensionality reduction techniques. In such cases, where plots are infeasible, neither the kNN nor the SVM can provide information beyond the estimated class, or the proportion of neighbors of the observation in the case of the kNN.

The hierarchical partitioning classification algorithm is able to identify the regions in which classification confidence is high and the regions in which this confidence is reduced. Its output makes it possible to understand and evaluate how the input data are arranged in the space with respect to class overlaps. It also allows other kinds of information to be extracted that enrich the classification process, such as the attribute intervals and the confidence values of each region. Moreover, its intrinsic supervised clustering process makes it possible to analyze how other classification algorithms behave locally. The results of the kNN and the SVM in homogeneous and heterogeneous regions were only obtained because the ACPH provided the region structure. Therefore, the hierarchical partitioning classification algorithm can be applied not only as a classifier, but also as a tool to evaluate other methods in a descriptive and detailed way.

References

[1] KOIKKALAINEN, J., PÖLÖNEN, H., MATTILA, J., et al. "Improved Classification of Alzheimer's Disease Data via Removal of Nuisance Variability", PLoS ONE, v. 7, n. 2, pp. e31112, Feb. 2012. ISSN: 1932-6203. doi: 10.1371/journal.pone.0031112. Available at: <http://dx.plos.org/10.1371/journal.pone.0031112>.

[2] SOMASUNDARAM, G., SIVALINGAM, R., MORELLAS, V., et al. "Classification and Counting of Composite Objects in Traffic Scenes Using Global and Local Image Analysis", IEEE Transactions on Intelligent Transportation Systems, v. 14, n. 1, pp. 69–81, Mar. 2013. ISSN: 1524-9050, 1558-0016. doi: 10.1109/TITS.2012.2209877. Available at: <http://ieeexplore.ieee.org/document/6291788/>.

[3] BIN ZHANG, MARIN, A., HUTCHINSON, B., et al. "Learning Phrase Patterns for Text Classification", IEEE Transactions on Audio, Speech, and Language Processing, v. 21, n. 6, pp. 1180–1189, June 2013. ISSN: 1558-7916, 1558-7924. doi: 10.1109/TASL.2013.2245651. Available at: <http://ieeexplore.ieee.org/document/6457440/>.

[4] WILLEMS, L. L., VANHOUCKE, M. "Classification of articles and journals on project control and earned value management", International Journal of Project Management, v. 33, n. 7, pp. 1610–1634, Oct. 2015. ISSN: 0263-7863. doi: 10.1016/j.ijproman.2015.06.003. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S026378631500099X>.

[5] BISHOP, C. M. Pattern recognition and machine learning. Information science and statistics. New York, Springer, 2006. ISBN: 978-0-387-31073-2.

[6] PERES, R., PEDREIRA, C. "A new local–global approach for classification", Neural Networks, v. 23, n. 7, pp. 887–891, Sept. 2010. ISSN: 0893-6080. doi: 10.1016/j.neunet.2010.04.010. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S0893608010000936>.

[7] ABU-MOSTAFA, Y. S., MAGDON-ISMAIL, M., LIN, H.-T. Learning from data: a short course. S.l., AMLbook.com, 2012. ISBN: 978-1-60049-006-4. OCLC: 808441289.

[8] DUDA, R. O., HART, P. E., STORK, D. G. Pattern classification. 2nd ed. New York, Wiley, 2001. ISBN: 978-0-471-05669-0.

[9] GOU, J., ZHAN, Y., RAO, Y., et al. "Improved pseudo nearest neighbor classification", Knowledge-Based Systems, v. 70, pp. 361–375, Nov. 2014. ISSN: 0950-7051. doi: 10.1016/j.knosys.2014.07.020. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S0950705114002779>.

[10] PAN, Z., WANG, Y., KU, W. "A new general nearest neighbor classification based on the mutual neighborhood information", Knowledge-Based Systems, v. 121, pp. 142–152, Apr. 2017. ISSN: 0950-7051. doi: 10.1016/j.knosys.2017.01.021. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S0950705117300333>.

[11] CALVO-ZARAGOZA, J., VALERO-MAS, J. J., RICO-JUAN, J. R. "Improving kNN multi-label classification in Prototype Selection scenarios using class proposals", Pattern Recognition, v. 48, n. 5, pp. 1608–1622, May 2015. ISSN: 0031-3203. doi: 10.1016/j.patcog.2014.11.015. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S0031320314004853>.

[12] MUSAVI, M., AHMED, W., CHAN, K., et al. "On the training of radial basis function classifiers", Neural Networks, v. 5, n. 4, pp. 595–603, July 1992. ISSN: 0893-6080. doi: 10.1016/S0893-6080(05)80038-3. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S0893608005800383>.

[13] HAYKIN, S. S. Neural networks: a comprehensive foundation. 2nd ed. Upper Saddle River, N.J, Prentice Hall, 1999. ISBN: 978-0-13-273350-2.

[14] CHENPING HOU, FEIPING NIE, DONGYUN YI, et al. "Discriminative Embedded Clustering: A Framework for Grouping High-Dimensional Data", IEEE Transactions on Neural Networks and Learning Systems, v. 26, n. 6, pp. 1287–1299, June 2015. ISSN: 2162-237X, 2162-2388. doi: 10.1109/TNNLS.2014.2337335. Available at: <http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6867384>.

[15] ZHANG, Z., JIA, L., ZHANG, M., et al. "Discriminative clustering on manifold for adaptive transductive classification", Neural Networks, v. 94, pp. 260–273, Oct. 2017. ISSN: 0893-6080. doi: 10.1016/j.neunet.2017.07.013. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S0893608017301764>.

[16] LEITE, E. V. C. Particionamento Dinâmico para Algoritmo Local de Classificação. M.Sc. dissertation, Universidade Federal do Rio de Janeiro (UFRJ).

[17] PRINCIPE, J. C. Information Theoretic Learning. Information Science and Statistics. New York, NY, Springer New York, 2010. ISBN: 978-1-4419-1569-6, 978-1-4419-1570-2. Available at: <http://link.springer.com/10.1007/978-1-4419-1570-2>. DOI: 10.1007/978-1-4419-1570-2.

[18] KAMPA, K., HASANBELLIU, E., PRINCIPE, J. C. "Closed-form Cauchy-Schwarz PDF divergence for mixture of Gaussians". pp. 2578–2585. IEEE, July 2011. ISBN: 978-1-4244-9635-8. doi: 10.1109/IJCNN.2011.6033555. Available at: <http://ieeexplore.ieee.org/document/6033555/>.

[19] XU, D. Energy, Entropy and Information Potential for Neural Computation. Ph.D. thesis, University of Florida, Gainesville, FL, USA, 1999.

[20] PERES, R. T., ARANHA, C., PEDREIRA, C. E. "Optimized bi-dimensional data projection for clustering visualization", Information Sciences, v. 232, pp. 104–115, May 2013. ISSN: 0020-0255. doi: 10.1016/j.ins.2012.12.041. Available at: <http://linkinghub.elsevier.com/retrieve/pii/S0020025513000339>.

[21] SILVERMAN, B. W. Density estimation for statistics and data analysis. N. 26, Monographs on statistics and applied probability. Boca Raton, Chapman & Hall/CRC, 1998. ISBN: 978-0-412-24620-3.

[22] SCOTT, D. W. Multivariate density estimation: theory, practice, and visualization. 2nd ed. Hoboken, New Jersey, John Wiley & Sons, Inc, 2015. ISBN: 978-0-471-69755-8.

[23] JENSSEN, R. An Information Theoretic Approach to Machine Learning. Ph.D. thesis, Faculty of Science, Department of Physics, University of Tromsø, Tromsø, Norway, 2005.

[24] JAMES, G., WITTEN, D., HASTIE, T., et al. An Introduction to Statistical Learning, v. 103, Springer Texts in Statistics. New York, NY, Springer New York, 2013. ISBN: 978-1-4614-7137-0, 978-1-4614-7138-7. Available at: <http://link.springer.com/10.1007/978-1-4614-7138-7>. DOI: 10.1007/978-1-4614-7138-7.

Appendix A

Article Submitted to a Journal

A Hierarchical Partitioning Algorithm for Classification

L.M. da Costa (a), E.V.C. Leite (a), R.T. Peres (b), C.E. Pedreira (a, *)

(a) COPPE – PESC – Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil

(b) Mathematics Department (DEMAT), Federal Center of Technological Education of Rio de Janeiro (CEFET/RJ)

Abstract: We propose a new classification approach that takes advantage of both supervised and unsupervised learning. The aforesaid algorithm, which we name the Hierarchical Partitioning Algorithm (HPA), generates interpretable outputs in the sense that not only the class labels but also indications of certainty for different regions of the attribute space are provided. The HPA builds up partitions in an iterative way without the need of previously determining the number of prototypes representing these partitions. The k-means algorithm was employed as an unsupervised stage before applying a local supervised technique. The key idea is to cluster observations aiming to reveal easy (and hard) to classify regions in the attribute space. The HPA has shown to be highly competitive when compared to commonly used classification algorithms such as the kNN and the SVM, with the advantage of successfully selecting almost homogeneous regions while isolating regions where classification is difficult.

Keywords: Classification, Local-Global, Prototype, kNN, Cauchy-Schwarz, Clustering, Hierarchical Partitioning

* Corresponding author. Corresponding address: UFRJ, COPPE-PESC, Av. Horácio Macedo, Centro de Tecnologia (CT), Bloco H, 3º andar, Ilha do Fundão, CEP 21941-914, Rio de Janeiro, RJ, Brazil. E-mail addresses: lygia.marina@gmail.com (L.M. da Costa), evcleite@gmail.com (E.V.C. Leite), rt.peres25@gmail.com (R.T. Peres), pedreira56@gmail.com (C.E. Pedreira)

Introduction

Classification methods may be seen through a Local-Global perspective. A pure global approach assumes that data is engendered by a phenomenon governed by a global fundamental law and does not take advantage of possible local generative structures (Peres and Pedreira, 2010). Global models aim to represent the entire space taken as a whole. In contrast, local models are a composition of sectional classification schemes that use specific subsets of the sample. Accordingly, when global methods are applied, the entire attribute space is equally treated, ignoring possible local characteristics in certain regions of this space.

One of the potentially important benefits of introducing local information is the possibility of determining subregions where the classifier is more (or less) likely to find the correct label. A local approach may take advantage of multiple local predictive models applied to different regions of the attribute space. Therefore, it is assumed that different groups of observations may be governed by different local distributions.

A well-known local tool that can be used for classification is the k-nearest neighbor (kNN) algorithm (Duda et al., 2001). It consists in using a set of k labeled observations that are neighbors of the observation one wishes to label. A number of successful methods follow this conception, e.g. Gou et al. (2014). Several variations and improvements may be used to refine kNN ideas. For instance, in Pan et al. (2017), a scheme to select nearest neighbors based on a two-sided mode produced interesting results for small sized datasets. In Calvo-Zaragoza et al. (2015), prototype selection is deployed aiming to allow a faster kNN classification. Another example of a local supervised method is the Radial Basis Function Neural Network (RBFN) (Musavi et al., 1992; Haykin, 1999), in which a set of radial functions is spread in the attribute space. The labels are determined taking into account a measure of pertinence of the observation to each of those functions.
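As a concrete illustration of the kNN rule described above, the following minimal sketch labels a new observation by majority vote among its k nearest training observations; the choice k = 3 and the Euclidean distance are arbitrary assumptions made only for the example.

import numpy as np
from collections import Counter

def knn_label(x_new, X_train, y_train, k=3):
    """Label x_new by majority vote among its k nearest training observations."""
    dist = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dist)[:k]                   # indices of the k closest observations
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]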

Some methods may take advantage of both local and global approaches. Support vector machine (SVM) (Bishop, 2006), for instance, carries a global aspect in the sense that it generates a unique separator hyperplane, but the way this hyperplane is defined is actually a local approach, since only a subset of observations is used. In Zhang et al. (2017), an approach combining unsupervised manifold learning, clustering and adaptive classification is proposed. Clustering is commonly used for unsupervised classification. This is done by partitioning the dataset into different subsets, the clusters, using similarity measures. The k-means algorithm (Bishop, 2006) is a quite popular clustering algorithm, whose main goal is to iteratively partition observations into k clusters such that each observation belongs to the cluster whose prototype, among the k prototypes, is the nearest.

A local-global approach that combines supervised and unsupervised learning was proposed in Peres and Pedreira (2010). It starts by using unsupervised learning to divide the training data into local regions and then applies a supervised scheme in each of those regions independently. This approach takes advantage of the divide and conquer strategy, in which a complex problem is reduced to a set of much easier problems that are locally solved.

The k-means, or another clustering approach, may be employed as an unsupervised stage before the application of a supervised technique. The idea is to cluster observations regardless of their class labels and then use a supervised algorithm to perform local classification within each cluster. This segmentation process may produce easy-to-classify clusters and, in the limit, it is possible to find homogeneous regions with observations that belong to just one of the classes. On the other hand, the same process is able to identify difficult-to-classify regions, which of course is also of interest.
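The two-stage idea just described can be sketched as follows: an unsupervised k-means stage partitions the training data, and a simple supervised rule (here, the local majority class, used only for illustration) is then applied inside each cluster. The use of scikit-learn and the majority-vote local rule are assumptions of this sketch, not a reproduction of any specific method cited above.

import numpy as np
from sklearn.cluster import KMeans

def fit_local_global(X_train, y_train, k=10):
    """Unsupervised stage (k-means) followed by a per-cluster supervised rule."""
    km = KMeans(n_clusters=k, n_init=10).fit(X_train)
    local_rule = {}
    for c in range(k):
        labels = y_train[km.labels_ == c]
        # local supervised stage: store the majority class and its local proportion
        values, counts = np.unique(labels, return_counts=True)
        local_rule[c] = (values[counts.argmax()], counts.max() / counts.sum())
    return km, local_rule

def predict_local_global(x_new, km, local_rule):
    cluster = km.predict(np.atleast_2d(x_new))[0]   # the nearest prototype defines the region
    label, confidence = local_rule[cluster]
    return label, confidence                        # label plus a local certainty indication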

In this article, we propose a new classification method that takes advantage of both supervised and unsupervised learning. It provides not only the estimated label for each observation but also an indication of certainty for different regions of the attribute space. Accordingly, besides providing the classification labels, the attribute space is partitioned and a sureness indicator is generated for each of these partitions. Each partition is represented by a prototype. Some of those demarcate almost homogeneous regions where one may be well-nigh sure of the result. The proposed algorithm constructs the partitions in an iterative unsupervised way, without the need of previously determining the number of prototypes. In this manner, local and global information are combined to update the iterative process. Due to its iterative and hierarchical aspect, we named our method the Hierarchical Partitioning Algorithm (HPA).
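A very rough sketch of the iterative, hierarchical flavor of such a procedure is given below: clusters that are nearly homogeneous are kept as final regions, while heterogeneous ones are partitioned again. The homogeneity threshold, the binary re-splitting with k-means and all names are illustrative assumptions; they are not the HPA's actual splitting or stopping criteria, which are defined in the article itself.

import numpy as np
from sklearn.cluster import KMeans

def hierarchical_partition(X, y, purity=0.95, min_size=10):
    """Illustrative recursive partitioning: keep splitting heterogeneous regions.

    Returns a list of (member indices, majority class, local proportion).
    This is NOT the HPA itself, only a sketch of the general iterative idea.
    """
    regions, pending = [], [np.arange(len(X))]
    while pending:
        idx = pending.pop()
        values, counts = np.unique(y[idx], return_counts=True)
        prop = counts.max() / counts.sum()
        if prop >= purity or len(idx) <= min_size:
            regions.append((idx, values[counts.argmax()], prop))   # near-homogeneous: final region
            continue
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        subs = [idx[labels == c] for c in (0, 1)]
        if min(len(s) for s in subs) == 0:                         # degenerate split: stop here
            regions.append((idx, values[counts.argmax()], prop))
        else:
            pending.extend(subs)                                   # re-split heterogeneous subregions
    return regions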

Methodology

Let x = {x_1, . . . , x_N} be a set of N d-dimensional observations to which a classification label, A or B, is assigned. Each of these observations corresponds to d measured quantities we name attributes. The attribute space is quantized by a set of k prototypes L = {V_1, . . . , V_i, . . . , V_k}, where prototype V_i represents region R_i, a subspace of R^d. We consider that any observation x_j ∈ R_i has a probability P_i of having class label A and (1 − P_i) of having class label B. Therefore, the attribute space is fragmented into regions, each of them represented by a prototype.
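One way to make this local probability concrete, assuming it is estimated by the class proportion of the training observations falling in each region (an illustrative assumption of this note, not a statement of the article's estimator), is

\[
\hat{P}_i \;=\; \frac{1}{N_i} \sum_{x_j \in R_i} \mathbb{1}\!\left[\, y_j = A \,\right],
\qquad
N_i \;=\; \left|\{\, j : x_j \in R_i \,\}\right|,
\]

so that a new observation falling in region R_i would be assigned label A whenever \hat{P}_i \ge 1/2 and label B otherwise, with \hat{P}_i itself serving as the certainty indication attached to that region.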
