
7.3 Future Work

The work presented in this dissertation can be extended in several ways. One is to evaluate it on additional real data sets. The framework can also be extended to handle modal symbolic variables, in which each category carries a weight, frequency, or probability indicating how frequent, typical, or relevant that category is for the object in question.
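As an illustration of what a modal symbolic variable looks like in practice, the following minimal sketch stores the value of one variable for one object as a mapping from categories to weights. The type alias `ModalValue`, the function `normalize`, and the example data are hypothetical names chosen for this sketch and are not part of the framework described in the dissertation.

```python
from typing import Dict

# Hypothetical representation: category -> weight (frequency or probability).
ModalValue = Dict[str, float]


def normalize(weights: ModalValue) -> ModalValue:
    """Rescale non-negative category weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {category: w / total for category, w in weights.items()}


# Illustrative object described by one modal variable: raw counts per category.
colour_counts = {"red": 6, "green": 3, "blue": 1}
colour = normalize(colour_counts)  # relative frequencies of each category
```

A dissimilarity function for modal variables would then compare two such weight distributions rather than two plain category sets.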

A further evaluation could take into account the computational complexity and the running time of each clustering algorithm considered.

A module for handling missing data could also be added to the proposed framework.

A further evaluation could treat intervals by means of an optimization algorithm. Fisher's algorithm (FISHER, 1958) can be used to convert intervals into categories in an optimized way. It is an efficient dynamic programming algorithm that minimizes the within-class variance (LECHEVALLIER, 1976).
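The dynamic programming idea behind Fisher's algorithm can be sketched as follows. This is a minimal illustration, not the implementation envisaged in the dissertation. It assumes that the values to be discretized (for example interval bounds or midpoints, a choice the text leaves open) are given as a one-dimensional list, and that the criterion is the total within-class sum of squared deviations. The function name `fisher_partition` is hypothetical.

```python
from typing import List, Tuple


def fisher_partition(values: List[float], k: int) -> Tuple[float, List[List[float]]]:
    """Split sorted values into k contiguous classes minimizing the
    total within-class sum of squared deviations (assumed criterion)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the within-class sum of squares of any run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: minimum criterion value when xs[:j] is split into m classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[m][j]: start index of the last class in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = cost[m - 1][i] + sse(i, j)
                if candidate < cost[m][j]:
                    cost[m][j] = candidate
                    cut[m][j] = i

    # Walk the stored cut points backwards to recover the classes.
    classes: List[List[float]] = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        classes.append(xs[i:j])
        j = i
    classes.reverse()
    return cost[k][n], classes


total_cost, classes = fisher_partition(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0], 3
)
# classes -> [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
```

Because the classes must be contiguous in the sorted order, the search space is small enough for an exact solution; this sketch runs in O(k n^2) time for n values and k classes.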

The evaluation module could be improved by including validation indices specific to fuzzy algorithms, such as those presented by Campello (CAMPELLO, 2007).

The CARD approach (FRIGUI; HWANG; RHEE, 2007) proved quite promising in the experiments carried out, particularly on real data sets. This approach could be used to extend clustering methods other than those presented in this work. In addition, new dissimilarity measures could be proposed that do not assume equal relevance of the attributes.

Finally, the methodology presented could be put to good use in obtaining homogeneous groups of user profiles in the context of data mining applied to log files (Web Usage Mining, WUM) (COOLEY; SRIVASTAVA; MOBASHER, 1997).


References

ASUNCION, D. N. A. UCI Machine Learning Repository. 2007. Available at: <http://www.ics.uci.edu/~mlearn/MLRepository.html>. Accessed: 1 June 2008.

BERKHIN, P. Survey Of Clustering Data Mining Techniques. San Jose, CA, 2002. Available at: <http://citeseer.ist.psu.edu/berkhin02survey.html>. Accessed: 1 June 2008.

BERTHOUEX, P. M.; BROWN, L. C. Statistics for environmental engineers. Second edition. Florida, USA: CRC Press, 2002.

BEZDEK, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.

BEZDEK, J. C. et al. Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Norwell, MA, USA: Kluwer Academic Publishers, 1999.

BEZERRA, B. L. D.; CARVALHO, F. de A. T. de. A symbolic approach for content-based information filtering. Inf. Process. Lett., Elsevier North-Holland, Inc., Amsterdam, The Netherlands, v. 92, n. 1, p. 45–52, 2004.

BOBOU, A.; RIBEYRE, F. Mercury in the food web: Accumulation and transfer mechanisms. In: SIGREL, A.; SIGREL, H. (Ed.). Metal Ions in Biological Systems. New York: M. Dekker, 1998. p. 289–319.

BOCK, H. H.; DIDAY, E. Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. New Jersey, USA: Springer-Verlag New York, Inc., 2000.

CAMPELLO, R. J. G. B. A fuzzy extension of the rand index and other related indexes for clustering and classification assessment. Pattern Recognition Letters, Elsevier Science Inc., New York, NY, USA, v. 28, n. 7, p. 833–841, 2007.

CARVALHO, F. A. T. de; LECHEVALLIER, Y.; VERDE, R. Symbolic data analysis and the sodas software. In: . West Sussex, England: John Wiley & Sons, Ltd, 2008. cap. 11.

CARVALHO, F. D.; BRITO, P.; BOCK, H.-H. Dynamic clustering for interval data based on l2 distance. Computational Statistics, Kluwer Academic Publishers, Massachusetts, USA, v. 21, n. 2, p. 231–250, 2006.

CARVALHO, F. d. A. T. D. Proximity coefficients between boolean symbolic objects. In: DIDAY, E. et al. (Ed.). IFCS-93. Berlin - Heidelberg, Germany: Springer-Verlag, 1994. p. 387–394.


CARVALHO, F. d. A. T. D. Extension based proximity coefficients between boolean constrained symbolic objects. In: HAYASHI, C. et al. (Ed.). IFCS-96. Berlin - Heidelberg, Germany: Springer-Verlag, 1998. p. 370–378.

CARVALHO, F. d. A. T. D. Fuzzy c-means clustering methods for symbolic interval data. Pattern Recognition Letters, v. 28, n. 4, p. 423–437, 2007.

CARVALHO, F. de A. T. D. et al. Adaptive hausdorff distances and dynamic clustering of symbolic interval data. Pattern Recognition Letters, v. 27, p. 167–179, 2006.

CELEUX, G. et al. Classification automatique des données. Paris, France: Dunod, 1989.

CHAVENT, M. et al. New clustering methods for interval data. Computational Statistics, Kluwer Academic Publishers, Hingham, MA, USA, v. 21, n. 2, p. 211–229, 2006.

CHAVENT, M.; LECHEVALLIER, Y. Dynamical clustering algorithm of interval data: Optimization of an adequacy criterion based on hausdorff distance. In: SOKOLOWSKY; BOCK, H. (Ed.). Classification, Clustering and Data Analysis. Berlin, Heidelberg, Germany: Springer, 2002. p. 53–59.

COOLEY, R.; SRIVASTAVA, J.; MOBASHER, B. Web mining: Information and pattern discovery on the world wide web. In: ICTAI ’97: Proceedings of the 9th International Conference on Tools with Artificial Intelligence. Washington, DC, USA: IEEE Computer Society, 1997. p. 558.

DICE, L. R. Measures of the amount of ecologic association between species. Journal of Ecology, v. 26, p. 297–302, 1945.

DIDAY, E. La méthode de nuées dynamiques. Revue de Statistique Appliquée, v. 19, n. 2, p. 19–34, 1971.

DIDAY, E. Classification automatique sequentielle pour grands tableaux. Revue Française d’Automatique, Informatique et Recherche Operationnelle, p. 29–61, 1975.

DIDAY, E.; GOVAERT, G. Classification automatique avec distances adaptatives. Informatique Computer Science, v. 11, n. 4, p. 329–349, 1977.

DIDAY, E.; NOIRHOMME-FRAITURE, M. (Ed.). Symbolic Data Analysis and the SODAS Software. West Sussex, England: John Wiley & Sons, Ltd, 2008.

DIDAY, E.; SIMON, J. Clustering analysis. In: FU, K. S. (Ed.). Data Analysis: Scientific Modeling and Practical Application. Berlin - Heidelberg, Germany: Springer-Verlag, 1976. p. 47–94.

DUMAS, F. N. Manifest Structure Analysis. Missoula, Montana: Montana State University Press, 1955.

DUNN, J. C. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. Journal of Cybernetics, v. 3, n. 3, p. 32–57, 1974.

DURAN, B. S.; ODELL, P. L. Cluster Analysis. Berlin - Heidelberg, Germany: Springer-Verlag, 1974.


EL-SONBATY, Y.; ISMAIL, M. A. Fuzzy clustering for symbolic data. IEEE Transactions on fuzzy systems, v. 6, n. 2, p. 195–204, 1998.

ESPOSITO, F. et al. Analysis of symbolic data. exploratory methods for extracting statistical information from complex data. In: . Berlin - Heidelberg, Germany: Springer-Verlag. cap. Classical resemblance measures.

FISHER, W. On grouping for maximum homogeneity. Journal of the American Statistical Association, n. 53, p. 789–798, 1958.

FRIGUI, H.; HWANG, C.; RHEE, F. C.-H. Clustering and aggregation of relational data with applications to image database categorization. Pattern Recognition, Elsevier Science Inc., New York, NY, USA, v. 40, n. 11, p. 3053–3068, 2007.

FRIGUI, H.; NASRAOUI, O. Simultaneous clustering and attribute discrimination. In: Proceedings of the IEEE Conference on Fuzzy Systems. Texas, USA: Elsevier Science Inc., 2000. p. 158–163.

FRIGUI, H.; NASRAOUI, O. Unsupervised learning of prototypes and attribute weights. Pattern Recognition, v. 37, n. 3, p. 567–581, 2004.

GOWDA, K. C.; DIDAY, E. Symbolic clustering using a new dissimilarity measure. Pattern Recognition, Elsevier Science Inc., New York, NY, USA, v. 24, n. 6, p. 567–578, 1991.

GOWDA, K. C.; DIDAY, E. Symbolic clustering using a new similarity measure. IEEE Transactions on systems, man and cybernetics, v. 22, n. 2, p. 368–378, 1992.

GOWDA, K. C.; RAVI, T. V. Divisive clustering of symbolic objects using the concepts of both similarity and dissimilarity. Pattern Recognition, v. 28, n. 8, p. 1277–1282, 1995.

GREENACRE, M. Theory And Applications of Correspondence Analysis. New York, USA: Academic Press, 1984.

GURU, D. S.; KIRANAGI, B. B.; NAGABHUSHAN, P. Multivalued type proximity measure and concept of mutual similarity value useful for clustering symbolic patterns. Pattern Recognition Letters, Elsevier Science Inc., New York, NY, USA, v. 25, n. 10, p. 1203–1213, 2004.

HAN, J.; KAMBER, M. Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems). Second edition. San Francisco, USA: Morgan Kaufmann, 2006. Hardcover.

HATHAWAY, R. J.; BEZDEK, J.; PEDRYCZ, W. A parametric model for fusing heterogeneous fuzzy data. IEEE Transactions on Fuzzy Systems, v. 4, p. 270–281, 1996.

HATHAWAY, R. J.; BEZDEK, J. C. Nerf c-means: Non-euclidean relational fuzzy clustering. Pattern Recognition, v. 27, n. 3, p. 429–437, 1994.


HATHAWAY, R. J.; DAVENPORT, J. W.; BEZDEK, J. C. Relational duals of the c-means clustering algorithms. Pattern Recognition, v. 22, n. 2, p. 205–212, 1989.

HILL, L. R. et al. Automatic classification of staphylococci by principal component analysis and a gradient method. Journal of Bacteriology, v. 89, p. 1393–1401, 1965.

HUBERT, L.; ARABIE, P. Comparing partitions. Journal of Classification, v. 2, p. 193–218, 1985.

ICHINO, M.; YAGUCHI, H. Generalized minkowski metrics for mixed feature type data analysis. IEEE Transactions on systems, man and cybernetics, v. 24, n. 4, p. 698–708, 1994.

JACCARD, P. Nouvelles recherches sur la distribution florale. Bulletin de la Societe Vaudoise de Sciences Naturelles, v. 44, p. 223–270, 1908.

JAIN, A. K.; DUBES, R. C. Algorithms for clustering data. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988.

JAIN, A. K.; MURTY, M. N.; FLYNN, P. J. Data clustering: A review. ACM Computing Surveys, v. 31, n. 3, p. 264–323, 1999.

JOHNSON, R. A.; WICHERN, D. W. Applied Multivariate Statistical Analysis. Upper Saddle River, New Jersey, USA: Prentice-Hall, 1992.

KAUFMAN, L.; ROUSSEEUW, P. J. Finding Groups in Data: An Introduction to Cluster Analysis. New York, USA: John Wiley, 1990.

KNUTH, D. E. Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd Edition). Reading, Massachusetts, USA: Addison-Wesley Professional, 1997. Hardcover.

KOGAN, J.; NICHOLAS, C.; TEBOULLE, M. Grouping Multidimensional Data: Recent Advances in Clustering. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.

KRISHNAPURAM, R. et al. Low-complexity fuzzy relational clustering algorithms for web mining. IEEE-FS, v. 9, p. 595–607, 2001.

KULCZYNSKI, S. Die pflanzenassoziationen der pieninen. Bulletin de L’Academie Polonaise des Sciences, 1928.

LECHEVALLIER, Y. Classification automatique optimale sous contrainte d’ordre total. Paris, France, 1976.

LECHEVALLIER, Y.; CARVALHO, F. D.; VERDE, R. Clustering methods in symbolic data analysis. July 2006. Workshop Symbolic Data analysis.

MALERBA, D.; MONOPOLI, F. E. M. Comparing dissimilarity measures for probabilistic symbolic objects. In: ZANASI C.A. BREBBIA, N. E. A.; MELLI, P. (Ed.). Data Mining III. England: WIT Press, 2002, (Information and Communication Technologies, v. 6). p. 31–40.


MALERBA F. ESPOSITO, V. G. D.; TAMMA, V. Comparing dissimilarity measures for symbolic data analysis. In: Proceedings of ETK-NTTS 2001. Luxemburg: Eurostat, 2001.

MILLIGAN, G. W. Clustering validation: results and implications for applied analyses. In: . Singapore: World Scientific, 1996. p. 341–375.

OCHIAI, A. Zoogeographic studies on the soleoid fishes found in japan and its neighbouring regions. Bulletin of the japanese society for fish science, v. 22, p. 526–530, 1957.

RALAMBONDRAINY, H. A conceptual version of the k-means algorithm. Pattern Recognition Letters, Elsevier Science Inc., New York, NY, USA, v. 16, n. 11, p. 1147–1157, 1995.

ROGERS, D. J.; TANIMOTO, T. T. A computer program for classifying plants. Science, v. 132, p. 1115–1118, 1960.

ROUBENS, M. Pattern classification problems and fuzzy sets. Fuzzy Sets and Systems, v. 1, p. 239–253, 1978.

RUSSELL, S. J.; NORVIG, P. Artificial Intelligence: A Modern Approach (2nd Edition). [S.l.]: Prentice Hall, 2002. Hardcover.

SILVA, A. C. G. da. Dissimilarity functions analysis based on Dynamic Clustering for Symbolic Data. Dissertation (Master's), Universidade Federal de Pernambuco, 2005.

SILVA, K. P.; CARVALHO, F. de A. T. de; CSERNEL, M. Clustering of symbolic data through a dissimilarity volume based measure. In: Proceedings of the International Joint Conference on Neural Networks - IJCNN 2008. Los Alamitos (USA): IEEE Computer Society, 2008. p. 2865–2871.

SNEATH, P. H. A. The application of computer to taxonomy. Journal of General Microbiology, v. 17, p. 201–226, 1957.

SNEATH, P. H. A. The construction of taxonomic groups. In: . Cambridge, England: Cambridge University Press, 1962. p. 289–332.

SOKAL, R. R.; MICHENER, C. D. A statistical method for evaluating systematic relationships. The University of Kansas Scientific Bulletin, v. 38, p. 1409–1438, 1958.

SOKAL, R. R.; SNEATH, P. H. A. Principles of Numerical Taxonomy. San Francisco: Freeman, 1963.

SORENSEN, T. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analysis of the vegetation of danish commons. Biologiske Skrifter, v. 5, p. 1–34, 1948.

SOUZA, R. M. C. R. de; CARVALHO, F. de A. T. de. Clustering of interval data based on city-block distances. Pattern Recognition Letters, Elsevier Science Inc., New York, NY, USA, v. 25, n. 3, p. 353–365, 2004.


WINDHAM, M. P. et al. Cluster analysis to improve food classification within commodity groups. Journal of the American Dietetic Association, v. 85, n. 10, p. 1306–1314, 1985.

WITTEN, I. H.; FRANK, E. Data mining: practical machine learning tools and techniques with Java implementation. New York, USA: Morgan Kaufmann Publishers, 2000.

XU, R.; WUNSCH, D. I. Survey of clustering algorithms. IEEE Transactions on Neural Networks, v. 16, n. 3, p. 645–678, 2005.


YANG, M.-S.; HWANG, P.-Y.; CHEN, D.-H. Fuzzy clustering algorithms for mixed feature variables. Fuzzy Sets and Systems, v. 141, n. 2, p. 301–317, 2004.

YANG, M.-S.; KO, C.-H. On a class of fuzzy c-numbers clustering procedures for fuzzy data. Fuzzy Sets and Systems, Elsevier North-Holland, Inc., Amsterdam, The Netherlands, v. 84, n. 1, p. 49–60, 1996.

ZADEH, L. Fuzzy sets. Information and Control, v. 8, p. 338–353, 1965.

ZUBIN, T. A technique for measuring like-mindedness. Journal of Abnormal and Social Psychology, v. 33, p. 508–516, 1938.

From the document Metodos de Agrupamento de Dados Simbolicos Baseados em funções de Dissimilaridades (pages 141-147)