• Nenhum resultado encontrado

EXPERIMENTO 1 COM CLASSES BALANCEADAS

8.1 TRABALHOS FUTUROS

Como trabalho futuro para este tema pode-se investigar e utilizar novas abordagens para resoluc¸˜ao do problema proposto como, por exemplo, a utilizac¸˜ao de m´etodos de classi- ficac¸˜ao preparados para lidar com o problema do desbalanceamento diretamente na etapa de construc¸˜ao do modelo de predic¸˜ao (como citado na Sec¸˜ao 4), diferentemente das abordagens utilizadas neste trabalho, onde foram utilizadas t´ecnicas de balanceamento do conjunto de dados com m´etodos de classificac¸˜ao padr˜oes.

Tamb´em como trabalho futuro pode-se estudar e aplicar outros m´etodos Bayesianos que tentam aprender a estrutura da rede a fim de buscar melhores resultados nas predic¸˜oes das fam´ılias em situac¸˜ao de alta vulnerabilidade social.

Outro trabalho futuro pode ser a validac¸˜ao e aplicac¸˜ao real dos resultados obtidos na busca ativa das fam´ılias em situac¸˜ao de vulnerabilidade. Este trabalho pode ser realizado atrav´es da aplicac¸˜ao do modelo de predic¸˜ao em 100% das fam´ılias do munic´ıpio de Cascavel existentes na base de dados do sistema IRSAS. Ap´os a aplicac¸˜ao do modelo ser´a obtida uma lista ordenada das fam´ılias em situac¸˜ao de vulnerabilidade e risco social. Essa lista das fam´ılias poder´a ser entregue para a equipe de t´ecnicos da secretaria de assistˆencia social do munic´ıpio que ent˜ao ir´a visitar as fam´ılias apontadas com maior probabilidade de estarem em alta vulnerabilidade social, verificando se a predic¸˜ao realizada est´a correta e se a fam´ılia realmente se encontra na situac¸˜ao indicada pelo modelo de predic¸˜ao. Desse modo poder´a ser realizada uma aplicac¸˜ao real e pr´atica da t´ecnica estudada a fim de verificar se o modelo ´e capaz de apoiar o processo da busca ativa das fam´ılias em situac¸˜ao de vulnerabilidade e risco social.

REFER ˆENCIAS

ALCALA-FDEZ, J. et al. Keel data-mining software tool: Data set repository and integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing, v. 17, n. 2-3, p. 255–287, 2011.

ALCALA-FDEZ, J. et al. Keel: a software tool to assess evolutionary algorithms for data mi- ning problems. Soft Computing, Springer-Verlag, v. 13, n. 3, p. 307–318, 2009.

BATISTA, G.; PRATI, R.; MONARD, M. A study of the behavior of several methods for ba- lancing machine learning training data. SIGKDD Explorations, v. 6, n. 1, p. 20–29, 2004.

BRASIL. Minist´erio do Desenvolvimento Social e Combate `a Fome. Orientac¸˜oes T´ecnicas: Centro de Referˆencia de Assistˆencia Social - CRAS. Bras´ılia: , 2009.

BRASIL. Busca Ativa: O que ´e a busca ativa do plano Brasil sem mis´eria. 2013. Dispon´ıvel em: http://www.mds.gov.br/falemds/perguntas-frequentes/ superacao-da-extrema-pobreza\%20/plano-brasil-sem-miseria-1/ busca-ativa/. Acesso em 20 jun. 2015.

BUNKHUMPORNPAT, C.; SINAPIROMSARAN, K.; LURSINSAP, C. Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced pro- blem. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. : Springer-Verlag, 2009. (Lecture Notes on Computer Science, v. 5476), p. 475–482.

CHAWLA, N. et al. Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, v. 16, p. 321–357, 2002.

CHICKERING, D. Learning bayesian networks is np-complete. In: FISHER, D.; LENZ, H.-J. (Ed.). Learning from Data. : Springer New York, 1996, (Lecture Notes in Statistics, v. 112). p. 121–130.

COHEN, G. et al. Learning from imbalanced data in surveillance of nosocomial infection. Ar- tificial Intelligence in Medicine, Elsevier, v. 37, n. 1, p. 7–18, 2006.

DOUGHERTY, G. Pattern Recognition and Classification: An Introduction. : Springer, 2013.

FAWCETT, T.; PROVOST, F. Adaptive fraud detection. Data mining and knowledge disco- very, Springer, v. 1, n. 3, p. 291–316, 1997.

FAYYAD, U.; PIATETSKY-SHAPIRO, G.; SMYTH, P. From data mining to knowledge disco- very in databases. AI Magazine, v. 17, p. 37–54, 1996.

FERN ´ANDEZ, A. et al. A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets. Fuzzy Sets and Systems, Elsevier North- Holland, Inc., Amsterdam, The Netherlands, v. 159, n. 18, p. 2378–2398, set. 2008.

GRZYMALA-BUSSE, J. W.; STEFANOWSKI, J.; WILK, S. A comparison of two approaches to data mining from imbalanced data. Journal of Intelligent Manufacturing, Springer, v. 16, n. 6, p. 565–573, 2005.

HALL, M. et al. The weka data mining software: An update. SIGKDD Explorations Newslet- ter, ACM, New York, NY, USA, v. 11, n. 1, p. 10–18, nov. 2009.

HAN, H.; WANG, W.; MAO, B. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In: Proceedings of the International Conference on Intelligent Compu- ting. : Springer-Verlag, 2005. (Lecture Notes on Computer Science, v. 3644), p. 878–887.

HILAS, C. S.; MASTOROCOSTAS, P. A. An application of supervised and unsupervised lear- ning approaches to telecommunications fraud detection. Knowledge-Based Systems, Elsevier, v. 21, n. 7, p. 721–726, 2008.

HU, S. et al. Msmote: Improving classification performance when training data is imbalanced. In: Proceedings of the Second International Workshop on Computer Science and Engine- ering. 2009. v. 2, p. 13–17.

JIANG, L. et al. Survey of improving naive bayes for classification. In: Proceedings of the 3rd International Conference on Advanced Data Mining and Applications. Berlin, Heidelberg: Springer-Verlag, 2007. p. 134–145.

JIANG, L. et al. Evolutional naive bayes. Proceedings of the International Symposium on Intelligent Computation and its Application, p. 344–350, 2005.

J ¨ARVELIN, K.; KEK ¨AL ¨AINEN, J. Cumulated gain-based evaluation of ir techniques. ACM Transactions on Information Systems, ACM, New York, NY, USA, v. 20, n. 4, p. 422–446, 2002.

KEOGH, E. J.; PAZZANI, M. J. Learning augmented bayesian classifiers: A comparison of distribution-based and classification-based approaches. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, p. 225–230, 1999.

KOHAVI, R.; PROVOST, F. Glossary of terms. Machine Learning, v. 30, n. 2-3, p. 271–274, 1998.

KUBAT, M.; HOLTE, R. C.; MATWIN, S. Machine learning for the detection of oil spills in satellite radar images. Machine learning, Springer, v. 30, n. 2-3, p. 195–215, 1998.

LAURIKKALA, J. Improving Identification of Difficult Small Classes by Balancing Class Distribution. Tech. Report A-2001-2. 2001.

L ´OPEZ, V. et al. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information Sciences, v. 250, p. 113 – 141, 2013.

L ´OPEZ, V. et al. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classi- fication. open problems on intrinsic data characteristics. Expert Systems with Applications, v. 39, n. 7, p. 6585 – 6608, 2012.

L ´OPEZ, V. et al. Addressing imbalanced classification with instance generation techniques: Ipade-id. Neurocomputing, v. 126, n. 0, p. 15 – 28, 2014.

LUENGO, J. et al. Addressing data complexity for imbalanced data sets: analysis of smote- based oversampling and evolutionary undersampling. Soft Computing, Springer-Verlag, v. 15, n. 10, p. 1909–1936, 2011.

MAHESHWARI, S.; AGRAWAL, J.; SHARMA, S. New approach for classification of highly imbalanced datasets using evolutionary algorithms. International Journal of Scientific & En- gineering Research, Citeseer, v. 2, n. 7, p. 1–5, 2011.

MONARD, M. C.; BARANAUSKAS, J. A. Conceitos Sobre Aprendizado de M´aquina. In: Solange O. Rezende. (Org.). Sistemas Inteligentes Fundamentos e Aplicac¸˜oes. 1. ed. Barueri - SP: Manole Ltda, 2003. 89-114 p.

NAPIERALA, K.; STEFANOWSKI, J.; WILK, S. Learning from imbalanced data in presence of noisy and borderline examples. In: SPRINGER. Proceedings of the 7th International Con- ference on Rough Sets and Current Trends in Computing. 2010. p. 158–167.

PENA, S. D. Thomas Bayes, o “Cara”! Rio de Janeiro: Ciˆencia Hoje, 2006. 22–29 p.

RAMENTOL, E. et al. Smote-rsb*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using smote and rough sets theory. Kno- wledge and Information Systems, v. 33, n. 2, p. 245–265, 2012.

SAHAMI, M. Learning limited dependence bayesian classifiers. In: KDD. 1996. v. 96, p. 335– 338.

SALAMA, G.; ABDELHALIM, M.; ZEID, M. Experimental comparison of classifiers for bre- ast cancer diagnosis. In: Proceedings of the Seventh International Conference on Computer Engineering Systems. 2012. p. 180–185.

SEBASTIANI, P.; ABAD, M.; RAMONI, M. Bayesian networks. In: MAIMON, O.; RO- KACH, L. (Ed.). Data Mining and Knowledge Discovery Handbook. : Springer US, 2010. p. 175–208.

STEFANOWSKI, J.; WILK, S. Selective pre-processing of imbalanced data for improving clas- sification performance. In: Proceedings of the 10th International Conference in Data Wa- rehousing and Knowledge Discovery. : Springer, 2008. (Lecture Notes on Computer Science, v. 5182), p. 283–292.

WEISS, G. M.; PROVOST, F. Learning when training data are costly: The effect of class distri- bution on tree induction. Journal of Artificial Intelligence Research, AI Access Foundation, USA, v. 19, n. 1, p. 315–354, out. 2003.

WILSON, D.; MARTINEZ, T. Reduction techniques for instance-based learning algorithms. Machine Learning, Kluwer Academic Publishers, v. 38, n. 3, p. 257–286, 2000.

WILSON, D. L. Asymptotic properties of nearest neighbor rules using edited data. IEEE Tran- sactions on Systems, Man and Cybernetics, IEEE, n. 3, p. 408–421, 1972.

WITTEN, I. H.; FRANK, E.; HALL, M. A. Data Mining: Practical Machine Learning Tools and Techniques. 3rd. ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2011.

WOLPERT, D.; MACREADY, W. No free lunch theorems for optimization. IEEE Transacti- ons on Evolutionary Computation, v. 1, n. 1, p. 67–82, Apr 1997.

ZHENG, Z.; WEBB, G. Lazy learning of bayesian rules. Machine Learning, Kluwer Acade- mic Publishers, v. 41, n. 1, p. 53–84, 2000.

ZHENG, Z.; WU, X.; SRIHARI, R. Feature selection for text categorization on imbalanced data. ACM Sigkdd Explorations Newsletter, ACM, v. 6, n. 1, p. 80–89, 2004.

Documentos relacionados