• No results found

As future work, we intend to develop several extensions aimed at improving the research reported in this dissertation. The proposed extensions are as follows:

• Aspect extraction: this important step of the Sentiment Analysis process was not included in ASDP. The next step is therefore to develop a method for automatically extracting attributes and/or topics related to non-ideological debates, so that ASDP becomes a complete SA process.

• Anaphora resolution: although it is implemented in ASDP, the simplistic anaphora resolution algorithm was not evaluated in depth. A superficial evaluation identified some incorrect resolutions, which added noise to the results.


• Use of a spell checker: the language in these online debates is predominantly informal and contains many spelling errors, since users are not concerned with writing correctly. A spell checker would support text preprocessing and could improve ASDP's precision.

• Analysis of posts in the graph: the fifth submodule of ASDP's stance classification stage, presented in Chapter 4, builds a graph from the replies identified in the debates of the convinceme¹ site. However, other debate sites have their own mechanisms that let users post criticisms directed at other comments. We therefore intend to create a new method for building the post graph and, together with this new structure, a mechanism that uses the resulting graphs to support sentiment classification.
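The reply graph of the fifth submodule is only described at a high level here. As a rough illustration, and assuming each post records the id of the post it answers (a hypothetical `reply_to` field, not ASDP's actual data model), such a graph can be sketched as an undirected adjacency list whose connected components correspond to discussion threads. The names `Post`, `build_reply_graph` and `threads` are illustrative only.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Post:
    """Hypothetical post record; the field names are illustrative, not ASDP's."""
    post_id: str
    reply_to: Optional[str] = None  # id of the post being answered, if any


def build_reply_graph(posts: List[Post]) -> Dict[str, Set[str]]:
    """Undirected adjacency list linking every reply to the post it answers."""
    graph: Dict[str, Set[str]] = {p.post_id: set() for p in posts}
    for p in posts:
        # Ignore reply links whose parent is not present in the collected data.
        if p.reply_to is not None and p.reply_to in graph:
            graph[p.post_id].add(p.reply_to)
            graph[p.reply_to].add(p.post_id)
    return graph


def threads(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components of the reply graph, found by breadth-first search."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


posts = [
    Post("p1"),
    Post("p2", reply_to="p1"),
    Post("p3", reply_to="p2"),
    Post("p4"),
    Post("p5", reply_to="p99"),  # parent missing from the collected data
]
graph = build_reply_graph(posts)
print([sorted(c) for c in threads(graph)])  # [['p1', 'p2', 'p3'], ['p4'], ['p5']]
```

In this sketch a reply whose parent was not collected (p5) stays as an isolated node instead of being discarded. Supporting another debate site would then only require a site-specific step that produces the `reply_to` links.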


A

Substitution Bases

A.1

Contractions

Contraction     Expansion
i’m             i am
i’ll            i will
i’d             i had
i’ve            i have
you’re          you are
you’ll          you will
you’d           you had
you’ve          you have
he’s            he is
he’ll           he will
he’d            he had
she’s           she is
she’ll          she will
she’d           she had
it’s            it is
it’d            it had
we’re           we are
we’ll           we will
we’d            we had
we’ve           we have
they’re         they are
they’ll         they will
they’d          they had
they’ve         they have
there’s         there is
there’ll        there will
there’d         there had
that’s          that is
that’ll         that will
aren’t          are not
can’t           can not
couldn’t        could not
didn’t          did not
doesn’t         does not
don’t           do not
hadn’t          had not
hasn’t          has not
isn’t           is not
mustn’t         must not
needn’t         need not
shouldn’t       should not
wasn’t          was not
weren’t         were not
won’t           will not
wouldn’t        would not
aren ’ t        are not
can ’ t         can not
couldn ’ t      could not
didn ’ t        did not
doesn ’ t       does not
don ’ t         do not
hadn ’ t        had not
hasn ’ t        has not
isn ’ t         is not
mustn ’ t       must not
needn ’ t       need not
shouldn ’ t     should not
wasn ’ t        was not
weren ’ t       were not
won ’ t         will not
wouldn ’ t      would not
they ’ re       they are
haven ’t        have not
don ’ t         do not
i ’ ll          i will
i ’ m           i am
i ’ d           i had
i ’ ve          i have
you ’ re        you are
you ’ ll        you will
you ’ d         you had
you ’ ve        you have
he ’ s          he is
he ’ ll         he will
he ’ d          he had
she ’ s         she is
she ’ ll        she will
she ’ d         she had
it ’ s          it is
it ’ d          it had
we ’ re         we are
we ’ ll         we will
we ’ d          we had
we ’ ve         we have
they ’ re       they are
they ’ ll       they will
they ’ d        they had
they ’ ve       they have
there ’ s       there is
there ’ ll      there will
there ’ d       there had
that ’ s        that is
that ’ ll       that will
aren ’ t        are not
can ’ t         can not
couldn ’ t      could not
didn ’ t        did not
doesn ’ t       does not
don ’ t         do not
hadn ’ t        had not
hasn ’ t        has not
isn ’ t         is not
mustn ’ t       must not
needn ’ t       need not
shouldn ’ t     should not
wasn ’ t        was not
weren ’ t       were not
won ’ t         will not
wouldn ’ t      would not
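The appendix lists the base but not the routine that applies it. One minimal way to use such a base, assuming it is applied to already-tokenized text, is a greedy longest-match lookup over token tuples, so that tokenized variants such as "don ’ t" are replaced as a single unit. The dictionary below holds only a handful of rows from the table above. The ASCII-to-typographic apostrophe normalisation and the names `CONTRACTIONS`, `build_index`, `normalise` and `expand` are hypothetical, not taken from ASDP.

```python
# Hypothetical excerpt of the Appendix A.1 base (a real run would load every row).
CONTRACTIONS = {
    "i’m": "i am",
    "can’t": "can not",
    "don’t": "do not",
    "don ’ t": "do not",
    "they ’ re": "they are",
    "haven ’t": "have not",
}


def build_index(base):
    """Turn each key into a token tuple so multi-token entries can be matched."""
    index = {tuple(key.split()): value.split() for key, value in base.items()}
    max_len = max(len(key) for key in index)
    return index, max_len


def normalise(token):
    """Lower-case and map the ASCII apostrophe to the typographic one used in the base."""
    return token.lower().replace("'", "’")


def expand(tokens, index, max_len):
    """Greedy longest-match substitution over a list of tokens."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible key first, then progressively shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(normalise(t) for t in tokens[i:i + n])
            if key in index:
                out.extend(index[key])
                i += n
                break
        else:
            # No entry starts at this token: keep it unchanged.
            out.append(tokens[i])
            i += 1
    return out


index, max_len = build_index(CONTRACTIONS)
print(expand("I don ’ t think they ’ re right".split(), index, max_len))
# ['I', 'do', 'not', 'think', 'they', 'are', 'right']
print(expand(["I'm", "here"], index, max_len))
# ['i', 'am', 'here']
```

Trying the longest key first means a multi-token entry always wins over a shorter entry that starts at the same token, which matters for spaced variants and for multi-word keys such as "pop ups" in Appendix A.2.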

A.2

Abbreviations

Abbreviation        Expansion
a.c.                before christ
a.m.                ante meridiem
abbrev.             abbreviation
ac.                 before christ
acc.                account
acct.               account
adj.                adjective
adm.                administrator
adv.                adverb
am.                 ante meridiem
app                 application
app.                application program
applic.             application
apps                application
assoc.              association
aug.                august
bbm                 blackberry messenger
bhd.                bulkhead
brit.               british
bros.               brother
cap.                civil air patrol
capt.               captain
ceo.                chief executive officer
cia.                central intelligence agency
comm.               commercial
communic.           communications
compan.             companion
conf.               conference
const.              construction
contr.              contrast
contrib.            contribution
corp.               corporation
corresp.            corresponding
d.c.                after christ
dc.                 after christ
dec.                december
def.                definition
dept.               department
descr.              description
dict.               dictionary
dr.                 doctor
drs.                doctresse
ecol.               ecology
econ.               economy
ed.                 edition
educ.               education
electr.             electricity
electron.           electronic
elem.               element
encycl.             encyclopaedia
eng.                english
etc.                et cetera
exc.                except
exerc.              exercise
faq.                frequently asked question
feb.                february
fem.                feminine
fig.                figurative
fr.                 french
freq.               frequent
fund.               fundamental
geogr.              geography
geol.               geology
geom.               geometry
gov.                government
govt.               government
hist.               history
i.e.                that is
ie.                 that is
illustr.            illustration
inc.                incorporated
ind.                industry
industr.            industry
infl.               influenced
inst.               institute
introd.             introduction
irreg.              irregular
jr.                 junior
lang.               language
lt.                 local time
ltd.                limited company
man.                manual
managem.            management
masc.               masculine
math.               mathematics
med.                medicine
mem.                memory
mr.                 mister
mrs.                mistress
ms.                 miss
nat.                natural
no.                 number
nov.                november
nucl.               nuclear
o.e.                old english
obj.                object
oct.                october
oe.                 old english
opt.                optics
org.                organization
orig.               origin
outl.               outline
oxf.                oxford
p.m.                post meridiem
pers.               person
photogr.            photography
phr.                phrase
phys.               physical
pict.               picture
pl.                 plural
plur.               plural
pm.                 post meridiem
pol.                politics
pop.                popular
pract.              practice
prec.               preceding
prep.               preposition
pres.               present
princ.              principle
priv.               privative
prob.               probably
probl.              problem
proc.               proceeding
prof.               professor
pron.               pronoun
prop.               properly
ptt                 push to talk
publ.               publications
quot.               quotation
rec.                record
ref.                reference
reg.                register
rel.                related
rep.                report
repr.               representative
res.                research
rev.                review
sel.                selection
sens.               sense
sept.               september
soc.                society
sociol.             sociology
spec.               specification
st.                 saint
struct.             structure
subj.               subject
subord.             subordinate
subseq.             subsequently
subst.              substantively
superl.             superlative
surg.               surgery
syst.               system
techn.              technology
technol.            technological
teleph.             telephony
trans.              transactions
transf.             transferred
transl.             translation
trav.               travel
treas.              treasury
treatm.             treatment
u.s.                united states
u.s.s.r.            union of soviet socialist republics
us.                 united states
ussr.               union of soviet socialist republics
vertebr.            vertebrate
vet.                veterinary
vs.                 versus
vulg.               vulgar
wd.                 word
westm.              westminster
wks.                works
yrs.                years
zoogeogr.           zoogeography
zool.               zoology
yr                  year
cuz                 because
open-source         opensource
open - source       opensource
next - gen          next-gen
next - gen          next-gen
atleast             at least
dosnt               does not
cuz                 because
cant                can not
dont                do not
ive                 i have
crash bandicoot     crashbandicoot
cannot              can not
pop ups             popups
becuz               because
2b                  to be
2bh                 to be honest
2day                today
2ez                 too easy
2g                  too good
2g2bt               too good to be true
2l8                 too late
2m                  tomorrow
2moro               tomorrow
2moz                tomorrow
2mrw                tomorrow
2u                  to you
2u2                 to you too
2-you               to you
a/w                 anyway
afaik               as far as i know
allshare            all share
alltel              all phone
alot                a lot
asap                as soon as possible
atb                 all the best
batt                battery
biz                 business
brb                 be right back
bt                  bluetooth
btfl                beautiful
btw                 by the way
couldnt             could not
cya                 see you
dat                 that
dats                that is
didn                did not
didnt               did not
dis                 this
doesnot             does not
doesnt              does not
dont                do not
dunno               do not know
dupe                duplicate
ftw                 for the win
fyi                 for your information
gfu                 good for you
gimme               give me
gj                  good job
gl                  good luck
gratz               congratulations
havent              have not
idk                 i do not know
idunno              i do not know
im                  i am
imo                 in my opinion
iow                 in other words
isnt                is not
its                 it is
its                 it is
lmk                 let me know
mic                 microphone
micro               microusb
micro               microsd
moto                motorola
nite                night
np                  no problem
omg                 oh my god
otoh                on the other hand
ppl                 people
tbh                 to be honest
thats               that is
thx                 thanks
txt                 text
ty                  thank you