

Sentiment analysis track at ROMIP 2012

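The evaluation measures are defined as follows:

$$P_x = \frac{tp_x}{tp_x + fp_x}, \qquad R_x = \frac{tp_x}{tp_x + fn_x}$$

$$\mathrm{Macro\_P} = \frac{1}{|S|} \sum_{x \in S} \frac{tp_x}{tp_x + fp_x}$$

$$\mathrm{Macro\_R} = \frac{1}{|S|} \sum_{x \in S} \frac{tp_x}{tp_x + fn_x}$$

$$F\text{-}measure = 2 \cdot \frac{P \cdot R}{P + R}$$

$$Accuracy = \frac{tp_x + tn_x}{tp_x + tn_x + fp_x + fn_x}$$

$$D = \frac{\sum_{i=1}^{n} (p_i - q_i)^2}{n}$$

$$P@n = \frac{1}{n} \sum_{i=1}^{n} rel(i)$$

$$DCG@n = rel(1) + \sum_{i=2}^{n} \frac{rel(i)}{\log_2(i)}$$

$$NDCG@n = \frac{DCG@n}{IDCG@n}$$

Finally, for each query we calculated weights of all documents in the collection in accordance with the following formula:

$$Weight = \alpha \left( \sum_{w \in q} tfidf_w + \sum_{w \in q \cap header} tfidf_w \right) + (1 - \alpha) \cdot SentiWeight$$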

We experimented with α ∈ {0.2, 0.4, 0.5, 0.6, 0.8}; the best result was obtained with α = 0.6. This result shows the importance of sentiment words in the task of query-based sentiment extraction. All the best results in Table 6 were obtained using the aforementioned approach.
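The role of α can be illustrated with a minimal Python sketch. It assumes that the document weight is α times the summed tf-idf scores of the query words (in the text and in the header) plus (1 − α) times a sentiment weight. The function names, the dictionary keys, and the assumption that the sentiment weight is already on a scale comparable to the tf-idf sums are illustrative and do not come from the track description.

```python
def document_weight(query_tfidf, header_tfidf, senti_weight, alpha=0.6):
    """Combine query relevance and sentiment strength into one score.

    query_tfidf  -- tf-idf scores of query words found in the document text
    header_tfidf -- tf-idf scores of query words found in the header
    senti_weight -- sentiment score of the document (hypothetical input,
                    assumed to be on a scale comparable to the tf-idf sums)
    alpha        -- trade-off between relevance and sentiment
    """
    relevance = sum(query_tfidf) + sum(header_tfidf)
    return alpha * relevance + (1 - alpha) * senti_weight


def rank_documents(docs, alpha=0.6):
    """Return document ids sorted by descending weight."""
    scored = {
        doc_id: document_weight(
            d["query_tfidf"], d["header_tfidf"], d["senti_weight"], alpha
        )
        for doc_id, d in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    # Toy collection: "a" matches the query better, "b" is more opinionated.
    docs = {
        "a": {"query_tfidf": [0.5, 0.5], "header_tfidf": [], "senti_weight": 0.0},
        "b": {"query_tfidf": [0.25], "header_tfidf": [0.25], "senti_weight": 1.0},
    }
    for alpha in (0.2, 0.4, 0.5, 0.6, 0.8):
        print(alpha, rank_documents(docs, alpha))
```

With α = 1 the ranking reduces to pure tf-idf relevance; lowering α gives more influence to the sentiment component.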

We tried to evaluate the participants' results by treating unlabeled documents as irrelevant, but this led to a serious underestimation of performance. We therefore decided to use only labeled documents, excluding all other documents from the results while preserving the order of the remaining documents. The main performance measures in this task were NDCG@10 and P@10.
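The evaluation procedure can be sketched as follows. This is a minimal sketch under stated assumptions: relevance labels are binary, DCG@n is computed as rel(1) plus the sum of rel(i)/log2(i) for i from 2 to n, and the ideal DCG is taken over all labeled documents of the query. The function names and the toy data are hypothetical.

```python
import math


def keep_labeled(ranking, labels):
    """Drop unlabeled documents, preserving the order of the rest."""
    return [doc for doc in ranking if doc in labels]


def precision_at(rels, n):
    """P@n: share of relevant documents among the top n positions."""
    return sum(rels[:n]) / n


def dcg_at(rels, n):
    """DCG@n = rel(1) + sum over i = 2..n of rel(i) / log2(i)."""
    top = rels[:n]
    if not top:
        return 0.0
    return top[0] + sum(r / math.log2(i) for i, r in enumerate(top[1:], start=2))


def ndcg_at(rels, all_rels, n):
    """NDCG@n: DCG@n divided by the DCG@n of an ideal ordering."""
    ideal = dcg_at(sorted(all_rels, reverse=True), n)
    return dcg_at(rels, n) / ideal if ideal > 0 else 0.0


def evaluate(ranking, labels, n=10):
    """Return (P@n, NDCG@n) for one query, using labeled documents only."""
    rels = [labels[doc] for doc in keep_labeled(ranking, labels)]
    return precision_at(rels, n), ndcg_at(rels, list(labels.values()), n)


if __name__ == "__main__":
    labels = {"d1": 1, "d2": 0, "d3": 1}       # assessor judgments
    ranking = ["d2", "u1", "d1", "u2", "d3"]   # u1, u2 are unlabeled
    print(evaluate(ranking, labels, n=3))
```

Because unlabeled documents are removed before scoring, a run is neither rewarded nor penalized for returning documents that the assessors never judged.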

Table 6. Query-based sentiment extraction results

Run_ID   Object   P@1     P@5     P@10    NDCG@10
xxx-0    book     0.3     0.32    0.286   0.305
xxx-9    book     0.3     0.31    0.323   0.304
xxx-8    book     0.25    0.31    0.332   0.298
xxx-6    book     0.25    0.31    0.327   0.302
yyy-9    camera   0.402   0.313   0.302   0.305
yyy-7    camera   0.427   0.319   0.300   0.303
yyy-1    camera   0.402   0.328   0.325   0.226
yyy-2    camera   0.440   0.325   0.311   0.303
zzz-3    film     0.494   0.449   0.438   0.338
zzz-8    film     0.494   0.448   0.444   0.332

Chetviorkin I. I., Loukachevitch N. V.

Acknowledgements. We are grateful to Yandex and Anton Pavlov in particular for help with collecting data for research purposes of the seminar. This work is partially supported by RFBR grant N11-07-00588-а.


Research of lexical approach and machine learning methods for sentiment analysis

Blinov P. D. (blinoff.pavel@gmail.com),
Klekovkina M. V. (klekovkina.mv@gmail.com),
Kotelnikov E. V. (kotelnikov.ev@gmail.com),
Pestov O. A. (oleg.pestov@gmail.com)

Vyatka State Humanities University, Kirov, Russia

This paper describes the methods and approaches used by the authors to solve the sentiment analysis tasks of the ROMIP-2012 seminar. The lexical approach is represented by a lexicon-based method that uses emotional dictionaries built manually for each domain and extended with words from the training collections.

The machine learning approach is represented by two methods: the maximum entropy method and the support vector machine. The text representation for the maximum entropy method includes the proportion of positive and negative words and collocations and the number of question marks, exclamation marks, emoticons, and obscene words. For the support vector machine, texts are represented as binary vectors with cosine normalization.

The test results of the described methods are compared with those of the other participants of the ROMIP seminar. The task of classifying reviews of movies, books, and cameras is investigated. On the whole, the lexical approach demonstrates worse results than the machine learning methods, but in some cases it outperforms them. It is impossible to single out the best machine learning method: on some collections the maximum entropy method is preferable, while on others the support vector machine shows better results.

Key words: sentiment analysis, lexical approach, machine learning, maximum entropy method, support vector machine, ROMIP

1. Introduction

Text sentiment analysis has been an extensively researched area of computational linguistics over the last ten years. The main problem of sentiment analysis is identifying the emotional attitude toward some object expressed in a text.

Sentiment analysis has many practical applications. For example, analyzing the opinions of a target audience helps to reveal the strengths and weaknesses of a commercial product. Automatic rating of movie or book reviews makes it possible to support recommendations for choosing a work. Sentiment analysis systems are also used in sociological and political research, in human-computer interfaces, and in other spheres [12, 15].


Most research in sentiment analysis has been carried out on English texts. For a variety of reasons, such studies on Russian text collections have been less common. Recently, however, the situation has begun to change for the better: for two years the ROMIP seminar [24] has offered sentiment analysis tracks, including the classification of user reviews into 2, 3, and 5 classes. At the 2012 seminar, two new tasks appeared: the classification of news fragments into 3 classes and query-based opinion search.

The purpose of this paper is to present the results of the authors' participation in the sentiment analysis tracks at ROMIP-2012. Two approaches were investigated: the lexical approach and the machine learning approach.

The remainder of this paper is structured as follows. Section 2 gives an overview of current approaches to the problem of sentiment analysis. Section 3 describes the method of the lexical approach. Section 4 is devoted to the machine learning methods. Section 5 presents the results of the experiments at ROMIP-2012 and their analysis. We provide concluding remarks and findings in Section 6.