• Nenhum resultado encontrado

Social media index as proxy of the Brazilian consumer confidence index

N/A
N/A
Protected

Academic year: 2021

Share "Social media index as proxy of the Brazilian consumer confidence index"

Copied!
8
0
0

Texto

(1)

Renato M. S. Farias,1, ∗ Aloísio Campelo Jr.,1, † and Viviane S. Bittencourt1, ‡ 1Instituto Brasileiro de Economia, Fundação Getúlio Vargas

Rio de Janeiro, Rio de Janeiro 22231-000, Brazil

In this paper we formalize a Social Media Index (SMI) based on sentiment analysis performed on Twitter messages publicly available in the Portuguese language. This (possibly high-frequency) index shows the ability to detect events that are relevant to the social discord. Changes in the SMI are compared with changes with the monthly-aggregated Brazilian Consumer Condence Index (CCI) in a 84-month window. The correlation r between the two indexes shows great dependence on the choice of lexicon used to evaluate sentiment (0.59 < r < 0.75). Granger-causality tests show that it is more likely that changes in social media sentiment precedes changes in Consumer Condence than vice-versa. Furthermore, post-selection of Twitter messages based on selected keywords has the power to improve the association between SMI and CCI up to r = 0.81. On the one hand, our ndings suggest that the SMI and CCI are inuenced by the same set of emotions. On the other hand, the SMI has the ability to be published beforehand and with rather higher frequency. Keywords: Consumer condence, Social media, Twitter

I. INTRODUCTION

An expressive volume of information is shared every day in social media websites by people and companies all around the world, both publicly and privately. Several research groups have been studying the possibility of us-ing samples of these messages to generate statistics in all sorts of situations. Among these studies, a few examples are: the forecasting of stock prices in the USA [1], the eects of recessions on public humor [2], the forecasting of Japanese tourist inux to South Korea [3], analysis of labour market conditions in China [4], and the possibility of generating ocial statistics in several countries [59]. These examples show how the adequate treatment of big databases is being explored, and how Big Data is advan-cing the monitoring and forecasting of economic trends.

With regards to trend surveys, recent studies have shown that polarity indicators based on social media mes-sages display high association with Consumer Condence Index (CCI) in two Europeans countries, the Netherlands and the United Kingdom [10]. The process of assigning polarity scores to text messages is called sentiment ana-lysis [1115]. In fact, the authors of Ref. [10] created a Social Media Index (SMI) that quanties the imbal-ance between positive and negative messages shared in Dutch language on the most popular social media plat-forms. The authors show that the monthly-aggregated SMI calculated in a database of Facebook and selected Twitter messages displays high scores of correlation with the Dutch CCI. Also, cointegration of the two series lead the authors to conclude that both indexes are likely to be aected by a fundamental common cause. Therefore, the Social Media Index in The Netherlands can serve as

Also in Instituto de Física, Universidade Federal do Rio de

Janeiro (UFRJ); renato.farias@fgv.br

aloisio.campelo@fgv.comviviane.bittencourt@fgv.com

a (high-frequency) proxy for the Dutch Consumer Con-dence.

In this paper we construct a Brazilian Social Media Index and discuss its relationship with the Brazilian Consumer Condence Index produced by the Fundação Getúlio Vargas (FGV) [16]. We perform sentiment ana-lysis on publicly-available messages posted on Twitter [17]. We then construct a social media index and com-pare its changes, i.e. changes in social media sentiment, to changes in the Brazilian CCI. We show that associ-ation of all Twitter messages and CCI is strong (up to r = 0.75), depending on word classication. Addition-ally, post-selection methods based on keywords can sig-nicantly improve correlation of series up to r = 0.81, leading to strong association of post-selected SMI and Consumer Condence.

The paper is organized as follows: in Sec. II we scribe the methodology of both indexes as well as de-tails regarding out database of Twitter messages. In Sec. III we statistically compare the monthly-aggregated So-cial Media and Consumer Condence Indexes in dierent scenarios. In Sec. IV we conclude the paper.

II. METHODOLOGY

In this Section we describe the methodology of both indexes of interest. In Sec. II A, we provide with a sum-marized version of the methodology of the Brazilian Con-sumer Condence Index developed by the IBRE/FGV (for a complete description, see Ref. [16]). In Sec. II B, we introduce the mathematical denition of the Social Media Index.

A. Consumer condence index

The Consumer Survey is a monthly survey done by the FGV in order to generate indicators that monitor,

(2)

pre-2005-09 2007-01 2008-05 2009-09 2011-01 2012-05 2013-09 2015-01 2016-05 2017-09 2019-01 70 80 90 100 110 CCI 95% confidence

Figure 1. Consumer Condence Index (CCI) with 95% con-dence interval.

view, and predict tendencies in the Brazilian economy. Selected consumers answer to questionnaires that include questions about their perception of a range of dierent topics related to the Brazilian economy. The topics in-clude, for example, the general economic situation of the city they live in, the consumer's household nancial situ-ation, the consumers' perception of the job market in the present and near future, and their expectations for ination and interest rates. Data collection is coordin-ated by the FGV/IBRE's research department, and it is mostly done via telephone and the internet. Each month, a sample of approximately two thousand consumers is selected according to income, age, gender, locality, and instruction.

The Consumer Condence Index is composed of ve questions that are part of the Consumer's Survey, as fol-lows: (i) Current local economic scenario; (ii) Current household nancial situation; (iii) Local economic scen-ario for the next six months; (iv) Household nancial situation for the next six months; (v) Consumer's intent of purchasing durable goods for the next six months. The CCI is mathematically dened as

CCI := 1 5 5 X q=1 X c,r λc,r CCIq,c,r, (1)

where λc,r is the weight associated to the income bracket rand the city c, and

CCIq,c,r := 100 +   nf X i=1 βq,c,r,i− nd X j=1 βq,c,r,j   (2)

is the net result of answers gathered in the q-th question, r-th income bracket and c-th city. For the r-th income bracket, βq,c,r,i (βq,c,r,j) is the weight of the i-th (j-th) informant that opted for a favourable (unfavourable) an-swer in question q and city c. Figure 1 shows the CCI's historical data from its beginning to May/2019.

B. Social media index

In this Subsection we show how we developed the So-cial Media Index, from data collection to the mathemat-ical methodology.

1. Database

The main data source for the SMI is the Twitter web-site. We collect publicly available messages shared on Twitter, the so-called tweets. We chose Twitter as our primary source of data because of two of its features: (i) an impressive volume of tweets is shared publicly each and every day; (ii) data mining rules that are more per-missive than most social media. Nevertheless, Twitter's Application Programming Interfaces (APIs) [18] were not useful to the scope of study. This is due to the fact that the two publicly available APIs only allow Twitter searches in real time samples of public tweets (Stream-ing API) or searches against a sampl(Stream-ing of recent Tweets published in the past 7 days (Search API). Therefore, it is impractical to collect historical data via public Twitter APIs in order to compare with the CCI's temporal series. We created in-house solutions to circumvent the afore-mentioned technical issues. To this extent, we developed data scraping routines in Python [19]. Our algorithm automatically searches for publicly available tweets pos-ted at given day, with no restrictions on past dates. The search is performed such that it displays only tweets that were written in the Portuguese language. As proof of principle, we collected a sample of 7, 641, 237 tweets, evenly distributed in a timespan from January 1, 2010 up to December 31, 2016. From each tweet we extract 11public (meta-)data, such as

1. username: user's handle; it can be changed by the user at any time.

2. fullname: social media name provided by the user; commonly changed to one of the latest trending memes [20].

3. user id: number that uniquely denes the user; provided by Twitter, it cannot be changed by the user.

4. tweet id: number that uniquely denes a given Twitter post; provided by Twitter, it cannot be changed by the user.

5. tweet url: hyperlink to the tweet.

6. timestamp epoch: tweet's time and date in datetime format.

7. timestamp: tweet's time and date in readable format.

(3)

9. replies: number of times the tweet was replied. 10. retweets: number of times the tweet was shared. 11. likes: number of times the tweet was liked. Currently, this database is stored locally in a relational database created using SQLite [21].

2. Sentiment analysis

Automated sentiment analysis is performed for every tweet by another algorithm developed by the authors. We used two Portuguese lexicons known in the literat-ure: OpLexicon [12] and SentiLex [15]. In the context of sentiment analysis and in the context of this paper, a lexicon L is dened as an ensemble of words and their associated polarity scores, i.e.

L := { ω,pol (ω) }ω ∈ S , (3) where S is a set of words, ω, and polarity scores, pol. For instance, S can be a set of words that were classied by a group of annotators as expressing positive (pol(ω) = 1), negative (pol(ω) = −1), or neutral (pol(ω) = 0). Clearly, S corresponds to a subset of all words in (at least) one given language. In general, for two lexicons, L1 and L2, we have dim(S1) 6= dim(S2), where dim(S) is the dimen-sion of set S. The dierence in dimendimen-sion comes from the methodology applied in the generation of the set of classied words. For instance, annotators that analyse an internet-based corpus [14] will generate an ensemble that diers from an ensemble generated by annotators that analysed a corpus based on classical literature [22]. Besides that, it is usual that S16= S2, even if the two sets have the same dimension. This is due to the fact that the process of classifying of words via polarity scores has in-trinsic, systematic human biases. A good example is the SentiLex. This lexicon oers two classications, one for a given annotator. This leads to conicting polarity scores for a subset of the words contained in it. Thus, from now on we make a distinction between the two polar-ity scores presents in SentiLex as SentiLex (0) and Sen-tiLex (1). Moreover, it immediately follows from all the aforementioned characteristics that, in general, L16= L2. Based on all previous considerations as well as the work of Ref. [23], we also produced sentiment scores based on two other lexicons that we named Combination (0) and Combination (1). Combination (0) (Combination (1)) is a concatenation of lexicons SentiLex (0) and OpLex-icon (SentiLex (1) and OpLexOpLex-icon), in order. The goal is to verify if there is any statistical advantage in both sentiment-classication accuracy as well as statistical ad-vantage in correlation between Combination-based SMI and CCI.

Now, let Sτ be the set or words ω that appear both in tweet τ and in the set S of a given lexicon, i.e. Sτ ⊆ S. Then, for each tweet, the (automated) Social Media

Index SMIτ is given by

SMIτ := P ω∈Sτ f (ω) pol (ω) P ω∈Sτ f (ω) , (4)

where f (ω) is the frequency in which word ω appears in tweet τ and pol (ω) was dened in Eq. (3). Since pol (ω) ∈ {−1, 0, 1}, we can see that SMIτ ∈ [−1, 1] im-mediately follows from the normalization of Eq. (4). This kind of automated classication generates at least two relevant systematic errors: (i) misclassication of tweets, especially regarding the neutral polarity; (ii) the non-categorization of tweets in more nuanced classes based on mood (or humor) (e.g., see Ref. [11]). In a scenario where the volume of tweets collected is considerately large and since we are only interest in aggregated indexes over cer-tain periods of times, errors in the individual level will generally cancel each other out [10, 11], such as the type-(i)error aforementioned. Type-(ii) errors are beyond the scope of this paper. However, they should be the subject of later studies. The (aggregated) Social Media Index, SMI, is dened as the dierence between the percentage of positive and negative tweets in a period T , i.e.

SMI := 1 NT NT X τ =1 sgn (SMIτ) , (5)

where NT is the number of tweets with sentiment score in the period T and the sign function, sgn, is such that

sgn (x) :=      −1 if x < 0 ; 0 if x = 0 ; 1 if x > 0 . (6)

In the next Section we show qualitatively how the SMI dened in Eq. (5) behaves. as well as its quantitative relationship with the Consumer Condence Index.

III. RESULTS

Here we present a qualitative and a quantitative discus-sion on the behaviour of the Social Media Index dened in Eq. (5).

A. Qualitative analysis

Figure 2 shows daily-, weekly-, and monthly-aggregated SMI calculated using the lexicon Combina-tion (1). This particular choice of lexicon is going to become clear later. By inspection, we can see that SMI's volatility increases as frequency increases. It is also pos-sible to see regular spikes in all three aggregated indexes. The seasonal spikes with larger amplitude are the ones during the December holidays, which distorts the daily SMI to the positive side during Christmas and New Years

(4)

2010-01 2010-05 2010-09 2011-01 2011-05 2011-09 2012-01 2012-05 2012-09 2013-01 2013-05 2013-09 2014-01 2014-05 2014-09 2015-01 2015-05 2015-09 2016-01 2016-05 2016-09 10 0 10 20 30 40 50 SMI (%) 1 2 3 4 56 7 89 10 11 12 1314 15 16 1718 19 2021

Lexicon: Combination (1)

Daily Weekly Monthly

Figure 2. Example of daily-, weekly-, and monthly-aggregated Social Media Index (SMI) between Jan/2010 e Jul/2016 without seasonal adjustment. SMI was calculated using the lexicon Combination (1). Numeral markers highlight events that are outliers in the daily SMI (see Tab. III in App. A).

Eve. These distortions can be detected by observing pos-itive peaks in last week of each month of December in the weekly SMI, as well as positive uctuations for the month of December in the monthly SMI. This shows that the So-cial Media Index is able to capture seasonal trends in all frequencies.

The daily SMI is also ecient in characterizing events that are relevant to the social discourse in a given period of time. Several of these outlier events happened in the period covered by our database. The most interesting ones are enumerated in Fig. 2. Some are positive events, e.g. holidays and commemorative days. However, the majority of outliers are in the negative side of the polarity score. In order to check for trends in the subjects being discussed, we manually inspected each of the 21 events highlighted in Fig. 2. The results are presented in Table III in App. A.

Figure 3(a) shows the monthly-aggregated SMI for all 5 lexicons. In Fig. 3(b) we see the seasonally-adjusted, monthly-aggregated SMI for all 5 lexicons as well. In Fig. 4 it is shown the seasonality patterns for each lexicon.

B. Quantitative analysis

Table I shows Pearson's correlation coecient, r, for monthly SMI and CCI, as well as metadata related to the performance of our sentiment analysis. Correlations were calculated for cases in which both indexes are (not) seasonally adjusted. In order to check for spurious

correl-ations, we also investigate cointegration by means of the augmented Engle-Granger (AEG) test [24]. The test con-sists in discarding the null hypothesis of no-cointegration when p-value < 0.05. In Table I we show that no pair of series fullled this condition when all available tweets were considered. This suggests a weak (or null) asso-ciation between CCI and monthly-aggregated SMI (for more details, see Refs. [9, 10]). We also checked for causal relationships between the two indexes. Granger-causality tests [25] were used to investigate if changes in the Social Media Index preceded changes in Consumer Condence and vice-versa. Interestingly, the null hypo-thesis of no-causality was discarded both ways. On the one hand, Social Media Index is shown to Granger-cause Consumer Condence with p-value < 0.015. On the other hand, Consumer Condence is shown to Granger-cause the SMI with p-value < 0.029. Both tests were performed with both indexes seasonally adjusted. This shows that, in terms of Granger-causality, it is almost 50%more likely that changes in the SMI precedes changes in Consumer Condence than vice-versa.

It is also noticeable that the concatenated lexicons Combination (0) and Combination (1) do provide perfo-mance enhancement with respect to correlation and co-integration of the two series. This corroborates the res-ults of Ref. [26]. Additionally, Table I shows that Com-bination (1) and SentiLex (1) considerably outperform Combination (0) and SentiLex (0), respectively. Fur-thermore, the SMI calculated using Combination (1) was by far the index that presented the best chance of

(5)

coin-2010-012010-102011-072012-042013-012013-102014-072015-042016-012016-10

10

0

10

20

SMI (%)

(a) Without seasonal adjustment

OpLexicon

Combination (1)

Combination (0)

SentiLex (1)

SentiLex (0)

2010-012010-102011-072012-042013-012013-102014-072015-042016-012016-10

(b) With seasonal adjustment

Figure 3. Monthly-aggregated Social Media Index for all 5 lexicons. Interval of 95% condence cannot be seen due to scale. An important feature is the drop o in monthly sentiment during the 2013 protests in Brazil.

Lexicon #of tweets #of tweets (%)

Correlation coecient between monthly SMI

and CCI(†) (∗) OpLexicon 5, 488, 944 71.8 0.48 / 0.59 SentiLex (0) 4, 277, 546 56.0 0.54 / 0.59 SentiLex (1) 4, 277, 546 56.0 0.67 / 0.73 Combination (0) 5, 992, 216 78.4 0.60 / 0.67 Combination (1) 5, 992, 216 78.4 0.69 / 0.75

Table I. Metadata related to sentiment analysis for all ve lexicons as well as correlation between monthly SMI and CCI. Time interval: from Jan/2010 to Dec/2016. (†) Values before (after) forward slash correspond to correlation between both indexes are (not) seasonally adjusted. (∗) p-values show that none of the series co-integrate, regardless of lexicon or seasonal adjustment. 2 4 6 8 10 12 Month 2 1 0 1 2 3 4 SMI (%) OpLexicon Combination (1) Combination (0) SentiLex (1) SentiLex (0) Average Seasonality

Figure 4. Seasonalities in the monthly-aggregated SMI for all 5 lexicons. Strong positive bias in overall sentiment due to holiday season in Dec.

tegration (p-value ∼ 0.11) . Together with the highest correlation, this is the exact reason that led us to use Combination (1) in Fig. 2.

From now own we focus all statistical analysis on the monthly SMI via SentiLex (1). We performed a similar post-selection analysis as the one in Ref. [10]. Our results are presented in Table II. Post-selection of tweets contain-ing specic words can improve the correlation between

SMI and Consumer Condence. This is due to the fact that most Twitter messages does not provide relevant information in terms of sentiment [5]. In other words, most messages can be considered white noise (or point-less babble). In some cases, combinations of tweets con-taining some of the most spoken words of the Portuguese language (e.g., articles, pronouns) not only outperform the combination of all tweets but also cointegrate with CCI. Interestingly enough, post-selecting messages that contain keywords economic-related keywords (e.g. con-sumer, condence, economy) signicantly underperforms the index calculated with all tweets. In fact, correlations are higher when the keywords involve either denite or indenite articles or both.

IV. CONCLUSIONS

The study describes an association between the Social Media Index developed by the authors and the Brazilian Consumer Condence Index developed by Fundação Getúlio Vargas. The index was calculated using ve dif-ferent lexicons in the Portuguese language. Pearson's correlation coecient shows great dependence in choice

(6)

of lexicon. The concatenation of two of the most prom-inent Portuguese lexicons demonstrated better perform-ance than both of them individually. It is also shown that changes in the SMI are more likely to precede changes in Consumer Condence than vice-versa. As in the case of the Dutch Social Media Index, the Brazilian SMI can be published before the Brazilian CCI, provided that their relationship remains stable.

V. ACKNOWLEDGEMENTS

The authors thank Paulo Picchetti for helpful com-ments on the manuscript. RMSF also thanks Marcelo Alves Baratela for insightful discussions about the best practices of database creation and management, as well as software programming. RMSF acknowledge nan-cial support from FGV, as well as the Brazilian agencies CNPq and CAPES.

[1] J. Bollen, H. Mao, and X. Zeng, Twitter mood predicts the stock market, Journal of Computational Science 2, 1 (2011).

[2] T. Lansdall-Welfare, V. Lampos, and N. Cristianini, Ef-fects of the recession on public mood in the UK, in Pro-ceedings of the 21st International Conference on World Wide Web, WWW '12 Companion (ACM, New York, NY, USA, 2012) pp. 12211226.

[3] S. Park, J. Lee, and W. Song, Short-term forecasting of Japanese tourist inow to South Korea using Google trends data, J. Travel Tour. Mark. 34, 357 (2017). [4] J. Bailliu, X. Han, M. Kruger, Y.-H. Liu, and

S. Thanabalasingam, Can media and text analytics provide insights into labour market conditions in China? (BOFIT Discussion Paper, 2018).

[5] P. J. H. Daas, M. Roos, M. van de Ven, and J. Neroni, Twitter as a potential data source for statistics, Discus-sion Paper (Statistics Netherlands, 2012).

[6] P. J. H. Daas, M. J. Puts, B. Buelens, and P. A. M. van den Hurk, Big Data as a source for ocial statistics, J. O. Stat. 31, 249 (2015).

[7] S.-M. Tam and F. Clarke, Big Data, ocial statistics and some initiatives by the Australian Bureau of Statistics, Int. Stat. Rev. 83, 13 (2015).

[8] C. Reimsbach-Kounatze, The proliferation of Big Data and implications for ocial statistics and statistical agen-cies, OECD Digital Economy Papers (2015).

[9] J. van den Brakel, E. Söhler, P. Daas, and B. Buelens, Social media as a data source for ocial statistics: the Dutch Consumer Condence Index, Survey Methodology, Stat. Can. 43 (2017).

[10] P. J. H. Daas and M. J. H. Puts, Social media senti-ment and consumer condence, Statistical Paper Series 5 (European Central Bank (ECB), 2014).

[11] B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, From tweets to polls: Linking text senti-ment to public opinion time series (2010) pp. 122129. [12] M. Souza and R. Vieira, Sentiment analysis on Twitter

data for Portuguese language, in Proceedings of the 10th International Conference on Computational Processing of the Portuguese Language, PROPOR'12 (Springer-Verlag, Berlin, Heidelberg, 2012) pp. 241247.

[13] S. M. W. Moraes, I. H. Manssour, and M. S. Silveira, 7x1-PT: um Corpus extraído do Twitter para análise de sen-timentos em língua portuguesa, in Anais do X Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (SBC, 2015) pp. 2125.

[14] H. Brum and M. G. V. Nunes, Building a senti-ment corpus of tweets in Brazilian portuguese (2017),

arXiv:1712.08917.

[15] M. J. Silva, P. Carvalho, and L. Sarmento, Building a sentiment lexicon for social judgement mining, in Pro-ceedings of the 10th International Conference on Com-putational Processing of the Portuguese Language, PRO-POR'12 (Springer-Verlag, Berlin, Heidelberg, 2012) pp. 218228.

[16] FGV-IBRE, Sondagem do consumidor:

As-pectos conceituais e metodológicos, https:

//portalibre.fgv.br/main.jsp?lumChannelId=

402880811D8E34B9011D93086A466B16 (2018),

Aces-sado em: Jun-2019.

[17] Twitter, https://www.twitter.com/ (2006), Acessado em: Jun-2019.

[18] Twitter for Developers, https://developer.twitter. com/, Acessado em: Jun-2019.

[19] G. van Rossum, Python tutorial, Tech. Rep. CS-R9526 (Centrum voor Wiskunde en Informatica (CWI), Ams-terdam, 1995).

[20] R. Dawkins, The Selsh Gene (Oxford University Press, Oxford, UK, 1976).

[21] R. Hipp et al., SQLite (Version 3.27.2), Computer soft-ware, Available from: https://www.sqlite.org/ (2019), Retrieved: Jun-2019.

[22] E. Loper and S. Bird, NLTK: The Natural Language Toolkit, in Proceedings of the ACL-02 Workshop on Ef-fective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics -Volume 1 , ETMTNLP '02 (Association for Computa-tional Linguistics, Stroudsburg, PA, USA, 2002) pp. 63 70.

[23] M. Machado, T. A. S. Pardo, and E. Ruiz, Creating a Portuguese context sensitive lexicon for sentiment ana-lysis, in Proceedings of the 13th International Confer-ence on Computational Processing of the Portuguese Lan-guage, PROPOR'2018 (2018) pp. 335344.

[24] R. F. Engle and C. W. J. Granger, Co-integration and error correction: Representation, estimation, and testing, Econometrica 55, 251 (1987).

[25] C. W. J. Granger, Investigating causal relations by econo-metric models and cross-spectral methods, Econoecono-metrica 37, 424 (1969).

[26] M. T. Machado, T. A. S. Pardo, and E. E. S. Ruiz, Creating a portuguese context sensitive lexicon for sen-timent analysis, in Computational Processing of the Por-tuguese Language, edited by A. Villavicencio, V. Mor-eira, A. Abad, H. Caseli, P. Gamallo, C. Ramisch, H. Gonçalo Oliveira, and G. H. Paetzold (Springer In-ternational Publishing, 2018) pp. 335344.

(7)

# Keywords #sentimenttweets with %of tweets withsentiment %of database

Correlation coecient between

SMI and CCI(†)

1 No post-selection 5, 992, 216 100 78.4 0.69 / 0.75

2 Consumer and condence 5, 698 0.1 0.07 −0.5 / − 0.57

3 Consumer, condence, consumption,economy e nance 10, 073 0.2 0.1 0.01 / − 0.009

4 Economy and job(s) 36, 287 0.6 0.5 0.32 / 0.37

5 Indenite articles 1, 023, 129 17.1 13.4 0.45 / 0.51

6 Main pronouns 1, 550, 770 25.9 20.3 0.74∗ / 0.77

7 Denite + indenite articles 3, 526, 613 58.9 46.2 0.73 / 0.77

8 Prepositions (that, for) 1, 944, 380 32.4 25.4 0.71 / 0.78

9 Indenite articles and I 1, 939, 005 32.4 25.4 0.74 / 0.78

10 I, me 514, 485 19.1 14.9 0.77 / 0.78

11 Main pronouns + their main onlinevariants 1, 923, 012 32.1 25.2 0.76 / 0.79

12 Denite articles 3, 046, 460 50.8 39.9 0.76 / 0.79

13 prepositions, and main pronounsDenite + indenite articles, 4, 431, 744 74.0 58.0 0.75 / 0.80

14 Denite + indenite articles,prepositions, and I 1, 807, 113 73.8 41.3 0.78 / 0.81

15

Denite + indenite articles, prepositions, and main pronouns +

their main online variants 4, 524, 720 75.5 59.2 0.76 / 0.80

16 Denite + indenite articles and I 3, 970, 064 66.3 52.0 0.77 / 0.81

17 Denite articles and I 3, 602, 836 60.1 47.1 0.79∗ / 0.82

Table II. Eectiveness of post-selecting tweets via keywords and the resulting correlation between monthly Social Media Index and Consumer Condence. All results presented are based on SMI calculated with SentiLex (1). The process of post-selection considers not only the keywords presented in this table but also their plurals ans synonyms, if available. Time interval: from Jan/2010 to Jul/2016. (†) Values to the right (left) of forward slash correspond to correlation when both indexes are (not) seasonally adjusted. (∗) Cointegration (p-value < 0.05).

(8)

Marker Day Event Type of event Daily SMI (%) Polarity

1 April 07, 2011 Realengo Shooting Non-seasonal −25.3 Negative

2 April 18, 2011 (Brazilian) Friendship Day Seasonal 18.8 Positive

3 July 20, 2011 (International) Friendship Day Seasonal 29.9 Positive

4 June 17, 2013 Protests: First day Non-seasonal −22.4 Negative

5 June 18, 2013 Protests: Second day Non-seasonal −16.4 Negative

6 June 20, 2013 Protests: Fourth day Non-seasonal −16.2 Negative

7 June 30, 2013 Protests: Confederation Cup Finals Non-seasonal −17.8 Negative

8 June 16, 2014 World Cup: Ghana 1 × 2 USA Non-seasonal −18.1 Negative

9 June 22, 2014 World Cup: USA 2 × 2 Portugal Non-seasonal −17.2 Negative

10 July 04, 2014 World Cup: Neymar's injury Non-seasonal −40.5 Negative

11 July 07, 2014 World Cup: Brazil 1 × 7 Germany Non-seasonal −26.7 Negative

Imagem

Figure 1. Consumer Condence Index (CCI) with 95% con- con-dence interval.
Figure 2. Example of daily-, weekly-, and monthly-aggregated Social Media Index (SMI) between Jan/2010 e Jul/2016 without seasonal adjustment
Figure 3. Monthly-aggregated Social Media Index for all 5 lexicons. Interval of 95% condence cannot be seen due to scale.
Table II. Eectiveness of post-selecting tweets via keywords and the resulting correlation between monthly Social Media Index and Consumer Condence
+2

Referências

Documentos relacionados

Most of the work done in sentiment analysis is done with specific domains and with data sets through which the results can be verified, such as reviews, consisting of the text that

Partindo para uma análise do ponto de equilíbrio considerando o custo de oportunidade e o custo de capital, teremos a seguinte modificação de valores e análise: custos fixos somados

This paper analyses the associations between Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) on the prevalence of schistosomiasis and the presence of

Pode-se dizer que, a partir da militância como intelectual e educadora católica, Maria Junqueira Schmidt inseriu-se e afirmou-se no campo da produção intelectual do país, tendo

Nesse estudo alertou-se para o aumento de 17% na superfície regada na região do Alentejo, não obstante o decréscimo existente no número de explorações agrícolas

Nas amostras de resíduo analisadas dos três tanques, o valor médio da relação C/N foi 3, demonstrando que este resíduo orgânico possui alta capacidade de

In this work, we propose a new index to characterize the value of importance of species based on the hierarchy and spatial absolute probability of the species in the

Foram estudadas cinco lesões, sendo três localizadas no maléolo, uma em terços da perna com tempo de evolução de mais de 10 anos e as quatro lesões apresentavam