4.2 Methodology
4.2.3 News Coverage and Tail Risk
A tail risk measure builds on the idea that prices reflect investors’ concerns about future states of the economy. However, measuring tail risk does not by itself reveal which concerns are associated with tail risk, or how. To investigate the origins of tail risk fluctuations, I follow Manela and Moreira (2017) in treating the time variation in the choice of words by the business press as a proxy for the evolution of the concerns of the average investor. This assumption is consistent with the model of media bias in Gentzkow and Shapiro (2006) and with the idea that news media reflect the interests of readers (Tetlock, 2007; Manela, 2011). I therefore fit a model that relates news coverage to tail risk in order to identify the investor concerns that act as sources of tail risk variation.
10 The expected shortfall is a coherent measure of risk and contains information about the whole tail of the distribution, rather than a single percentile as in VaR.
11 They also rely on the theoretical results of Kitamura, Otsu and Evdokimov (2013), who show that this choice of γ is the most robust based on an asymptotic perturbation criterion.
I begin by constructing a news dataset consisting of the abstracts of daily front-page articles in the online version of Valor Econômico, the largest financial newspaper in Brazil.12 I collect the data by web scraping, a technique for extracting large amounts of data from websites: I download the HTML source of each day’s news coverage and extract the news abstracts from August 2011 to July 2020.13 The text must be preprocessed before it can be used in text-based analysis. First, I eliminate digits, special characters and punctuation. Second, I remove stopwords (highly frequent words) and words that appear fewer than five times in the whole sample. The remaining text is broken separately into one- and two-word n-grams.14 Third, I apply Part-of-Speech tagging15 to classify each word, using the nlpnet library in Python, which performs natural language processing tasks based on neural networks and provides a Part-of-Speech tagger for the Portuguese language.16
After tagging, only nouns are kept in the sample, which leaves 21,460 n-grams in the dataset. To obtain a relatively large body of text for each observation and to match the frequency of the tail risk measure, I count the number of times each n-gram appears each day and aggregate the counts to the monthly frequency.
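The cleaning and n-gram steps described above can be sketched as follows. This is a minimal illustration, not the code used in the paper: the stopword list is a tiny hypothetical placeholder, the function names are invented for the example, and the Part-of-Speech filtering with nlpnet is left out. It follows the order stated in the text (clean, drop stopwords and rare words, then form n-grams), so two-word n-grams are built from the filtered token sequence.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for illustration);
# a real application would use a full Portuguese stopword list.
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "e", "em", "para", "que"}


def tokenize(text):
    """Lowercase, replace digits/punctuation/special characters by spaces, drop stopwords."""
    cleaned = re.sub(r"[^\w\s]|[\d_]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def ngrams(tokens, n):
    """Return contiguous n-word sequences, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_ngrams(abstracts, min_count=5):
    """Return, for each abstract, its one- and two-word n-grams after dropping rare words."""
    tokenized = [tokenize(text) for text in abstracts]
    # Word counts over the whole sample, used to drop words seen fewer than min_count times.
    word_counts = Counter(tok for toks in tokenized for tok in toks)
    result = []
    for toks in tokenized:
        kept = [tok for tok in toks if word_counts[tok] >= min_count]
        result.append(ngrams(kept, 1) + ngrams(kept, 2))
    return result


# Toy usage with a lowered threshold so that something survives the rare-word filter.
example = extract_ngrams(
    ["A reforma da previdência avança em 2019!", "Reforma da previdência: o que muda?"],
    min_count=2,
)
# Both abstracts yield ['reforma', 'previdência', 'reforma previdência'].
```

In Python 3 the `\w` class matches accented letters by default, so Portuguese words such as "previdência" survive the cleaning step intact.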
The number of words per day and per article varies, so I normalize n-gram counts by the total number of n-grams that appear each month, generating a $J \times 1$ vector (with $J = 21{,}460$) $X_t = [X_{t,1}, \dots, X_{t,J}]'$ of n-gram frequencies for each month:
$$X_{t,j} = \frac{\text{appearances of n-gram } j \text{ in month } t}{\text{total n-grams in month } t}. \tag{4.9}$$
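As a small worked illustration of (4.9), the sketch below aggregates daily n-gram lists to monthly counts and divides by the monthly total. The input layout (a dictionary keyed by ISO date strings) and the function name are assumptions made for this example only.

```python
from collections import Counter, defaultdict


def monthly_frequencies(daily_ngrams):
    """Map {'YYYY-MM-DD': [n-gram, ...]} to {'YYYY-MM': {n-gram: frequency}}."""
    monthly_counts = defaultdict(Counter)
    for date, grams in daily_ngrams.items():
        # The first seven characters of an ISO date identify the month.
        monthly_counts[date[:7]].update(grams)
    frequencies = {}
    for month, counts in monthly_counts.items():
        total = sum(counts.values())  # total n-grams in the month
        frequencies[month] = {gram: count / total for gram, count in counts.items()}
    return frequencies


# Toy data: two days in March 2020 and one day in April 2020.
daily = {
    "2020-03-02": ["pension", "reform", "pension reform"],
    "2020-03-03": ["pension", "deficit"],
    "2020-04-01": ["reform"],
}
freq = monthly_frequencies(daily)
# In March, "pension" appears 2 times out of 5 n-grams, so its frequency is 0.4.
```

By construction the frequencies within each month sum to one, which makes months with more or longer articles comparable to months with less text.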
12 The online version of the newspaper is available at https://valor.globo.com/impresso. The edition for an earlier date, for example March 2, 2020, can be accessed at https://valor.globo.com/impresso/20200302.
13 Articles on the website are available only from August 2011 onward. The website was also under maintenance from May 2019 to August 2019, so no news is available for those months.
14 An n-gram is a contiguous sequence of n words from a given sample of text. For example, “pension” and “reform” are one-word n-grams, while “pension reform” is a two-word n-gram.
15 The process of marking up a word in a text as corresponding to a particular part of speech, based on both its definition and its context.
16 The nlpnet Part-of-Speech tagger is based on Fonseca and Rosa (2013).
The relation between news coverage and tail risk is derived from the co-movement between n-gram frequencies and the estimated tail risk measure. That is, n-gram frequencies are used to explain tail risk with a linear regression model:
$$TR_t = w_0 + w'X_t + \epsilon_t, \qquad t = 1, \dots, T, \tag{4.10}$$
where $w$ is a $J \times 1$ vector of regression coefficients and $w_0$ is the intercept. The regression is estimated on monthly data, with $t$ ranging from August 2011 to July 2020, for a total of $T = 104$ observations. Since $J \gg T$, this linear regression model cannot be estimated by the usual ordinary least squares method. To overcome the high dimensionality of the problem, I employ regularization via the elastic net, as briefly described below.17
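The reason ordinary least squares is unavailable when the number of regressors exceeds the number of observations can be made explicit with a short rank argument. The stacked notation $\tilde{X}$ and $TR$ below is introduced only for this illustration and does not appear elsewhere in the text.

```latex
% Stack the T observations: \tilde{X} is the T x (J+1) matrix with rows (1, X_t'),
% and TR is the T x 1 vector of tail risk observations.
\[
  \tilde{X}'\tilde{X}
  \begin{pmatrix} w_0 \\ w \end{pmatrix}
  = \tilde{X}'\,TR ,
  \qquad
  \operatorname{rank}\bigl(\tilde{X}'\tilde{X}\bigr)
  = \operatorname{rank}\bigl(\tilde{X}\bigr)
  \le T = 104 < J + 1 = 21{,}461 .
\]
% The (J+1) x (J+1) matrix \tilde{X}'\tilde{X} is therefore singular, and the
% normal equations do not have a unique solution.
```

Because the least squares problem does not pin down a unique coefficient vector, some form of penalty or other restriction on $w$ is needed to obtain a well-defined estimate.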
The elastic net is a regularized regression that linearly combines two forms of regularization: LASSO ($L_1$ norm) and ridge ($L_2$ norm). The elastic net regression minimizes the following objective function:
$$H(w, w_0) = \sum_{t=1}^{T} (TR_t - w_0 - w'X_t)^2 + \lambda_1 \sum_{j=1}^{J} |w_j| + \lambda_2 (w'w). \tag{4.11}$$
The $L_1$ penalty performs variable selection by setting elements of $\hat{w}$ to zero, generating a sparse model. However, LASSO selects at most $T$ variables and fails to do grouped selection, i.e., it tends to select one variable from a group of correlated variables and ignore the others. The inclusion of the $L_2$ penalty removes the limitation on the number of selected variables, encourages a grouping effect and stabilizes the $L_1$ regularization path.18 Therefore, the elastic net works by setting the weights of irrelevant variables to zero with the $L_1$ penalty while shrinking the coefficients of correlated regressors towards each other with the $L_2$ penalty. This is especially well suited to the large dataset of n-grams, which should contain both words that are irrelevant for tail risk and correlated words that affect tail risk together. The elastic net regression is estimated numerically, and the hyper-parameters $\lambda_1$ and $\lambda_2$ are chosen by 5-fold cross-validation.
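To make the estimation step concrete, the sketch below minimizes the objective in (4.11) by cyclic coordinate descent and scores a pair of penalties by k-fold cross-validation. It is only an illustration under stated assumptions: the text does not say which numerical method or fold assignment was used, so the coordinate descent solver, the fixed number of sweeps, the interleaved folds and all function names are choices made here. For a single coefficient, minimizing (4.11) with the others held fixed gives $w_j = S(\rho_j, \lambda_1/2) / (\sum_t X_{t,j}^2 + \lambda_2)$, where $S$ is the soft-thresholding operator and $\rho_j$ is the inner product of column $j$ with the residual that excludes regressor $j$.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam1, lam2, n_sweeps=200):
    """Minimize sum_t (y_t - w0 - w'x_t)^2 + lam1 * sum_j |w_j| + lam2 * w'w."""
    T, J = len(X), len(X[0])
    w = [0.0] * J
    w0 = sum(y) / T
    resid = [y[t] - w0 for t in range(T)]  # residuals at the current (w0, w)
    col_ss = [sum(X[t][j] ** 2 for t in range(T)) for j in range(J)]
    for _ in range(n_sweeps):
        # The intercept is not penalized: shift it by the mean residual.
        shift = sum(resid) / T
        w0 += shift
        resid = [r - shift for r in resid]
        for j in range(J):
            # rho = x_j' (residual with the contribution of regressor j added back)
            rho = sum(X[t][j] * resid[t] for t in range(T)) + col_ss[j] * w[j]
            denom = col_ss[j] + lam2
            new_wj = soft_threshold(rho, lam1 / 2.0) / denom if denom > 0 else 0.0
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [resid[t] - delta * X[t][j] for t in range(T)]
                w[j] = new_wj
    return w0, w


def cv_error(X, y, lam1, lam2, k=5):
    """Mean squared out-of-fold prediction error with interleaved folds (an assumption)."""
    T = len(X)
    sq_err = 0.0
    for fold in range(k):
        train = [t for t in range(T) if t % k != fold]
        test = [t for t in range(T) if t % k == fold]
        w0, w = elastic_net([X[t] for t in train], [y[t] for t in train], lam1, lam2)
        for t in test:
            pred = w0 + sum(wj * xj for wj, xj in zip(w, X[t]))
            sq_err += (y[t] - pred) ** 2
    return sq_err / T


# Toy data with two orthogonal, mean-zero regressors.
X_toy = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y_toy = [2.0, 0.125, -2.0, -0.125]
w0_hat, w_hat = elastic_net(X_toy, y_toy, lam1=1.0, lam2=2.0)
# The weakly related second regressor is set exactly to zero by the L1 penalty,
# while the first coefficient is shrunk from its least squares value of 2 to 0.875.
```

Given a hypothetical grid of candidate pairs, the penalties could then be selected with `min(grid, key=lambda p: cv_error(X, y, p[0], p[1]))`. In practice a tested library implementation would usually be preferable, keeping in mind that libraries may scale the loss and penalties differently from (4.11).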
The model above, together with the text-based nature of the regressors, can provide novel insights into the origins of tail risk fluctuations. In particular, the fitted coefficients $\hat{w}$ supply direct evidence of which words chosen by the business press are associated with tail risk and whether this relation is positive or negative, identifying the types of concerns that the average Brazilian investor relates to tail risk. Furthermore, the same methodology can be applied to other uncertainty measures, allowing us to compare the concerns that each measure captures.
17 In unreported results available upon request, I also use support vector regression, as in Manela and Moreira (2017). The results are similar to those obtained with the elastic net.
18 For more details, see Zou and Hastie (2005).