

Pedro Medeiros Teixeira

Identification of Causal Effects: A Methodological Review

Dissertation submitted to the Escola de Matemática Aplicada in partial fulfillment of the requirements for the degree of Master in Mathematical Modelling.

Area of Concentration: Data Science. Advisor: Rodrigo dos Santos Targino

Rio de Janeiro 2020




To my parents: I owe them the discipline and perseverance required to write this thesis.

To my girlfriend, Jennifer, who has supported me unconditionally from the first moment. Without her, this task would have been much more painful.

I would like to thank my friends Lucas, Rodolpho and Gustavo, and my EMAp friends Bernardo, Brenda, Igor and Marcelo: besides good laughs, I learned a lot from them. I also thank my friends at FGV CERI, who supported me at the times when the thesis demanded the most of me. In addition, I would like to thank the EMAp staff, Cirlei, Elisângela and Monica, who were always very considerate and patient.

I would like to thank Professor Rodrigo Targino, who guided me through the research process and always pointed me in the right direction when I was uncertain which way to go.

Finally, I would like to thank the committee members, Professor Marcelo Fernandes and Professor Claudio Struchiner, for their excellent comments.


Abstract

The present work is a methodological review of the two most common frameworks for addressing questions that involve causality: Neyman-Rubin's potential outcomes and Pearl's graphical models. The main purpose is to discuss identification issues, clarifying the assumptions behind each approach, pointing out disagreements and, when feasible, finding areas where the two frameworks are complementary. In addition, the practical details of Difference-in-Differences and Synthetic Control, two methods that adopt potential outcomes as their framework and are widely used in the policy evaluation literature, are analysed.


List of Figures

Figure 1 – (a) Cyclic Graph; (b) Acyclic Graph with a variable Y confounding the causal effect of X on Z.

Figure 2 – Two forms of representing unobserved variables.

Figure 3 – (a) Chain: X ⊥̸⊥ Z and X ⊥⊥ Z | Y; (b) Fork: X ⊥̸⊥ Z and X ⊥⊥ Z | Y; (c) Collider: X ⊥⊥ Z and X ⊥̸⊥ Z | Y.

Figure 4 – D-separation: X ⊥⊥ Y; X ⊥⊥ Z; X ⊥⊥ Y | W, Z; K ⊥⊥ Y | W; K ⊥⊥ Y | Z; K ⊥⊥ X | W; K ⊥⊥ Y | W, Z; X ⊥⊥ K | W.

Figure 5 – Backdoor Criterion, example taken from Pearl (1995).

Figure 6 – Frontdoor Criterion, example taken from Morgan and Winship (2015).


Contents

1 Introduction

2 Potential Outcomes
 2.1 Basic Elements of Potential Outcomes
 2.2 Randomized Experiments and Observational Studies

3 DAGs - Directed Acyclic Graphs
 3.1 Basic Elements of DAGs
 3.2 D-separation, Interventions and the Do-operator
 3.3 Backdoor criterion
 3.4 Frontdoor criterion
 3.5 DAGs and Potential Outcomes
  3.5.1 Pearl on Potential Outcomes
  3.5.2 Imbens on DAGs

4 Difference in Differences

5 Synthetic Control
 5.1 Basic Elements of SCM
 5.2 Linear Factor Model
 5.3 Estimation
 5.4 Placebo Tests and Robustness Checks
 5.5 Recommendation

6 Conclusion


1 Introduction

In many areas it is crucial to understand the specific effect of an event or variable on other variables. Observing simple correlations between these variables, or observing the outcome variable before and after the event, is not enough to identify this effect, because other variables interact with the event of interest at the same time. What is key is to measure the effect of the event or intervention while disregarding these other variables; in other words, we need to measure the causal effect. There are many examples of causal questions in the social and biomedical sciences: a researcher might need to measure the impact of a policy on a country's GDP, how much drug use increased or decreased as a result of a prohibition law, or the effect of some medicine on blood pressure. The statistical tools to handle causal questions can be divided into two main frameworks: the Potential Outcomes or Rubin Causal Model, associated with the work of Donald Rubin, and the graphical models represented by Directed Acyclic Graphs (DAGs), related to Judea Pearl's work.

The Rubin Causal Model intends to measure the difference between the outcome under a treatment¹ and the outcome that would have taken place in the absence of the treatment, which is not possible since only one of the states is observed. This is known as the Fundamental Problem of Causal Inference, and these states are known as potential outcomes or counterfactuals, since they exist in theory but are not observed at the same time (Rubin (1974)). There are three basic elements in this framework. The first element is the theoretical choice of representing the causal effect as a difference between two states, or potential outcomes. The measurement of individual-level causal effects is not feasible, but with the second element, the presence of multiple units, some exposed to the treatment and some belonging to the control group, it is possible to obtain causal estimates. The third element required in this setup is understanding why some units receive the treatment and others do not, i.e., the assignment mechanism. The potential outcomes notation first appeared in Neyman (1923) in a randomized setting. However, the main contribution of Rubin's work is the development of methods of causal inference with observational data, when the variables are not necessarily independent.

The graphical methods of Pearl's approach also deal with observational data. This framework focuses on encoding a causal model, based on background knowledge, in a Directed Acyclic Graph (DAG); it is then possible to know whether the causal effect is identifiable through certain criteria and rules. The transparency of addressing the

1 Here, the treatment variable will be binary; however, identifying the causal mechanism between two continuous variables is also a question of interest. See Hirano and Imbens (2004) and Imai and Dyk (2004) for examples.


causal question in a graph contrasts with the assumptions in the Rubin Causal Model, which sometimes appear to be unrealistic.

Finally, there is a novel branch in the causality literature whose main objective is to develop methods to construct a counterfactual with aggregate data, when control groups are not available and the confounding variables are not observed. Even though these methods are highly associated with the Potential Outcomes approach, there is also room to incorporate some tools from causal graph theory.

The present work is a non-exhaustive methodological review² of both frameworks, as well as a comparison focusing on the assumptions behind each approach. Chapter 2 discusses the Potential Outcomes model, its assumptions and basic elements. Chapter 3 discusses the graphical models and contains a section comparing the two frameworks. Chapters 4 and 5 present methods from one of the branches that evolved from the Potential Outcomes approach, Difference-in-Differences and Synthetic Control, and discuss practical details of these methods.

2 Methods such as instrumental variables and regression discontinuity design, although very common in the econometrics toolbox, were not covered, nor were estimation and statistical inference issues. In the DAG part of the dissertation, technical details of do-calculus were also omitted.


2 Potential Outcomes

The idea that causal effects are the difference between potential outcomes was introduced by Neyman (1923) in the context of randomized experiments. The first work to recommend random assignment as the ideal strategy for assessing causality was Fisher (1925). The use of potential outcomes as a theoretical tool to express causal questions arose in the analysis of demand and supply functions and other simultaneous equations by Tinbergen (1930) and Haavelmo (1943). However, the consolidation of the potential outcomes framework as a general approach for causal inference, independently of whether the study is randomized or observational, took place when the statistician Donald Rubin published the paper Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies (Rubin (1974)). That is the reason why the potential outcomes framework and the Rubin Causal Model are terms used interchangeably (Holland (1986)). The vast majority of empirical studies in economics and social science using methods such as difference in differences, regression discontinuity design, randomized experiments, synthetic control and instrumental variables follow the potential outcomes approach, which is a sign of its success as a causal inference framework (Imbens and Rubin (2015)).

2.1 Basic Elements of Potential Outcomes

The goal of this framework is to evaluate the effect of a treatment¹ applied to a unit. The Potential Outcomes framework is based on three key notions. The first notion is to describe the variable of interest after the treatment in two different states, or potential outcomes; the comparison of the variable in both states is the causal effect of the treatment². As pointed out by Holland (1986), the Fundamental Problem of Causal Inference is that only one of the potential outcomes is observed ex post for each unit, which is why the second element of the framework is required: the presence of multiple units, some exposed to the treatment and some belonging to the control group. The third notion is understanding why some units receive the treatment and others do not, which is what is called the assignment mechanism. The assignment mechanism can be defined as a function of the covariates and the potential outcomes. The presence of multiple units by itself does not allow one to measure causal effects based on comparisons of the group that received the treatment and the group that did not, which is why the assignment mechanism is important. In randomized experiments, the researcher controls how the units receive the treatment, which means that the assignment mechanism is known. On the other hand,

1 Treatment, intervention and action are terms used interchangeably in the Potential Outcomes framework.
2 Treatments always occur temporally before any potential outcome becomes observable (Imbens and Rubin (2015)).


in observational studies, where randomization is not feasible, the pretreatment variables or covariates play an important role in making the treatment as good as randomly assigned, although this relies on strong assumptions (Athey and Imbens (2017a); Imbens and Rubin (2015)). The variable of interest Y_i can be described in two states, treated or not treated, but only one is indeed observed for each individual or unit i. For N units, Y(0) and Y(1) represent the vectors of potential outcomes, not treated and treated respectively, with i = 1, ..., N. The i-th element is observed according to the treatment assignment represented by D, an (N × 1) vector whose i-th element D_i ∈ {0, 1} indicates not treated and treated units, respectively. So, the i-th element of Y(0) or Y(1) is observed if D_i equals zero or one, respectively, i.e., Y_i^obs = Y_i(D_i). There are 2^N possible values for D in a binary treatment context, and the number of treated units is N_treat = Σ_{i=1}^N D_i. In addition, there are covariates Z, an (N × k) matrix whose i-th row is Z_i. These variables are known a priori to be unaffected by the treatment assignment, because they are fixed characteristics of the unit or are realized before the treatment is assigned.

$$Y_i^{obs} = \begin{cases} Y_i(0) & \text{if } D_i = 0 \\ Y_i(1) & \text{if } D_i = 1 \end{cases} \tag{2.1}$$

The unit-level causal effect τ_i of the treatment is the difference between the two potential states:

τ_i = Y_i(1) − Y_i(0)   (2.2)

Definition 1 (Causal and Descriptive Estimands). An estimand that can be written as a function of (Y^obs, D, Z), with no dependence on (Y(1), Y(0)), is said to be descriptive. Causal estimands are functions of (Y(1), Y(0)) that cannot be written as functions of (Y^obs, D, Z) (Abadie and Cattaneo (2018); Abadie et al. (2017)).

In a hypothetical situation of observing the realized outcomes of every unit in the population, the researcher would be able to discover the value of a descriptive estimand with certainty. However, the same is not true for causal estimands, due to their dependence on both potential outcomes (Abadie et al. (2014)).

The unit-level causal effect is the simple difference between the potential outcomes, which is not observable because one of the terms is never observed. A causal estimand summarizes the effect at an aggregate level. Estimands such as the Average Treatment Effect (ATE) or the Average Treatment Effect on the Treated (ATET) are standard causal estimands of


policy evaluation research (Angrist, Imbens and Rubin (1996); Heckman and Singer (2008)). The estimand that captures the causal effect for the entire population of units (ATE) is:

τ_ATE = E[Y_i(1) − Y_i(0)]   (2.3)

If the researcher aims to measure the impact of the intervention only on the treated units, then the ATET should be used:

τ_ATET = E[Y_i(1) − Y_i(0) | D_i = 1]   (2.4)

Without additional assumptions, it is not possible to estimate the ATE or the ATET by simply taking the difference between the outcomes of treated and untreated units, which is called the Naive Estimator (Morgan and Winship (2015); Angrist and Pischke (2008)):

$$\begin{aligned}
\tau_{Naive} &= E[Y_i^{obs} \mid D_i = 1] - E[Y_i^{obs} \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] + E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1] \\
&= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATET}} + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{bias}}
\end{aligned} \tag{2.5}$$

The equation above is a descriptive estimand, and it is a simple mathematical representation of the statement that association is not causation: an observed correlation between variables does not mean that an action affecting one variable will necessarily affect the other. The bias is generated through confounding, which may arise when there is information associated with both the potential outcomes and the treatment assignment (Abadie and Cattaneo (2018)). Standard ways to eliminate confounding bias are detailed in Section 2.2. It is important to say that these are not the only ways to do causal inference; in the Potential Outcomes framework, there are many empirical strategies that yield estimates with a causal interpretation.
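Decomposition (2.5) can be checked numerically. The sketch below, with invented numbers, builds a confounded assignment in which units with a higher Y_i(0) are more likely to be treated, so the naive difference in means equals the ATET plus a visibly non-zero bias term:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

y0 = rng.normal(10.0, 2.0, size=N)
y1 = y0 + 3.0                                  # ATET = 3 by construction

# Confounded assignment: a higher Y(0) makes treatment more likely.
p = 1.0 / (1.0 + np.exp(-(y0 - 10.0)))
d = rng.binomial(1, p)
y_obs = np.where(d == 1, y1, y0)

naive = y_obs[d == 1].mean() - y_obs[d == 0].mean()
atet = (y1 - y0)[d == 1].mean()
bias = y0[d == 1].mean() - y0[d == 0].mean()   # E[Y(0)|D=1] - E[Y(0)|D=0]

print(f"naive = {naive:.3f}")                  # approx ATET + bias, not 3
print(f"ATET = {atet:.3f}, bias = {bias:.3f}")
```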

Definition 2 (Assignment Mechanism, Imbens and Rubin (2015)). Given a population of N units, the assignment mechanism is a function P(D | Z, Y(0), Y(1)) that assigns probabilities to all 2^N possible values (full assignments) of D, taking values in [0, 1] and satisfying

$$\sum_{D \in \mathcal{D}} P(D \mid Z, Y(0), Y(1)) = 1, \quad \forall\, Z, Y(0), Y(1) \tag{2.6}$$

where 𝒟 is the set of all (N × 1) vectors with D_i ∈ {0, 1}, whose cardinality is 2^N. The assignment mechanism is the probability that a specific value of the full assignment will occur.

To obtain a causal estimand, additional assumptions are necessary. The first assumption is related to the stability of the treatment. The notation of potential outcomes already makes it implicit that the assignment of a unit does not affect the assignments of other units. This is expressed in the assumption called SUTVA.

Assumption 1 (Stable Unit Treatment Value Assumption - SUTVA, Rubin (1980), Cox (1958)). The realized outcome for each unit depends only on the value of the treatment of that unit, i.e., the potential outcomes of a unit are unaffected by the treatments other units receive.

Assumption (1) implies that there is no interference between units and that there is only one version of the treatment. In some settings, this assumption can be unrealistic; for example, in an epidemiological setting where the treatment is vaccination and the outcome is an infectious disease, the treatment of a unit certainly benefits other units (Halloran and Struchiner (1995)). Other examples where the spillover effect is clear include the presence of peer effects in the educational policy evaluation literature (Hong and Raudenbush (2006)). In such cases, assuming SUTVA may not be appropriate. Section 2.2 presents some experiments designed to mitigate peer effects without adopting strong assumptions (Abadie and Cattaneo (2018); Athey and Imbens (2017a)).

The impossibility of measuring unit-level causal effects can be seen as a problem of missing data, and the challenge is to predict the unobserved potential outcome (Rubin (1974)). The covariates have a central role in this prediction task for three reasons. First of all, adding variables that represent characteristics of the unit increases the precision of the estimates. The second reason is that covariates can be used to stratify the treatment group into subgroups, which makes the results more informative; for example, if the researcher wants to study the effect of a public policy on women, the gender variable helps accomplish this. Finally, these characteristics can be correlated with the potential outcomes, which confounds the comparison between control and treatment groups. This implies that the causal effects can be identifiable within subgroups that are homogeneous conditional on the covariates. This is what is known as the Unconfoundedness Assumption (Imbens and Rubin (2015)).


Assumption 2 (Unconfoundedness⁴, Rosenbaum and Rubin (1983)).

(Y_i(1), Y_i(0)) ⊥⊥ D_i | Z_i = z, ∀z ∈ 𝒵   (2.7)

The Unconfoundedness Assumption (2), combined with another assumption regarding the role of covariates called the Common Support⁵ Assumption, is known as the Strong Ignorability Assumption. The former is the principal target of critics of the Potential Outcomes setup, mainly because it is not testable. The latter guarantees that the difference between the covariate distributions of the treatment and control groups is not substantial. High imbalances between those subsamples would imply that some regions of the covariate space would have relatively few or no units from one of the groups, making inference for such regions rely on extrapolation (Imbens and Rubin (2015)).

Assumption 3 (Common Support, Rosenbaum and Rubin (1983)). For each value of Z, there is a probability greater than zero and smaller than one of a unit being treated or untreated:

0 < P[D_i = 1 | Z_i = z] < 1, ∀z ∈ 𝒵   (2.8)

where 𝒵 is the covariate space.

The violation of the assumption above is an issue in labour market policies aimed at training unemployed workers (Lechner et al. (2011)). For example, if a training program is mandatory for workers who earned at most the minimum wage (w_min), then in a study evaluating the causal effect of this program the wage variable (Z_wage) will be imbalanced, as there is no control unit in the region Z_wage ≤ w_min, which violates Assumption (3).

The Strong Ignorability Assumption removes the bias, and the causal estimand τ_ATE can be obtained using experimental or observational data. Using (2.1) and the Unconfoundedness Assumption (2):

$$\begin{aligned}
E[Y_i^{obs} \mid Z_i = z, D_i = 1] - E[Y_i^{obs} \mid Z_i = z, D_i = 0] &= E[Y_i(1) \mid Z_i = z, D_i = 1] - E[Y_i(0) \mid Z_i = z, D_i = 0] \\
&= E[Y_i(1) \mid Z_i = z] - E[Y_i(0) \mid Z_i = z]
\end{aligned} \tag{2.9}$$

4 This assumption is also known as ignorability, selection on observables or conditional independence. It is close to what is called, in the econometrics literature, exogeneity. However, to be equivalent to unconfoundedness, the exogeneity assumption must be combined with the constant treatment effect assumption (Imbens (2004)).
5 The support of a random variable is the set of realizations that occur with probability greater than zero (Angrist and Pischke (2008)).


Taking the expectation with respect to Z⁷:

E_Z{E[Y_i(1) − Y_i(0) | Z_i]} = E[Y_i(1) − Y_i(0)] = τ_ATE   (2.10)

The Common Support Assumption plays an important role here by enabling the identification of E[Y_i^obs | Z_i = z, D_i = 1] − E[Y_i^obs | Z_i = z, D_i = 0] for all values of z. Otherwise, there would be values of z present in only one of the assignment groups (Imbens and Wooldridge (2009); D'Amour et al. (2017)).
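The identification steps (2.9)-(2.10) can be mimicked with a discrete covariate: averaging the within-stratum treated-control contrasts over the distribution of Z recovers the ATE, while the raw contrast stays biased. A sketch with invented numbers, in which unconfoundedness and common support hold by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000

z = rng.integers(0, 3, size=N)                 # discrete covariate, 3 strata
y0 = 5.0 + 2.0 * z + rng.normal(size=N)
y1 = y0 + 3.0                                  # true ATE = 3

# Assignment depends on Z only, with 0 < P(D=1|Z) < 1 in every stratum.
d = rng.binomial(1, np.array([0.2, 0.5, 0.8])[z])
y = np.where(d == 1, y1, y0)

naive = y[d == 1].mean() - y[d == 0].mean()    # confounded by Z

# E_Z[ E[Y|Z, D=1] - E[Y|Z, D=0] ], the empirical version of (2.10).
ate = sum(
    (y[(z == k) & (d == 1)].mean() - y[(z == k) & (d == 0)].mean())
    * (z == k).mean()
    for k in range(3)
)
print(f"naive = {naive:.3f}, adjusted = {ate:.3f}")   # adjusted approx 3
```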

A common argument in observational studies is that the Unconfoundedness Assumption is more plausible when more variables are added as covariates (Imbens and Rubin (2015)). The claim that the more variables are included in the set of covariates, the more precise the estimates of the causal effect, is controversial (Pearl (2011)); moreover, the more covariates are added to the analysis, the less plausible the Common Support Assumption (3) becomes, because it increases the chance of predicting the treatment assignment exactly within some subpopulations, which makes the identification and estimation of the causal effect impossible for those subpopulations. If there are characteristics that are present only in the treated or only in the untreated group, then Assumption (3) is violated (D'Amour et al. (2017); Lechner et al. (2011)).

2.2 Randomized Experiments and Observational Studies

One of the most successful empirical strategies to handle confounding effects is the randomized experiment, or RCT⁸, which is often identified as the gold standard of causal inference (Bothwell et al. (2016)) or as the most credible of designs to obtain causal inferences (Athey and Imbens (2017a)). The direct goal of randomization is to achieve independence between the treatment assignment and the potential outcomes, which eliminates the bias of the simple difference between treatment and control groups. This design empowers the researcher with control over the mechanism that decides which units receive the treatment. If the units are randomly assigned to the treatment, variables or unit characteristics that are possibly associated with both the potential outcomes and the treatment are no longer associated with the latter. As a result, the individuals of the control and treatment groups differ in expectation only through the intervention effect (Duflo, Glennerster and Kremer (2007)). In other words, under random assignment, the descriptive estimand is unbiased for the causal estimand (Abadie et al. (2014)). That

7 The same result for τ_ATET is straightforward.
8 Randomized Controlled Trials.


is why, in an experimental setting, the assumption that the assignment is unconfounded arises naturally, which implies, in a setting without covariates, that

(Y(1), Y(0)) ⊥⊥ D,   (2.11)

and

P(D | Y(0), Y(1)) = P(D)   (2.12)

The simplest treatment assignment mechanism for an RCT is a Bernoulli trial for each unit, which ends up giving probability greater than zero to full assignments in which all units receive the treatment, or all receive the control status, certainly not a desirable way to make causal inference (Imbens and Rubin (2015)). However, there are different ways of designing RCTs efficiently. Following Athey and Imbens (2017a)'s notation and taxonomy of randomized experiments, it is possible to classify experiments into four types of assignment mechanisms.

In a Completely Randomized Experiment, the first assignment mechanism to be analysed, a fixed number N_treat of units is chosen randomly to receive the treatment; the remaining units are assigned to the control group. The simplest design draws an even number N of units and assigns N_treat = N_control = N/2 to the treatment and control groups. Using Definition (2) of Section 2.1:

$$P(D \mid Y(0), Y(1)) = \binom{N}{N_{treat}}^{-1} = \frac{N_{treat}!\,(N - N_{treat})!}{N!}, \quad \forall\, D \text{ such that } \sum_{i=1}^{N} D_i = N_{treat} \tag{2.13}$$
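Operationally, a completely randomized assignment is just a uniform draw of a size-N_treat subset, and every admissible assignment vector has the same probability given in (2.13). A minimal sketch:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
N, n_treat = 20, 10

# Draw one completely randomized assignment: a random subset of units.
d = np.zeros(N, dtype=int)
d[rng.choice(N, size=n_treat, replace=False)] = 1

# Every vector with sum(D) = N_treat has the same probability, eq. (2.13).
print("P(D) =", 1 / comb(N, n_treat))
```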

The second experimental design is the Stratified Randomized Experiment. The goal of the researcher when choosing this assignment mechanism is to obtain more informative results. First, the population is partitioned into G subgroups based on some covariate values. Let 𝒵 be the covariate space, with strata 𝒵_1, ..., 𝒵_G such that

(i) ⋃_g 𝒵_g = 𝒵;

(ii) 𝒵_g ∩ 𝒵_{g′} = ∅ if g ≠ g′;

(iii) G_{ig} = 1_{Z_i ∈ 𝒵_g} is an indicator for unit i belonging to stratum g;

(iv) N_treat = Σ_{g=1}^G N_{treat,g}, where N_{treat,g} is the fixed number of treated units in stratum g and N_g is the number of units in stratum g.


Then, the assignment probability is

$$P(D \mid Y(0), Y(1), Z) = \prod_{g=1}^{G} \binom{N_g}{N_{treat,g}}^{-1}, \quad \forall\, D \text{ such that } \sum_{i=1}^{N} G_{ig} D_i = N_{treat,g} \;\forall g \tag{2.14}$$

The third assignment mechanism is an extreme version of the stratified setting and is known as the Paired Randomized Experiment. Here, the number of strata is N/2, there are two units per stratum, and N_{treat,g} = N_{control,g} = 1, so each subgroup contains one treated unit and one control unit.

$$P(D \mid Y(0), Y(1), Z) = \left(\frac{1}{2}\right)^{N/2}, \quad \forall\, D \text{ such that } \sum_{i=1}^{N} G_{ig} D_i = 1 \;\forall g \tag{2.15}$$

Finally, the last assignment mechanism described here is the Clustered Randomized Experiment. This design is interesting when SUTVA may be violated through peer effects between units. Instead of assigning individuals that have local interactions to the control and treatment groups, clusterization may be an option, since it prevents units with different treatment assignments from affecting each other. For example, in Devoto et al. (2012) the objective is to evaluate the effect of household water connections; the assignment to the treatment was done through an awareness campaign and credit facilitation. The clustering design is appropriate in this setting because one part of the treatment is basically access to information about the procedures to obtain the connection, which can be disseminated in the neighborhood, making control units affected by the treatment; clusterization by location can avoid this.

$$P(D \mid Y(0), Y(1), Z) = \binom{G}{G_t}^{-1}, \quad \forall\, D \text{ such that } \sum_{i=1}^{N} G_{ig} D_i = 1 \;\forall g \tag{2.16}$$

where G is the total number of clusters and G_t is the number of treated clusters.
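The stratified and clustered designs differ from complete randomization only in where the random draw happens: independently within each stratum, or over whole clusters. A hypothetical sketch of both draws (group labels and sizes invented):

```python
import numpy as np

rng = np.random.default_rng(4)

def stratified_assignment(strata, n_treat_per_stratum):
    """Complete randomization run independently inside each stratum g."""
    d = np.zeros(len(strata), dtype=int)
    for g, n_g in n_treat_per_stratum.items():
        idx = np.flatnonzero(strata == g)
        d[rng.choice(idx, size=n_g, replace=False)] = 1
    return d

def clustered_assignment(clusters, n_treat_clusters):
    """Whole clusters are treated, guarding against local spillovers."""
    labels = np.unique(clusters)
    treated = rng.choice(labels, size=n_treat_clusters, replace=False)
    return np.isin(clusters, treated).astype(int)

strata = np.array([0] * 5 + [1] * 5)           # two strata of five units
print(stratified_assignment(strata, {0: 2, 1: 3}))

clusters = np.repeat(np.arange(4), 5)          # four clusters of five units
print(clustered_assignment(clusters, 2))
```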

In many cases, when ethical reasons are involved, an RCT is not feasible. Using simultaneous equations terminology, when randomization is not possible a researcher may use some kind of identification design. One of them is to achieve the same result as a randomized experiment by conditioning on the confounding variables in order to achieve independence between the treatment assignment and the potential outcomes. In this setting, the researcher no longer controls the functional form of the assignment mechanism, which is unknown. However, under the Unconfoundedness Assumption, it is possible to have an assignment as good as random. If Z_i is composed only of binary or discrete variables, subclassification may be an alternative way to adjust for covariates; however, if there are many continuous covariates, subclassification is not


feasible. A good strategy is to summarize all the covariates into a scalar. The probability of assignment of each unit, although not known a priori, can be written in terms of its propensity score, e(z). The propensity score is the probability of receiving the treatment as a function of the covariates:

e(z) = P(D_i = 1 | Z_i = z) = E[D_i | Z_i = z]   (2.17)

Differently from RCTs, the propensity score is unknown in observational studies and has to be estimated, usually by a logistic regression with D_i as the dependent variable. One advantage of using the propensity score is that the outcome data are not used in its estimation. Once e(z) is estimated, the next step is to use some matching technique¹¹; then, instead of stratifying over all values of the covariates, it is possible to compare the observed outcomes within groups that have similar propensity scores (Imbens and Rubin (2015); Rosenbaum and Rubin (1983)).
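A minimal sketch of this pipeline, assuming scikit-learn is available: estimate e(z) by logistic regression (without touching the outcome), subclassify units into score quintiles, one common choice among several, and average the within-stratum contrasts:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
N = 50_000

z = rng.normal(size=(N, 3))                    # three continuous covariates
p_true = 1.0 / (1.0 + np.exp(-z @ np.array([0.8, -0.5, 0.3])))
d = rng.binomial(1, p_true)
y = z @ np.array([1.0, 1.0, 1.0]) + 3.0 * d + rng.normal(size=N)

# Step 1: estimate the propensity score; the outcome y is not used here.
e_hat = LogisticRegression().fit(z, d).predict_proba(z)[:, 1]

# Step 2: subclassify on estimated-score quintiles, compare within strata.
stratum = np.digitize(e_hat, np.quantile(e_hat, [0.2, 0.4, 0.6, 0.8]))
ate = sum(
    (y[(stratum == s) & (d == 1)].mean() - y[(stratum == s) & (d == 0)].mean())
    * (stratum == s).mean()
    for s in range(5)
)
print(f"ATE estimate: {ate:.3f}")              # roughly the true effect of 3
```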

As pointed out before, the Unconfoundedness Assumption enables eliminating the bias of a comparison between treated and control groups by adjusting for differences in covariates. Within subpopulations homogeneous in the propensity score the result is the same, i.e., the propensity score plays the same role as the covariates in the conditional version of the Unconfoundedness Assumption (Imbens and Wooldridge (2009)):

(Y(1), Y(0)) ⊥⊥ D_i | Z = z ⟹ (Y(1), Y(0)) ⊥⊥ D_i | e(z)   (2.18)

In this setting, the comparison between treated and control groups represents causal effects only within subpopulations defined by values of the covariates Z. In other words, the assignment mechanism is not randomly defined, but within each stratum defined by Z the assignment is assumed to be random. In the presence of high imbalances between groups, using least squares to estimate τ_ATE is not appropriate (Abadie and Cattaneo (2018)), because it can rely on extrapolation¹².

11 For critical views on propensity score matching, see King and Nielsen (2019).
12 There are nonparametric strategies to estimate τ_ATE.


3 DAGs - Directed Acyclic Graphs

The Potential Outcomes approach is not the only framework for studying causal inference. Judea Pearl's work on graphical causal models has been the most important study of causality outside the econometrics mainstream (Cunningham (2018)). Although the main goal of both frameworks is to answer causal questions, the means to achieve it are distinct: while PO develops the strategy of identifying causal effects based on multiple units, states and assumptions such as ignorability and SUTVA, Pearl's work focuses on encoding a causal model, based on domain knowledge, in a Directed Acyclic Graph (DAG). Once the researcher has a DAG to work with, it is possible to know whether the causal effect is identifiable through certain criteria and rules. Despite the fact that the two approaches have been developed separately over the years, they can be seen as complementary (Morgan and Winship (2015); Imbens (2019)). Inspired by Wright (1921) and Haavelmo (1943), Judea Pearl's development presents machinery to identify causal effects and to understand clearly how to manipulate observational data to measure the effect of interventions¹ (Hünermund and Bareinboim (2019); Pearl (2009)).

3.1 Basic Elements of DAGs

A graph consists of a set of nodes, which represent random variables, and a set of edges, which connect pairs of nodes and represent causal effects. Edges that indicate the direction of these relationships are called directed, and a graph containing such edges is called a directed graph. Nodes represented by a solid circle are observed variables, whereas those represented by an empty circle are unobserved variables (Pearl (2009)).

A path is a sequence of edges pointing in any direction; nonetheless, if every edge in a path is an arrow that points from one node to the subsequent one, then this path is called

1 In Pearl and Mackenzie (2018), the concept of the Ladder of Causation is developed, which classifies scientific questions in three rungs. The first rung is the least complex task, association: basically, executing predictive routines by identifying patterns in data. The second rung is intervention, which seeks to answer causal questions by manipulating observable data; according to Pearl's graphical setup, if a researcher is equipped with a causal model and, consequently, with a DAG, the questions placed in the second rung can be answered successfully. Finally, in the third rung there is the counterfactual, which is basically to guess how some phenomenon would have taken place if an action had been taken, which cannot be answered with data alone. Thinking about the relationships between variables and how causality emerges from these relationships requires mathematical techniques such as DAGs and do-calculus. Do-calculus is a set of inference rules developed to check whether a causal query is identifiable with the observed data; it is a generalization of the Backdoor and Frontdoor criteria and is proved to be complete, i.e., if a causal effect is identifiable, do-calculus will assuredly return the solution (Huang and Valtorta (2012)). The seminal paper on do-calculus is Pearl (1995).


Figure 1: (a) Cyclic Graph; (b) Acyclic Graph with a variable Y confounding the causal effect of X on Z.

Figure 2: Two forms of representing unobserved variables.

directed; for example, a path with N nodes such as X_1 → X_2 → ... → X_N is a directed path. A graph that contains a path from a node to itself is called cyclic. In Pearl's framework the graphs are acyclic, which implies that DAGs cannot represent simultaneity (Heckman and Pinto (2015)). One way of classifying the connections between nodes is to use kinship terminology: for a given node, all nodes with arrows pointing to it are called parents, and all nodes coming from it are called children. In a directed path X → Z → Y, X is a parent of Z and an ancestor of Y; Z is a child of X and Y is a descendant of X (Pearl, Glymour and Jewell (2016)). It is important to note that a DAG is a representation of a structural causal model; the combination of both is what Verma and Pearl (1990) call a causal theory or causal model (Pearl (2009)). The DAG is the causal structure, i.e., a schematic that informs how the variables are related to each other in the structural causal model.²

3.2 D-separation, Interventions and the Do-operator

One of the main challenges of causal inference is to establish whether the association between two variables results from interaction with other variables or has some causal meaning, i.e., whether the relationship is clear of confounding. In causal graphs, there are three basic structures that summarize interactions among three variables. A configuration in which three variables are connected by two directed edges is called a chain. It is easy to see in Figure 3(a) that X only causes Z through Y, which implies that if Y is controlled or, in Pearl's terminology, blocked, X and Z become independent. Blocking means stopping the flow of information between two variables connected by a path.

2 See Hünermund and Bareinboim (2019) and Pearl (2009) for a formal definition of a structural model. It is worth noting that a DAG in structural form is explicit about the background variables, i.e., variables caused by factors outside the model. This topic is related to the structural equation models introduced by Wright (1921). See Pearl (2012a) for a brief history of causality and structural models.


The same phenomenon occurs in another configuration, called a fork, in which two variables have the same parent. In Figure 3(b), X and Z are dependent due to a common cause Y (Pearl (2009)).

Figure 3: (a) Chain: X ⊥̸⊥ Z and X ⊥⊥ Z | Y; (b) Fork: X ⊥̸⊥ Z and X ⊥⊥ Z | Y; (c) Collider: X ⊥⊥ Z and X ⊥̸⊥ Z | Y.

The structure that deserves special attention is the collider³. In Figure 3(c), Y is a collision node because two arrows point into it. The result regarding independence is that X and Z are unconditionally independent but, in this case, conditioning on Y, instead of eliminating an association, ends up creating an artificial one between X and Z. In the framework's terminology, conditioning on a collider opens the path, differently from what happens when conditioning on the third variable in a fork or chain. As a result, controlling for potential confounding variables that are colliders in a causal model will bias the causal estimation (Pearl, Glymour and Jewell (2016)).

Collider bias can be illustrated using a simple example from Elwert and Winship (2014). Suppose that the two main requirements for becoming a Hollywood star are beauty and talent (Talent → Famous Actor ← Beauty) and that they are independent. This structure makes "becoming a famous actor" a collider. So, when analysing only the sample of famous actors, which is the same as conditioning on Famous Actor, the researcher will find a negative correlation between the two requirements; however, given that beauty and talent are uncorrelated, this association is spurious.
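This example is easy to reproduce numerically: draw independent talent and beauty, let fame be caused by both, and compare correlations before and after conditioning on the collider. A sketch with an invented fame threshold:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100_000

talent = rng.normal(size=N)
beauty = rng.normal(size=N)                    # independent of talent

# Fame is a collider: Talent -> Famous <- Beauty.
famous = (talent + beauty + rng.normal(scale=0.5, size=N)) > 2.0

print("corr overall:      %.3f" % np.corrcoef(talent, beauty)[0, 1])  # ~ 0
print("corr among famous: %.3f"
      % np.corrcoef(talent[famous], beauty[famous])[0, 1])            # < 0
```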

Elwert and Winship (2014) distinguish between sample selection bias⁴ and endogenous selection bias, which is the same as what was defined earlier as collider bias. Sample selection bias occurs when the mechanism that selects the observations of a sample is associated with the outcome; endogenous selection bias (collider bias), on the other hand, arises when conditioning on a variable that is caused, or is associated with a variable that is caused, by the treatment and by the outcome. One important characteristic of collider bias is that it can occur even if the sample is unbiased or the study is carried out on the whole population. So, endogenous selection bias is not concerned with the selection mechanism of the observations, but with the selection of the treatment itself.⁵

3 Also called an inverted fork.
4 To understand which graphical strategies to use when dealing with sample selection bias, see Hünermund and Bareinboim (2019).
5 Collider bias is a close definition to what some authors call selection bias; however, there are some subtle differences: i) collider bias is a phenomenon that does not depend on whether the sample is biased, being associated instead with the relationships between the variables; ii) sample selection bias can be controlled for in many situations; iii) although in many cases the effect can be the same, the DAG representation is different, to highlight this conceptual distinction.


Another example of a collider is what Elwert and Winship (2014) call nonresponse bias, presented in Lin and Seltzer (1999), which analyses the effect of the nonparticipation of divorced fathers in a survey studying the impact of child support. Both the divorced father's income (I) and how much he pays as support (S) are causes of his response behavior in the survey (R), which implies that I → R ← S, with R as a collider. Therefore, carrying out a survey with this feature is equivalent to conditioning on a collider, i.e., a bias is generated in the final results. The association that emerges when conditioning on a collider variable is also called Berkson's bias or Berkson's Paradox.

Structures such as the chain, fork and collider help in understanding paths with three variables; however, graphs can be much more complex, and a more general definition is required.

Definition 3 (d-Separation, Pearl (2009)). A path p is said to be d-separated (or blocked) by a set of nodes Z if, and only if,

1. p contains a chain of nodes A → B → C or a fork A ← B → C such that the middle node B is in Z (i.e., B is conditioned on), or

2. p contains a collider A → B ← C such that the collision node B is not in Z, and no descendant of B is in Z.

Figure 4: D-separation: X ⊥⊥ Y; X ⊥⊥ Z; X ⊥⊥ Y | W, Z; K ⊥⊥ Y | W; K ⊥⊥ Y | Z; K ⊥⊥ X | W; K ⊥⊥ Y | W, Z; X ⊥⊥ K | W.

In Figure 4, X and Y are unconditionally d-separated, because the only path between them, X → W ← Z → Y, is blocked by the collider W. It is worth noting that conditioning on K, a descendant of W, has the same consequence as conditioning on W: it d-connects X and Y. Conditioning on W or K opens the path between X and Y, and it is possible to block this path again by conditioning on Z as well. D-separation makes it possible to list the independence relationships between all variables in a DAG, because these implications can be automated by algorithms (Textor and Liskiewicz (2012)). It is also a key concept for presenting the building blocks of Pearl's causal inference approach: the backdoor and frontdoor criteria, identification strategies that will be detailed in Sections 3.3 and 3.4.
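These d-separation statements can be checked mechanically. The sketch below encodes the DAG of Figure 4 and queries networkx, assuming a version that ships `d_separated` (2.8 through 3.2; newer releases rename it `is_d_separator`):

```python
import networkx as nx

# DAG of Figure 4: X -> W <- Z -> Y, with K a child of W.
g = nx.DiGraph([("X", "W"), ("Z", "W"), ("Z", "Y"), ("W", "K")])

# d_separated(G, x, y, z): are node sets x and y d-separated given z?
print(nx.d_separated(g, {"X"}, {"Y"}, set()))       # True: collider W blocks
print(nx.d_separated(g, {"X"}, {"Y"}, {"W"}))       # False: conditioning opens it
print(nx.d_separated(g, {"X"}, {"Y"}, {"K"}))       # False: K is a descendant of W
print(nx.d_separated(g, {"X"}, {"Y"}, {"W", "Z"}))  # True: Z re-blocks the path
```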


The do-operator in P(Y | do(X = x)) represents the probability distribution of Y if X is set to x by an intervention; i.e., it simulates a physical intervention that sets X to a fixed value while keeping the rest unchanged. The objective is to construct a mathematical representation of counterfactuals. In a DAG, this is represented by erasing the arrows pointing into X; in other words, an intervention eliminates the natural causes of X by interrupting the original data generating process (Pearl (1995); Pearl (2012b); Hünermund and Bareinboim (2019)).

3.3 Backdoor criterion

The choice of control variables based on the Strong Ignorability Assumption is one of the main points of disagreement between the DAG and Potential Outcomes approaches to causality. Understanding through a graphical model which variables should be controlled for is the alternative offered by Pearl's framework, and one of its main identification strategies is known as the Backdoor Adjustment (Pearl (2009)).

A backdoor path from X to Y is any path between them that starts with an edge pointing into X. For example, X ← Y → Z is a backdoor path, while X → Y → Z is not. Such paths must be blocked because they are often a source of confounding. The Backdoor Criterion guides how to choose the variables to condition on.

Definition 4 (Backdoor Criterion, Pearl (2009)). A set of variables Z satisfies the backdoor criterion relative to an ordered pair of variables (X, Y) in a DAG G if:

1. no node in Z is a descendant of X; and

2. Z blocks every path between X and Y that contains an arrow into X.

If there exists a set of variables Z satisfying the above criterion for X and Y, then the causal effect of X on Y is given by computing

$$P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z) \tag{3.1}$$

which is known as the backdoor adjustment formula.

A simple procedure to find a set of variables Z that meets the backdoor criterion is to list every backdoor path from X to Y and identify which of these paths are open, i.e., in which the first node is d-connected to the last (Morgan and Winship (2015)). In Figure 5, there are four backdoor paths:


Figure 5: Backdoor Criterion, example taken from Pearl (1995).

(i) X ← X3 ← X4 → X5 → Y

(ii) X ← X3 ← X4 → X5 ← X6 → X7 → Y

(iii) X ← X5 → Y

(iv) X ← X5 ← X6 → X7 → Y

Path (ii) is blocked due to the collider X5, so it is necessary to block paths (i), (iii) and (iv). It is easy to see that X5 has to be in the set Z of variables, because it is the only choice to block path (iii); in addition, in doing so, it blocks paths (i) and (iv). However, conditioning on X5 brings another problem: the already blocked path (ii) opens when X5 is adjusted. So, it is inevitable to control for another variable among X3, X4, X6, X7. Good practice dictates choosing a set that is minimally sufficient for meeting the backdoor criterion, which in Figure 5 is one of {X5, X3}, {X5, X4}, {X5, X6}, {X5, X7} (Morgan and Winship (2015)).

In fact, the Backdoor Criterion (4) is equivalent to the Unconfoundedness Assumption (2), (Y(x) ⊥⊥ X) | Z = z, formulated in the Potential Outcomes approach, with X being the treatment variable and Z the covariates (Pearl (2009)):

$$\begin{aligned}
P(Y(x) = y) &= \sum_{z} P(Y(x) = y \mid Z = z)\, P(Z = z) \\
&= \sum_{z} P(Y(x) = y \mid Z = z, X = x)\, P(Z = z) \\
&= \sum_{z} P(Y = y \mid Z = z, X = x)\, P(Z = z)
\end{aligned} \tag{3.2}$$

which is the backdoor adjustment formula.
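For discrete data, the backdoor adjustment is a direct computation over empirical frequencies. A sketch with a single binary confounder Z and invented parameters: the adjusted contrast recovers the interventional effect, while the raw conditional contrast overstates it:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 500_000

# Z confounds X -> Y; all variables binary.
z = rng.binomial(1, 0.5, size=N)
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)   # true effect of X is 0.3

def p_y1_do_x(x_val):
    """Backdoor formula (3.1): sum_z P(Y=1 | X=x, Z=z) P(Z=z)."""
    return sum(
        y[(x == x_val) & (z == z_val)].mean() * (z == z_val).mean()
        for z_val in (0, 1)
    )

print(f"adjusted: {p_y1_do_x(1) - p_y1_do_x(0):.3f}")          # ~ 0.30
print(f"naive:    {y[x == 1].mean() - y[x == 0].mean():.3f}")  # ~ 0.54
```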

3.4 Frontdoor criterion

In empirical research, strategies for working with unobserved variables must be in the researcher's toolbox. Although the backdoor criterion is a powerful strategy to eliminate confounding bias, sometimes there are backdoor paths that are unblockable because the middle variable is not observed, which is the case in Figure 6. However, under certain conditions, the causal mechanism can be fully identified through intermediate variables.


Definition 5 (Frontdoor Criterion, Pearl (2009)). A set of variables Z is said to satisfy the frontdoor criterion relative to an ordered pair of variables (X, Y) if:

1. Z intercepts all directed paths from X to Y;

2. there is no unblocked backdoor path from X to Z; and

3. all backdoor paths from Z to Y are blocked by X.

If the conditions of the Frontdoor Criterion are met by a set of variables Z, then the causal effect of X on Y is identifiable by the formula

$$P(Y = y \mid do(X = x)) = \sum_{z} P(Z = z \mid X = x) \sum_{x'} P(Y = y \mid Z = z, X = x')\, P(X = x') \tag{3.3}$$

also known as the Frontdoor Adjustment, where x′ is an index of summation that must be distinguished from x. The proofs of the backdoor and frontdoor adjustment formulas can be found in Pearl (2009).
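The frontdoor formula can likewise be computed directly from observed frequencies. The sketch below simulates a binary mediator Z and an unobserved confounder U (invented parameters); the estimate uses only the observed (X, Z, Y):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 500_000

u = rng.binomial(1, 0.5, size=N)                  # unobserved confounder
x = rng.binomial(1, np.where(u == 1, 0.8, 0.2))
z = rng.binomial(1, np.where(x == 1, 0.9, 0.1))   # mediator: X -> Z
y = rng.binomial(1, 0.1 + 0.5 * z + 0.3 * u)      # Z -> Y and U -> Y

def p_y1_do_x(x_val):
    """Frontdoor (3.3): sum_z P(z|x) sum_x' P(Y=1|z,x') P(x')."""
    total = 0.0
    for z_val in (0, 1):
        p_z = np.mean(z[x == x_val] == z_val)
        total += p_z * sum(
            y[(z == z_val) & (x == xp)].mean() * (x == xp).mean()
            for xp in (0, 1)
        )
    return total

# True interventional contrast is 0.5 * (0.9 - 0.1) = 0.40.
print(f"frontdoor estimate: {p_y1_do_x(1) - p_y1_do_x(0):.3f}")
```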

The example taken from Morgan and Winship (2015) in Figure 6 illustrates how the mechanism of the Frontdoor Criterion works. The causal effect of X on Y cannot be identified using the Backdoor Criterion because U, the variable that could block the backdoor path between them, is not observed. Thus, it is worth looking at the set of intermediate variables Z = {Z1, Z2}.

Figure 6: Frontdoor Criterion, example taken from Morgan and Winship (2015).

The first condition of the Frontdoor Criterion is met: the two directed paths X → Z1 → Y and X → Z2 → Y are intercepted by Z. The second condition is to guarantee that all the backdoor paths from X to the intermediate variables Z are blocked, which is exactly the case: in both X ← U → Y ← Z1 and X ← U → Y ← Z2, Y is a collider, implying that X and Z are d-separated except through the directed paths. Finally, all the backdoor paths from Z to Y can be blocked by conditioning on X:


(i) Z1 ← X → Z2 → Y

(ii) Z1 ← X ← U → Y

(iii) Z2 ← X → Z1 → Y

(iv) Z2 ← X ← U → Y

It is worth mentioning that, recently, Bareinboim and Pearl (2016) developed a scheme to handle multiple sources of data generated from different populations and in different settings. The authors also draw attention to the transportability issue, i.e., how the findings of a study can be extrapolated to other settings⁷. Beyond the Potential Outcomes and Pearl approaches, other frameworks to deal with causal problems have been developed; it is worth citing the epidemiological research of James Robins and coauthors (Greenland and Robins (1986); Robins and Greenland (1992)).

7 In the social sciences, this is usually called external validity.

3.5 DAGs and Potential Outcomes

It is common to see Pearl's DAG approach to causality as opposed to the Rubin Causal Model, or Potential Outcomes approach, mainly because each framework has a strongly critical view of the other. The present work views the two approaches as complementary, following Morgan and Winship (2015), Cunningham (2018) and Abadie and Cattaneo (2018).

The goal of both frameworks is to identify causal effects using observed data. In graphical causal models, P(Y | do(X = x)) must become, if possible, a do-free expression in which every variable is observed; when handling potential outcomes, Y(0) and Y(1) must be identified from the observed data Y^obs (Imbens (2019)). In both approaches additional knowledge is required: a DAG is a knowledge representation of Pearl's causal theory, and assumptions such as SUTVA (1) and Unconfoundedness (2), which is equivalent to the backdoor adjustment, are necessary in the Potential Outcomes approach.

3.5.1 Pearl on Potential Outcomes

One of the main assumptions of the Potential Outcomes approach is the Unconfoundedness Assumption (2), also known as the Ignorability Assumption. The rationale behind the assumption, in its conditional version, i.e., in a nonrandomized setting, is that if you control for a set of pre-treatment variables X you can obtain a causal estimate; in other words, your result is "as good as random". However, there is little guidance on how to select the covariates, which is sometimes based on verbal explanations or, as in Imbens and


Rubin (2015), on the recommendation to use every pre-treatment variable in the data set, ignoring, for example, the presence of collider variables⁹.

Often in this setting, the number of pre-treatment variables is substantial, typically because, conditional on a large number of pre-treatment variables, unconfoundedness is more plausible. (Imbens and Rubin (2015), page 32)

To Pearl, one of the reasons for this lack of guidance in the Rubin Causal Model is that the potential outcomes are taken as primitives and are not derived from a causal model or from any formal representation of scientific knowledge (Pearl (2009), page 243). Differently, a DAG represents a causal model that summarizes previous domain knowledge. This model encodes relationships among variables, observed or unobserved, based on theory or empirical research. Once the DAG is constructed, the next step is to check whether the causal effect is identified with the available data, using the Backdoor and Frontdoor criteria (4), (5) and their generalization, the do-calculus (Hünermund and Bareinboim (2019); Elwert and Winship (2014); Pearl (2009)).

Besides making the selection of variables that renders the confounding effects "ignorable" more transparent, DAGs also help researchers better communicate the causal model, and they facilitate improvements from forthcoming research by enabling other researchers to inspect the covariate selection, which is not straightforward when invoking the Unconfoundedness Assumption (Hünermund and Bareinboim (2019)).

3.5.2 Imbens on DAGs

The focus of the algebraic construction of do-calculus is to identify causal effects in models given by a DAG, no matter how complex, and the argument in favor of Pearl's approach is that the Potential Outcomes framework is not able to identify those complex systems without strong and unrealistic assumptions. However, some questions remain unanswered in this procedure. First of all, it is not clear how to develop the DAG and what comes after it is ready, which makes the identification process disconnected from other steps such as estimation and inference. On the other hand, a number of identification strategies arose using the Rubin Causal Model as the starting point; for example, advances in instrumental variables were only possible when the framework shifted to PO (Angrist, Imbens and Rubin (1996)). This leads to a second advantage of PO over DAGs: the body of applied work in the former is vaster than in the latter in the social sciences (Imbens (2019)).

In the absence of such concrete examples the toy models in the DAG literature sometimes appear to be a set of solutions in search of problems, rather than a set of solutions for substantive problems previously posed in social sciences. (Imbens (2019), page 4)

9 See Middleton et al. (2016) on how conditioning on every variable in the data set may lead to bias.

A third advantage of PO (or weakness of DAGs) is its capability of answering specific policy evaluation questions: the assignment mechanism is not explicit in the do-operator, which means that the causal effect of interest is not linked to a specific manipulation. Following the motto "no causation without manipulation", causal variables, according to the PO approach, must have the possibility of taking a value different from the one actually taken (Cox (1992)). A policy is a manipulation that intervenes on a certain variable, and the effect that an identification strategy should aim to measure is the policy effect. Pearl's approach considers that non-manipulable variables can be causal variables themselves; the approach that PO follows to tackle questions involving non-manipulable variables such as gender or race is to measure the effect of a manipulation that is directly associated with these variables¹⁰.

10 Imbens (2019) uses Bertrand and Mullainathan (2004) as an example of a causal question involving a non-manipulable variable, race, addressed with an identification strategy based on a manipulable variable, the perception of race: the names on sent resumes were randomly assigned in order to evaluate the impact of this perception on the labor market.


4 Difference in Differences

In an observational study, if the treatment assignment is confounded with the potential outcomes, it is possible to adjust for control variables under the Unconfoundedness Assumption, and the requirement for using this strategy is that the control variables be observed. In many cases, however, the confounding variables are not observed, making the identification described in Section 2.2 impossible. Difference in Differences (DiD) is a method that analyses treatment assignments that switch over time. The idea is that it is possible to use pretreatment and post-treatment information, a comparison between treatment and control groups, and additional assumptions to identify causal effects even though the confounding effect is generated by unobserved variables (Angrist and Pischke (2008); Abadie and Cattaneo (2018)). The DiD method is used in the context of natural experiments, designs in which the intervention is a naturally occurring phenomenon. These phenomena are often associated with political and social events not under the control of the researcher, which makes natural experiments observational studies. Here, the assignment is considered "as if" it were random, which enables the use of observational data to infer causation, under certain assumptions (Dunning (2008)).

The unit-level causal effect of the treatment, τ_{i,t}, which is never observed, is the difference between the two potential outcomes at the same time t:

τ_{i,t} = Y_{i,t}(1) − Y_{i,t}(0)   (4.1)

Using the same terminology as Chapter 2, D_i is the binary treatment assignment, with d ∈ {0, 1}, and the outcome variable Y_i is observed at times t ∈ {0, 1}, for simplicity. Here, i ∈ {0, 1} indexes the control and treatment groups, respectively, instead of representing the unit level. Evaluating the impact of an intervention means measuring the effect on Y_i when D_i is equal to 1. We denote the outcome variable by Y_{i,t}, following the same structure described in Equation (2.1). The comparison involves four groups: the pretreatment treated group, i.e., the units that will be exposed to the treatment but are still unaffected; the post-treatment treated group; the pretreatment nontreated group; and the post-treatment nontreated group. Covariates also play an important role in controlling for observable confounding effects or in capturing heterogeneity in causal effects, a key feature in Section 2.2 as well. Here, the covariates are omitted, but an analysis considering them is straightforward. The causal estimand of interest is the Average Treatment Effect on the Treated (Lechner et al. (2011)).


τ_ATET = E[Y_{1,1}(1) − Y_{1,1}(0)]   (4.2)

The challenge here is the fact that Y_{1,1}(0), the outcome of the treated group without the treatment at time t = 1, is not identified. There are two naive approaches to obtaining the causal effect in this empirical setting. The first is to compute a simple before-and-after difference; however, by doing so, the estimation ignores unobserved confounding effects in the time trends, generating a biased estimator.

τ_Naive1 = E[Y_{1,1}(1)] − E[Y_{1,0}(0)]   (4.3)

Another naive approach is the simple comparison of the control and treatment groups. As said before, DiD is recommended when an RCT is not feasible and there are unobserved quantities among the confounding variables. So, there are differences between the groups before the intervention takes place that are not explained by observed variables, making the comparison of control and treated groups capture not only the causal effect but also pretreatment differences, which also biases the estimation.

τ_Naive2 = E[Y_{1,1}(1)] − E[Y_{0,1}(0)]   (4.4)

The difference in differences estimand is

τ_DiD = [E(Y_{1,1}(1)) − E(Y_{1,0}(0))] − [E(Y_{0,1}(0)) − E(Y_{0,0}(0))]   (4.5)

Ensuring that τ_DiD is a causal estimand requires assumptions about the confounding variables. The three assumptions of the Potential Outcomes framework are necessary for the identification of causal effects in DiD: Unconfoundedness of the treatment assignment (2), Common Support (3) and SUTVA (1), which can be expressed as:

τ_{0,t} = 0, t ∈ {0, 1}   (4.6)

However, additional key assumptions are required in this section. Following Lechner et al. (2011)'s taxonomy:

Assumption 4 (No effect on the pretreatment population). There is no treatment effect on the treated unit before the treatment occurs:

τ_{1,0} = 0   (4.7)


Assumption (4) also means that there are no anticipation effects, which is not plausible in some empirical settings, such as policies announced before their implementation. For example, policies that were discussed during electoral campaigns could generate anticipation effects once the candidate is elected. So, to measure the causal effect of such a policy, the researcher should take into account that defining the intervention time as starting from the date of implementation may be misleading.

The key assumption¹ of DiD is described here in two forms. Distinguishing these forms is necessary to illustrate how the two naive estimators can mislead the researcher about the causal interpretation of a given intervention.

1 Here, covariates are omitted to simplify the exposition, but the results continue to hold in their presence.

Assumption 5 (Common Trend). If the treated group had not been subjected to the treatment, both the treated and control groups would have experienced the same time trends, conditional on covariates.

\[ E[Y_{1,1}(0)] - E[Y_{1,0}(0)] = E[Y_{0,1}(0)] - E[Y_{0,0}(0)] \tag{4.8} \]

Assuming common trends, any deviation of the observed outcome from this trend after the treatment can be attributed to the treatment. The other way to frame the differences between the groups is as a problem of biased estimation due to unobserved confounders. Given Assumption (4), the two groups differ conditional on covariates before the intervention because D_i has an effect on Y_{i,0} through confounding variables. If the bias is constant, meaning that the confounding effect does not change over time, the DiD estimand has a causal interpretation.

Assumption 6 (Constant Bias). Differences between the control and treatment groups are constant over time.

\[ E[Y_{1,0}(0)] - E[Y_{0,0}(0)] = E[Y_{1,1}(0)] - E[Y_{0,1}(0)] \tag{4.9} \]

Rearranging the terms of the equation above, it is possible to see that Assumption (5) is equivalent to Assumption (6).²

2 Another point of attention is that the outcome is functional-form dependent, i.e., any transformation of the outcome may affect the validity of Assumptions (5) and (6) (Lechner et al. (2011)). However, Athey e Imbens (2006) developed an identification strategy that is invariant to transformations of the outcome.

With the assumptions above, the identification of τ_{ATET} is straightforward:



\[ \tau_{ATET} = E[Y_{1,1}(1) - Y_{1,1}(0)] = \underbrace{E[Y_{1,1}(1)]}_{\text{observed}} - \; E[Y_{1,1}(0)] \tag{4.10} \]

Using the Common Trends Assumption (5),

\[ E[Y_{1,1}(0)] = E[Y_{0,1}(0)] + E[Y_{1,0}(0)] - E[Y_{0,0}(0)] \]

Then,

\[ \begin{aligned} \tau_{ATET} &= E[Y_{1,1}(1)] - E[Y_{0,1}(0)] - E[Y_{1,0}(0)] + E[Y_{0,0}(0)] \\ &= [E(Y_{1,1}(1)) - E(Y_{1,0}(0))] - [E(Y_{0,1}(0)) - E(Y_{0,0}(0))] \\ &= \tau_{DiD} \end{aligned} \]
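The algebra above can also be checked mechanically. Below is a minimal sketch with sympy (the symbol names are ours): imposing the Common Trends restriction on E[Y_{1,1}(0)] makes the ATET and DiD expressions coincide.

```python
import sympy as sp

# Means of the four observed cells:
# E[Y_{1,1}(1)], E[Y_{1,0}(0)], E[Y_{0,1}(0)], E[Y_{0,0}(0)]
y11_1, y10_0, y01_0, y00_0 = sp.symbols("y11_1 y10_0 y01_0 y00_0")

# Common Trends (Assumption 5) pins down the counterfactual E[Y_{1,1}(0)]
y11_0 = y01_0 + y10_0 - y00_0

atet = y11_1 - y11_0                          # Equation (4.10)
did = (y11_1 - y10_0) - (y01_0 - y00_0)       # Equation (4.5)

print(sp.simplify(atet - did))                # prints 0: the estimands coincide
```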

A parametric version makes explicit that DiD is a version of fixed effects³ estimation, applied to aggregate outcomes instead of individual data (Angrist e Pischke (2008)).

3 A fixed effects (FE) model relies on the assumption that the unobserved confounding variables are time-invariant individual characteristics. Estimating an FE model requires panel data, i.e., repeated observations of the same individuals over time. The main difference between DiD and the fixed effects model is that, instead of individual data, DiD generally uses aggregate variables (Angrist e Pischke (2008)). See Arellano (2003) for a deeper discussion of panel data fixed effects.

\[ Y_{i,t} = \omega_i + \delta_t + \tau D_{i,t} + \epsilon_{i,t} \tag{4.11} \]

In potential outcomes notation,

\[ Y_{i,t}(0) = \omega_i + \delta_t + \epsilon_{i,t} \tag{4.12} \]

\[ Y_{i,t}(1) = \omega_i + \delta_t + \tau + \epsilon_{i,t} \tag{4.13} \]

The Common Trends Assumption (5) and the Constant Bias Assumption (6) are represented in the parametric framework by the two blocks of equations below.

Taking the before-and-after difference,



\[ E[Y_{0,1}(0)] - E[Y_{0,0}(0)] = \delta_1 - \delta_0 \tag{4.14} \]

\[ E[Y_{1,1}(1)] - E[Y_{1,0}(0)] = \delta_1 - \delta_0 + \tau \tag{4.15} \]

and taking the difference between the treatment and control groups,

\[ E[Y_{1,0}(0)] - E[Y_{0,0}(0)] = \omega_1 - \omega_0 \tag{4.16} \]

\[ E[Y_{1,1}(1)] - E[Y_{0,1}(0)] = \omega_1 - \omega_0 + \tau \tag{4.17} \]

where δ_t is a time fixed effect and ω_i is a time-invariant group effect. The causal effect is represented by τ⁴ and ε_{i,t} is a mean-zero random variable. Taking either difference, Equation (4.15) − Equation (4.14) or Equation (4.17) − Equation (4.16), yields the same result, the causal effect of interest:

\[ (\delta_1 - \delta_0 + \tau) - (\delta_1 - \delta_0) = (\omega_1 - \omega_0 + \tau) - (\omega_1 - \omega_0) = \tau_{DiD} \tag{4.18} \]

4 Here, there is an implicit assumption of homogeneity of the effect. If the causal effect is heterogeneous, τ must be interpreted as an average effect over the treated units.

Equation (4.11) can be estimated using a regression framework:

\[ Y_{i,t} = \alpha + \omega I_{[i=1]} + \delta I_{[t=1]} + \tau I_{[i=1]} I_{[t=1]} + \epsilon_{i,t} \tag{4.19} \]

where D_{i,t} = I_{[i=1]} I_{[t=1]} is the treatment assignment dummy, α = δ_0 + ω_0 is an intercept, ω = ω_1 − ω_0 is the group fixed effect and δ = δ_1 − δ_0 is the time effect.
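As a sketch of this regression on simulated data (the parameter values ω = 2, δ = 3, τ = 4 and the use of statsmodels are our choices, not the text's), the coefficient on the interaction dummy recovers τ:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Simulate from the parametric model (4.11)
group = rng.integers(0, 2, n)   # I[i=1]: 1 = treated group
post = rng.integers(0, 2, n)    # I[t=1]: 1 = post-treatment period
y = 2.0 * group + 3.0 * post + 4.0 * group * post + rng.normal(0, 1, n)

df = pd.DataFrame({"y": y, "group": group, "post": post})

# Equation (4.19): tau is the coefficient on the interaction term
fit = smf.ols("y ~ group + post + group:post", data=df).fit()
print(fit.params["group:post"])  # close to the true tau = 4
```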

The role of covariates in DiD is also encoded in the Unconfoundedness Assumption (2), which makes the toolbox of DAGs a natural candidate as a covariate selection method. It is important to ensure that any variable added to the regression as a control is not a collider, so that conditioning on it blocks a backdoor path instead of opening one, as the simulation below illustrates.
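A minimal simulation (our own construction) makes the danger concrete: below, C is a collider, caused by both the treatment D and the outcome Y, and adding it as a "control" severely distorts the estimated effect.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000

d = rng.normal(size=n)              # treatment
y = 2.0 * d + rng.normal(size=n)    # true causal effect of d on y is 2
c = d + y + rng.normal(size=n)      # collider: a common effect of d and y

# Correct specification: coefficient on d is approximately 2
print(sm.OLS(y, sm.add_constant(d)).fit().params[1])

# Conditioning on the collider opens a spurious path: coefficient drops to ~0.5
print(sm.OLS(y, sm.add_constant(np.column_stack([d, c]))).fit().params[1])
```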

There are still open issues in the method. Kahn-Lang e Lang (2020) discuss the role of the parallel trends assumption and draw attention to: i) the necessity of explaining why the levels of the control and treatment groups differ in the pretreatment period; ii) the choice of functional form; iii) the fact that common pre-trends may be neither necessary nor sufficient to guarantee the common trends assumption. Rambachan e Roth (2019) provide a method that still estimates causal effects when the common trends assumption is violated.
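One common informal diagnostic, which, as point iii) above warns, is neither necessary nor sufficient, is a placebo DiD: restrict the sample to pretreatment periods, pretend the treatment started earlier, and check that the estimated "effect" is close to zero. A sketch on simulated data (our construction, with three pretreatment periods):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
rows = []
for g in (0, 1):                       # 0 = control, 1 = treated
    for t in range(6):                 # t = 0, 1, 2 are pretreatment periods
        y = 2.0 * g + 0.5 * t + 4.0 * g * (t >= 3) + rng.normal(0, 1, 200)
        rows.append(pd.DataFrame({"y": y, "group": g, "t": t}))
df = pd.concat(rows, ignore_index=True)

# Placebo: keep t < 3 only and pretend treatment started at t = 2
pre = df[df.t < 3].assign(fake_post=lambda d: (d.t >= 2).astype(int))
placebo = smf.ols("y ~ group * fake_post", data=pre).fit()
print(placebo.params["group:fake_post"])   # ~0 when pre-trends are parallel
```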

For didactic purposes, consider a three-period setting, in which it is easy to understand graphically how the common trends assumption works and how the identification of the causal effect is done. Here, δ = δ_1 − δ_0.


[Figure: identification of the DiD effect in a three-period setting. Horizontal axis: time t ∈ {0, 1/2, 1}; vertical axis: Y_{i,t}; the intercept α, the group gap ω, the time effects δ and the treatment effect τ_{1,1} are marked.]


5 Synthetic Control

The difference-in-differences method requires a control group to compare with the treatment group. However, when analyzing a single treated unit, a group whose average provides a good comparison, or a unit that resembles the treated unit, might not be available. Furthermore, it is not clear how to choose the appropriate units to form a control group, and the researcher usually relies on the so-called Common Trends Assumption (5). The Synthetic Control Method (SCM) was developed by Abadie and coauthors (Abadie e Gardeazabal (2003), Abadie, Diamond e Hainmueller (2010), Abadie, Diamond e Hainmueller (2015)) to overcome these issues. It provides a data-driven procedure, based on a nested optimization, that chooses a convex combination of control units that best resembles the treated unit, using not only pretreatment values of the outcome but also covariates. According to Athey e Imbens (2017b), SCM "... is arguably the most important innovation in the policy evaluation literature in the last 15 years". The method is widely applied in economics, the social sciences and comparative politics, mainly because it estimates the effects of a treatment at an aggregate level, such as regions, countries or cities, where an intervention usually affects only one unit, making the usual DiD analysis more difficult. One clear advantage of the method, in contrast to regression analysis in the potential outcomes literature, is that it does not rely on extrapolation. Another advantage over DiD analysis is that it does not require the assumption that the confounding effects of unobserved variables are constant over time. Even though the idea that a combination of unaffected units provides a good approximation of the treated unit is simple, there are some recommendations that a researcher should follow. The method's simplicity can be tempting and could lead to misuse, which is why it is important to detail the conditions under which the SCM is recommended as an identification strategy to obtain causal estimates.

5.1 Basic Elements of SCM

Differently from the two-period setting of Chapter 4, suppose that we observe i ∈ {1, ..., J + 1} units in periods t ∈ {1, ..., T}. Unit 1 is the treated unit and is only exposed to the intervention after period T_0, where 1 < T_0 < T. So there are T_0 preintervention periods and T − T_0 postintervention periods. The remaining J units are not exposed to the treatment in any period and are called the donor pool, which is the set of potential¹ units that may compose the synthetic control². For example, Abadie e Gardeazabal (2003) investigate the economic impacts of political conflict using the terrorist attacks in the Basque Country as a case study.

1 They are potential because the weight attributed to many of the units in the donor pool may be zero.
2 The assumptions of DiD and Potential Outcomes, such as NEPT, unconfoundedness and SUTVA, are also maintained in the SCM setting.
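To fix ideas before the formal development, the sketch below (a simplification of our own: it matches the characteristics with an identity weighting matrix, skipping the nested optimization over the importance weights used by Abadie and coauthors) chooses the convex weights W ≥ 0, ΣW = 1, minimizing ‖X_1 − X_0 W‖:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
k, J = 10, 6                          # k characteristics, J donor units

X0 = rng.normal(size=(k, J))          # pretreatment characteristics of donors
true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0, 0.0])
X1 = X0 @ true_w                      # treated unit built as a convex combination

loss = lambda w: np.sum((X1 - X0 @ w) ** 2)
res = minimize(loss, np.full(J, 1.0 / J), method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print(np.round(res.x, 2))             # approximately [0.6, 0.4, 0, 0, 0, 0]
```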
