Quantifying The Roles of Uncertainty and Preferences In Understanding Schooling Choices and Earnings Inequality
Flavio Cunha, James J. Heckman and Salvador Navarro
Escola de Pós-Graduação em Economia, Fundação Getulio Vargas
Goal of this research
(1) For decades, economists and others have attempted to distinguish "luck" or chance from other factors in determining earnings.
(2) A crude description of the decomposition: look at the Mincer earnings equation
(∗) ln y_it = x_it β + θ_i + ε_it
where y_it = earnings of person i at time t, x_it = observables,
θ_i = person-specific fixed effect, ε_it = G(L)ν_t,
ν_t iid mean zero.
(7) Example: MaCurdy, Moffitt and Gottschalk, Abowd and Card all estimate earnings models in which versions of (∗) are fit
(8) These statistical decompositions are interesting but tell us little about components unforecastable by agents
(9) Thus Gottschalk and Moffitt show rising variances in (∗) over the 1980s. But this says nothing about increasing uncertainty, although some (e.g. Ljungqvist and Sargent) interpret their evidence this way
(10) Goal of this paper is to identify these components. We need a theory, and a link of (∗) to that behavioral theory, to extract forecastable from unforecastable components
(11) Precedent: work by Flavin. Using the PIH, innovations in consumption are linked to innovations in income
(12) This paper uses schooling.
(a) What future components of income are forecastable at the date schooling decisions are being made?
(c) Can uncertainty, and increasing uncertainty, explain sluggish responses to the increase in the returns to schooling over the past 30 years?
(d) What is the role of uncertainty and what is the role of nonpecuniary factors?
Definitions:
(a) Heterogeneity: all sources of variability among persons
(i) observable (ii) unobservable.
(b) Uncertainty: unforecastable components of heterogeneity.
Goal: Estimate components in agent information sets at age t. See the evolution over time.
By-products: Unite and Extend Three Literatures
1. Treatment Effect and Policy Evaluation Literature (Derive Treatment Effects from Well Developed Economic Model)
2. Inequality Measurement Literature (Lift anonymity postulates)
3. Earnings Dynamics Literature
1. Treatment Effect Literature
a. Traditional focus is on means (different means emphasized in different subfields)
b. Our focus is on Distributions of Treatment Effects
c. We link the Treatment Effect Literature to the Choice-Theoretic Structural Econometric Literature using Robust Semiparametric Methods
d. We identify and estimate both “objective” outcomes of policies and “subjective” evaluations of policies as perceived by persons who choose (self select into) treatments.
2. Inequality Measurement Literature
a. Traditional focus is on overall measures of inequality (e.g., Gini coefficients; variance) and subgroup decompositions (emphasis on statistical accounting).
b. Link to policy analysis is obscure. Decompositions do not correspond to well-defined policies
c. Anonymity Postulate: Makes a virtue of a cross sectional necessity by comparing distributions using second order stochastic dominance
e. Our approach allows us to make such comparisons and informs us about which groups are affected and which are not and how policy shifts people within an aggregate distribution.
f. Allows us to examine how specific policies affect choices (e.g., schooling) and how this affects distributions and movements across distributions
3. Earnings Dynamics Literature.
(a) Does not correct for self selection into schooling.
(b) Silent on components of uncertainty.
Basic Framework Used in Our Analysis: Roy (1951) economy
S = 1 college, S = 0 high school. Two potential outcomes (Y0, Y1) (present value of earnings).
Decision rule governing sectoral choices:
S = 1 if I = Y1 − Y0 − C ≥ 0; S = 0 otherwise.
Decompose into means (implicitly conditioning on X variables):
Y1 = µ1 + U1, E(U1) = 0
Y0 = µ0 + U0, E(U0) = 0
C = µC + UC.
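As a concrete illustration, the decision rule above is easy to simulate. The parameter values and normal shocks below are purely illustrative, not estimates from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative means for outcomes and cost (hypothetical values)
mu0, mu1, muC = 10.0, 12.0, 1.5

# Correlated outcome shocks, independent cost shock
U0, U1, UC = rng.multivariate_normal(
    [0.0, 0.0, 0.0],
    [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 0.5]], size=n).T

Y0, Y1, C = mu0 + U0, mu1 + U1, muC + UC

I = Y1 - Y0 - C            # net gain from college
S = (I >= 0).astype(int)   # decision rule: S = 1 iff I >= 0

# Self-selection on gains: college-goers have larger average Y1 - Y0
gain_college = (Y1 - Y0)[S == 1].mean()
gain_hs = (Y1 - Y0)[S == 0].mean()
```

Because the gross gain Y1 − Y0 enters the index I, agents with larger gains select into college, so `gain_college` exceeds `gain_hs` in this sketch.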
Two Kinds of Policies:
(a) Those that affect (Y0, Y1, C) (GE policies)
(b) Those that affect only C and hence S (treatment effect policies)
General Case
Under policy A : (Y0A, Y1A, CA)
Treatment Effect Case:
Policy A Policy B
(Y0, Y1, CA) and (Y0, Y1, CB)
Case I Y1 − Y0 is the same for everyone
Case II
Y1 − Y0 differs among people of the same X but Pr(S = 1|Y1 − Y0) = Pr(S = 1)
Case III
Y1 − Y0 differs among people and people act on these differences
MTE: E(Y1 − Y0 | I = 0)
Why Useful to Estimate Joint Distributions of Counterfactuals?
1. The proportion of people who benefit (in terms of gross gains) from participation in the program:
Pr(Y1 ≥ Y0 | X).
2. Gains to participants at selected levels of the no-treatment state,
F(Y1 − Y0 | Y0 = y0, X), or of the treatment state,
F(Y1 − Y0 | Y1 = y1, X).
3. The option value of social programs
E(Yi − Yj | X) for all i, j and E(Yi − Yj | X, D = 1) for all i, j
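Given a joint distribution of counterfactuals, functionals 1 and 2 are direct to compute. A minimal sketch, assuming a hypothetical bivariate normal joint for (Y0, Y1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical joint distribution of counterfactuals (illustration only)
Y0, Y1 = rng.multivariate_normal([10.0, 11.0],
                                 [[1.0, 0.6],
                                  [0.6, 1.0]], size=n).T

# 1. Proportion who benefit (gross) from treatment
p_benefit = (Y1 >= Y0).mean()

# 2. Gains at a selected level of the no-treatment state:
#    here, Y0 in its bottom decile
low = Y0 <= np.quantile(Y0, 0.10)
gain_low_Y0 = (Y1 - Y0)[low].mean()
gain_overall = (Y1 - Y0).mean()
```

With these illustrative parameters Cov(Y1 − Y0, Y0) < 0, so average gains are larger for people with low Y0; the marginal distributions alone could not reveal this.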
1 Estimating Distributions of Counterfactual Outcomes
Two counterfactual states (Y0, Y1), with (Y0, Y1) not independent of X.
S = 1 if Y1 is observed; S = 0 otherwise (so Y0 is observed).
Y = S Y1 + (1 − S) Y0.
Allow for the possibility of an instrument Z:
(Y0, Y1) ⊥⊥ Z | X, with Z shifting Pr(S = 1 | Z, X). But an instrument is not essential.
Data Available To Analyst:
Observe Y0 if S = 0 and Y1 if S = 1, but we do not observe (Y0, Y1) for anyone.
Three separate problems.
Selection Problem:
Know: F(Y1 | S = 1, X), F(Y0 | S = 0, X).
If solved we get: F(Y1 | X), F(Y0 | X) —
but not the joint distribution F(Y0, Y1 | X).
Second Problem: Recovering Joint Distributions
Traditional approach: conditional on X, Y1 and Y0 are deterministically related:
Y1 ≡ Y0 + ∆(X) (1)
("common effect"), with ∆(X) obtained from the means of
F(Y0 | S = 0, X) and F(Y1 | S = 1, X).
Generalization
Quantiles perfectly ranked: Y1 = F_{1,X}^{-1}(F_{0,X}(Y0)),
where F_{1,X} = F(y1 | X), F_{0,X} = F(y0 | X);
or perfectly inversely ranked: Y1 = F_{1,X}^{-1}(1 − F_{0,X}(Y0)).
To see why, consider the following Markov kernels M(y1, y0 | X) and M̃(y0, y1 | X):
F(y1 | X) = ∫ M(y1, y0 | X) dF0(y0 | X),
F(y0 | X) = ∫ M̃(y0, y1 | X) dF1(y1 | X),
so
F(y1 | X) = ∫ M(y1, y0 | X) M̃(y0, y1 | X) dF(y1 | X).
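The perfect-ranking construction Y1 = F1^{-1}(F0(Y0)) can be carried out with empirical distributions. A sketch with hypothetical normal marginals:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical marginals: draws characterizing F0 and F1
Y0 = rng.normal(10.0, 1.0, n)
Y1_draws = rng.normal(11.0, 1.5, n)

# Empirical F0 evaluated at each Y0 (ranks scaled into (0, 1))
p = (np.argsort(np.argsort(Y0)) + 1) / (n + 1)

# Perfect-ranking imputation: Y1 = F1^{-1}(F0(Y0))
Y1_imputed = np.quantile(Y1_draws, p)

# The imputation preserves ranks: sorting by Y0 sorts Y1_imputed too
order = np.argsort(Y0)
rank_preserved = bool(np.all(np.diff(Y1_imputed[order]) >= 0))
```

The inversely ranked case would replace `p` with `1 - p`. Either way the construction pins down a degenerate joint distribution from the two marginals alone.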
Approach Using Choice Theory
S = 1(µs(Z) ≥ es) , Z ⊥⊥ es (2)
Y1 = µ1(X) + U1, E(U1) = 0
Y0 = µ0(X) + U0, E(U0) = 0
(U1, U0) ⊥⊥ X, Z.
where S = 1( Y1 > Y0) (Roy model).
For more general decision rules, this method breaks down. Heckman (1990) and Heckman and Smith (1998): assume
(Z, X) ⊥⊥ (U0, U1, e_s),
(i) µ_s(Z) is a nontrivial function of Z conditional on X, and
(ii) full support assumptions on µ1(X), µ0(X) and µ_s(Z).
Then F(U0, e_s), F(U1, e_s), and µ1(X), µ0(X), µ_s(Z) are identified.
An alternative: Factor Models
Aakvik, Heckman and Vytlacil (1999, 2003):
U0 = α0 θ + ε0, U1 = α1 θ + ε1, e_s = α_s θ + ε_s,
θ ⊥⊥ (ε0, ε1, ε_s), ε0 ⊥⊥ ε1 ⊥⊥ ε_s.
Can identify α0, α1, α_s, F(U0, e_s) and F(U1, e_s):
COV(U0, e_s) = α0 α_s σ²_θ, COV(U1, e_s) = α1 α_s σ²_θ.
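A quick simulation check of these covariance restrictions. The loadings, factor variance, and normal distributions below are illustrative choices, not estimates:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Illustrative one-factor structure
alpha0, alpha1, alpha_s, var_theta = 1.0, 0.8, 0.5, 2.0

theta = rng.normal(0.0, np.sqrt(var_theta), n)
U0 = alpha0 * theta + rng.normal(0.0, 1.0, n)   # eps0
U1 = alpha1 * theta + rng.normal(0.0, 1.0, n)   # eps1
es = alpha_s * theta + rng.normal(0.0, 1.0, n)  # eps_s

# The single factor generates all cross-equation covariance:
cov_U0_es = np.cov(U0, es)[0, 1]   # approx alpha0 * alpha_s * var_theta
cov_U1_es = np.cov(U1, es)[0, 1]   # approx alpha1 * alpha_s * var_theta
```

With these values the two sample covariances are close to 1.0 and 0.8 respectively, matching the formulas above.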
Third Problem: What is in the Information Set of Agents at the Date Schooling Decisions are Being Made?
Let I_e be the information set at the date of schooling decisions. What is Pr(S = 1 | I_e)? For instance, is I_e = (Y1, …)?
2 Counterfactuals for the Multiple Outcome Case
State s, age a for person ω ∈ Ω:
Y_{s,a}(ω), s = 1, ..., S̄, a = 1, ..., Ā. (3)
S̄ states and Ā ages.
Ceteris paribus effect (or individual treatment effect) of (s, a) versus (s′, a″):
∆((s, a), (s′, a″)) = Y_{s,a}(ω) − Y_{s′,a″}(ω). (4)
Average Treatment Effect:
ATE((s, a), (s′, a), x) = E(Y_{s,a} − Y_{s′,a} | X = x)
Lifetime utility: V_s(ω), s = 1, ..., S̄.
s̃ = argmax_{s = 1, ..., S̄} {V_s(ω)}. (5)
D_s = 1 if s̃ = s, D_s = 0 otherwise, so Σ_{s=1}^{S̄} D_s = 1.
Marginal treatment effect:
limMTE(a, V̄_{s,s′}) = E(Y_{s,a} − Y_{s′,a} | V_s = V_{s′} = V̄_{s,s′} ≥ V_j, j ≠ s, s′), (6)
s′ = 1, ..., S̄; s′ ≠ s.
Averaging over the margins:
MTE_s(a, {V̄_{s,s′}}_{s′=1, s′≠s}^{S̄}) = Σ_{s′=1, s′≠s}^{S̄} limMTE(a, V̄_{s,s′}) [ f(V_s, V_{s′} | V_s = V_{s′} = V̄_{s,s′} ≥ V_j, j ≠ s, s′) / ψ(a, {V̄_{s,s′}}_{s′=1, s′≠s}^{S̄}) ], (7)
where
ψ(a, {V̄_{s,s′}}_{s′=1, s′≠s}^{S̄}) = Σ_{s′=1, s′≠s}^{S̄} f(V_s, V_{s′} | V_s = V_{s′} = V̄_{s,s′} ≥ V_j, j ≠ s, s′).
3 Factor Structure Models
V_s = µ_s(Z) − e_s, s = 1, ..., S̄. (8)
Example: V_s = Z′β_s − e_s, s = 1, ..., S̄.
e_s = α_s′θ + ε_s (9)
θ is an L × 1 vector (θ_ℓ ⊥⊥ θ_{ℓ′}, ℓ ≠ ℓ′),
θ ⊥⊥ ε_s, s = 1, ..., S̄; ε_s ⊥⊥ ε_{s′} ∀ s, s′ = 1, ..., S̄, s ≠ s′. (10)
Potential outcomes Y*_{s,a}:
Y*_{s,a} = µ_{s,a}(X) + α_{s,a}′θ + ε_{s,a}, E(ε_{s,a}) = 0, (11)
θ ⊥⊥ ε_{s,a}, s = 1, ..., S̄, a = 1, ..., Ā, (12)
ε_{s,a} ⊥⊥ ε_{s′,a″}, ∀ s ≠ s′, ∀ a, a″, (13)
ε_{s,a} ⊥⊥ ε_{s′}, ∀ s′, s = 1, ..., S̄, a = 1, ..., Ā, (14)
X_{s,a} ⊥⊥ (θ, ε_{s′,a″}), ∀ s, a, s′, a″, s = 1, ..., S̄, a = 1, ..., Ā. (15)
Y*_{s,a} may be vector valued; treat Y*_{s,a} in (11) as a latent variable.
Relationship To Matching
Matching
Ys,a⊥⊥Ds | X = x, Z = z, Θ = θ for all s
Vs⊥⊥Ds | X = x, Z = z, Θ = θ for all s.
ATE:
E(Y_{s,a} − Y_{s′,a} | X, θ) = µ_{s,a}(X) − µ_{s′,a}(X) + (α_{s,a} − α_{s′,a})′θ.
Measurement System
M1 = µ1(x) + β11 θ1 + ... + β1K θK + ε1
⋮ (16)
ML = µL(x) + βL1 θ1 + ... + βLK θK + εL
εM = (ε1, ..., εL), E(εM) = 0 (17)
θ = (θ1, ..., θK) ⊥⊥ (ε1, ..., εL), θ ⊥⊥ ε_s,
θ ⊥⊥ ε_{s,a}, ε_{s,a} ⊥⊥ εM ⊥⊥ ε_s, εi ⊥⊥ εj, i ≠ j.
Choice Equations
V(s) = µ_s(Z, X) + γ_s′θ + ε_s (18)
V = {V(s)}_{s=1}^{S̄}
4 Identification of Semiparametric Factor Models
G = µ + Λθ + ε (19)
G: L × 1, µ: L × 1, θ: K × 1, ε: L × 1, Λ: L × K;
θ ⊥⊥ ε, εi ⊥⊥ εj, i ≠ j, i, j = 1, ..., L.
Even if θi ⊥⊥ θj, the model is underidentified.
COV(G) = Λ Σ_θ Λ′ + D_ε (20)
We require that
L(L − 1)/2 [number of off-diagonal covariance elements]
≥ (L × K − K) [number of unrestricted elements of Λ]
+ K [variances of θ].
A necessary condition:
Λ =
[ 1      0      0      0     ...  0    ]
[ λ21    0      0      0     ...  0    ]
[ λ31    1      0      0     ...  0    ]
[ λ41    λ42    0      0     ...  0    ]
[ λ51    λ52    1      0     ...  0    ]
[ λ61    λ62    λ63    0     ...  0    ]
[ λ71    λ72    λ73    1     ...  0    ]
[ λ81    λ82    λ83    λ84   ...  0    ]
[ ...    ...    ...    ...   ...  ...  ]
[ λL,1   λL,2   λL,3   ...   ...  λL,K ]
COV(gj, gl) = λj1 λl1 σ²_θ1, l = 1, 2; j = 1, ..., L; j ≠ l.
In particular,
COV(g1, gℓ) = λℓ1 σ²_θ1,
COV(g2, gℓ) = λℓ1 λ21 σ²_θ1.
Assuming λℓ1 ≠ 0, we obtain
COV(g2, gℓ) / COV(g1, gℓ) = λ21.
Hence, from COV(g1, g2) = λ21 σ²_θ1, we obtain σ²_θ1, and hence λℓ1, ℓ = 1, ..., L. We can proceed to the next set of two measurements and identify
COV(gl, gj) = λl1 λj1 σ²_θ1 + λl2 λj2 σ²_θ2.
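The covariance-ratio argument above can be verified numerically. The loadings below are illustrative, with λ11 = 1 as the normalization:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

lam = np.array([1.0, 0.7, 1.3, -0.4])   # lambda_11 normalized to 1
var_theta1 = 1.5

theta1 = rng.normal(0.0, np.sqrt(var_theta1), n)
G = lam[:, None] * theta1 + rng.normal(0.0, 1.0, (4, n))  # plus uniquenesses

C = np.cov(G)

# lambda_21 from the ratio COV(g2, g3) / COV(g1, g3)
lam21_hat = C[1, 2] / C[0, 2]
# sigma^2_theta1 from COV(g1, g2) = lambda_21 * sigma^2_theta1
var_theta1_hat = C[0, 1] / lam21_hat
# remaining loadings from COV(g1, g_l) = lambda_l1 * sigma^2_theta1
lam_rest_hat = C[0, 1:] / var_theta1_hat
```

Only off-diagonal covariances are used, since the diagonal entries also contain the uniqueness variances.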
Our System:
G_s = (M, Y_s, D_s). The components of G_s are of two types: (a) continuous variables and (b) discrete or censored random variables, including binary strings.
Table 1: Components of G_s
                                Continuous   Discrete
Variables defined for all s     M^c          M^d
Variables defined only for s    Y^c_s        Y^d_s
Indicator of state              −            D_s
Continuous variables:
M^c = µ^c_m(X) + U^c_m, Y^c_s = µ^c_s(X) + U^c_s.
Discrete variables (via latent indices):
M^{*d} = µ^d_m(X) + U^d_m, Y^{*d}_s = µ^d_s(X) + U^d_s.
Data used for factor analysis:
M^c, M^{*d}, Y^c_s, Y^{*d}_s and V(s), s = 1, ..., S.
Y = Σ_{s=1}^S D_s Y_s.
M, Y, and D_s, s = 1, ..., S, all contain information on θ.
(A-1) (U^c_m, U^d_m, U^c_s, U^d_s, ε_W) have distribution functions that are absolutely continuous with respect to Lebesgue measure, with means zero, and with support U^c_m × U^d_m × U^c_s × U^d_s × E_W, whose upper and lower limits may be bounded or infinite. Thus the joint system is measurably separable (variation free). We assume finite variances. The cumulative distribution function of ε_W is assumed to be strictly increasing over its full support.
(A-2) (X, Z, Q) ⊥⊥ (U, ε_W), where U = (U^c_m, U^d_m, U^c_s, U^d_s).
Theorem 1 From data on F(M | X), one can identify the joint distribution of (U^c_m, U^d_m) (the latter component only up to scale), and the functions µ^d_m(X) (up to scale) and µ^c_m(X) are identified over the support of X, provided that the following assumptions, in addition to the relevant components of (A-1) and (A-2), are invoked.
(A-3) Order the discrete measurement components first. Suppose that there are N_{m,d} discrete components, followed by N_{m,c} continuous components. Assume
Support(µ^d_{1,m}(X), ..., µ^d_{N_{m,d},m}(X)) ⊇ Support(U^d_{1,m}, ..., U^d_{N_{m,d},m}).
(A-4) For each l = 1, ..., N_{m,d}: µ^d_{l,m}(X) = X β^d_{l,m}.
(A-5) X lives in a subset of R^{N_X}. There exists no proper linear subspace of R^{N_X} having probability 1 under F_X.
Proof of Theorem 1: The case where M consists of purely continuous components is trivial. We observe M^c for each X and can recover the marginal distribution for each component. Recall that M is not state dependent.
For the purely discrete case, we encounter the usual problem that there is no direct observable counterpart for µ^d_m(X). Under (A-1)-(A-5), we can use the analysis of Manski (1988) to identify the slope coefficients β^d_{l,m} up to scale, and the marginal distribution of U^d_{l,m}. From the assumption that the mean (or median) of U^d_{l,m} is zero, we can identify the intercept in β^d_{l,m}. We can repeat this for all discrete components. Thus, coordinate by coordinate, we can identify the marginal distributions of U^c_m and Ũ^d_m, and the functions µ^c_m(X) and µ̃^d_m(X), the latter up to scale ("~" means identified up to scale).
To recover the joint distribution, write
Pr(M^c ≤ m^c, M^d = (0, ..., 0) | X) = F_{U^c_m, Ũ^d_m}(m^c − µ^c_m(X), −µ̃^d_m(X))
by assumption (A-2). To identify F_{U^c_m, Ũ^d_m}(t^1, t^2) at any given evaluation points in the support of (U^c_m, Ũ^d_m), note that we know the function µ̃^d_m(X), and using (A-3) we can find an X = x_b where −µ̃^d_m(x_b) = t^2. Throughout this proof, t^1 and t^2 may be vectors. Thus
Pr(M^c ≤ m^c, M^d = (0, ..., 0) | X = x_b) = F_{U^c_m, Ũ^d_m}(m^c − µ^c_m(x_b), t^2).
Let m̂^c = t^1 + µ^c_m(x_b) to obtain
Pr(M^c ≤ m̂^c, M^d = (0, ..., 0) | X = x_b) = F_{U^c_m, Ũ^d_m}(t^1, t^2).
We know the left-hand side and thus identify F_{U^c_m, Ũ^d_m} at the point (t^1, t^2).
Theorem 2 For the relevant subsets of the conditions (A-1) and (A-2) (specifically, assuming absolute continuity of the distribution of ε_W with respect to Lebesgue measure and ε_W ⊥⊥ (Z, Q)), and under the additional assumptions:
(A-6) c_s(Q_s) = Q_s η_s, s = 1, ..., S; ϕ(Z) = Z′β.
(A-7) (Q_1, Z) is full rank (there is no proper linear subspace of the support of (Q_1, Z) with probability 1). Z contains no intercept.
(A-8) Q_s for s = 2, ..., S is full rank (there is no proper linear subspace of the support of Q_s with probability 1).
(A-9) Support(c_1(Q_1) − ϕ(Z)) ⊇ Support(ε_W),
the distribution function F_{ε_W} is known up to a scale normalization on ε_W, and c_s(Q_s), s = 1, ..., S, and ϕ(Z) are identified up to the same scale.
Proof of Theorem 2:
Pr(D_1 = 1 | Z, Q_1) = Pr((c_1(Q_1) − ϕ(Z))/σ_W > ε_W/σ_W).
Under (A-1), (A-2), (A-6), (A-7) and (A-9), it follows that (c_1(Q_1) − ϕ(Z))/σ_W and F_{ε̃_W} (where ε̃_W = ε_W/σ_W) are identified (see Manski, 1988, or Matzkin, 1992, 1993). Under rank condition (A-7), identification of (c_1(Q_1) − ϕ(Z))/σ_W implies identification of c_1(Q_1)/σ_W and ϕ(Z)/σ_W separately. Write
Pr(D_2 = 1 | Z, Q_1, Q_2) = F_{ε̃_W}((c_2(Q_2) − ϕ(Z))/σ_W) − F_{ε̃_W}((c_1(Q_1) − ϕ(Z))/σ_W).
From the absolute continuity of ε̃_W and the assumption that the distribution function of ε̃_W is strictly increasing, we can write
c_2(Q_2)/σ_W = F_{ε̃_W}^{−1}[ Pr(D_2 = 1 | Z, Q_1, Q_2) + F_{ε̃_W}((c_1(Q_1) − ϕ(Z))/σ_W) ] + ϕ(Z)/σ_W.
Thus we can identify c_2(Q_2)/σ_W.
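The inversion step in this proof can be illustrated numerically. For the illustration, ε̃_W is taken to be standard normal and the index values are hypothetical:

```python
from statistics import NormalDist

N = NormalDist()  # assumed distribution of eps_W / sigma_W (illustration only)

# Hypothetical cutoffs and index, already expressed in sigma_W units
c1, c2, phi_z = 0.2, 1.1, 0.5

# Choice probabilities implied by the ordered structure
p1 = N.cdf(c1 - phi_z)                       # Pr(D1 = 1 | Z, Q1)
p2 = N.cdf(c2 - phi_z) - N.cdf(c1 - phi_z)   # Pr(D2 = 1 | Z, Q1, Q2)

# Inversion from the proof: recover c2 from the observed probabilities
c2_recovered = N.inv_cdf(p2 + N.cdf(c1 - phi_z)) + phi_z
```

Adding back F(c1 − ϕ(Z)) before applying the inverse CDF undoes the differencing in the D2 probability, so `c2_recovered` equals `c2`.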
Pr(M^c ≤ m^c, M^{*d} ≤ 0, Y^c_s ≤ y^c_s, Y^{*d}_s ≤ 0 | D_s = 1, X, Z, Q_s, Q_{s−1}) × Pr(D_s = 1 | Z, Q_s, Q_{s−1}) (22)
= ∫∫∫∫∫ f(U^c_m, Ũ^d_m, U^c_s, Ũ^d_s, ε̃_W) dU^c_m dŨ^d_m dU^c_s dŨ^d_s dε̃_W,
where the region of integration is
U^c_m ≤ m^c − µ^c_m(X), Ũ^d_m ≤ −µ̃^d_m(X), U^c_s ≤ y^c_s − µ^c_s(X), Ũ^d_s ≤ −µ̃^d_s(X),
(c_{s−1}(Q_{s−1}) − ϕ(Z))/σ_W ≤ ε̃_W ≤ (c_s(Q_s) − ϕ(Z))/σ_W.
(A-10)
Support(−µ̃^d_m(X), −µ̃^d_s(X), ((c_s(Q_s) − ϕ(Z))/σ_W, (c_{s−1}(Q_{s−1}) − ϕ(Z))/σ_W))
⊇ Support(U^d_m, U^d_s, ε̃_W) = U^d_m × U^d_s × Ẽ_W.
(A-11) There is no proper linear subspace of (X, Z, Q_s, Q_{s−1}) with probability one, so the model is full rank.
As a consequence of (A-6) and (A-10), we can find values Q̄_s and Q̲_{s−1} of Q_s, Q_{s−1}, respectively, so that
lim_{Q_s → Q̄_s, Q_{s−1} → Q̲_{s−1}} Pr(D_s = 1 | Z, Q_s, Q_{s−1}) = 1.
Theorem 3 Under assumptions (A-1), (A-2), (A-4), (A-6), (A-7), (A-8), (A-9), (A-10) and (A-11), µ^c_m(X), µ^c_s(X), µ̃^d_m(X), µ̃^d_s(X), ϕ̃(Z) and c_s(Q_s), s = 1, ..., S − 1, are identified, as is the joint distribution of (U^c_m, Ũ^d_m, U^c_s, Ũ^d_s, ε̃_W).
Proof of Theorem 3: From (A-2), the unobservables are jointly independent of (X, Z, Q). For fixed values of (Z, Q_s, Q_{s−1}), we may vary the points of evaluation for the continuous coordinates (y^c_s) and pick alternative values of X = x_b to trace out the vector µ^c_s(X) up to intercept terms. Thus we can identify µ^c_{s,l}(X) up to a constant for all l = 1, ..., N_{c,s} (Heckman and Honoré, 1990). Under (A-2), we recover the same functions for whatever values of (Z, Q_s, Q_{s−1}) are prespecified, as long as c_s(Q_s) > c_{s−1}(Q_{s−1}), so that there is an interval of ε_W bounded above and below with positive probability. This identification result does not require any passage to a limit argument.
For values of (Z, Q_s, Q_{s−1}) such that
lim_{Q_s → Q̄_s(Z), Q_{s−1} → Q̲_{s−1}(Z)} Pr(D_s = 1 | Z, Q_s, Q_{s−1}) = 1,
where Q̄_s(Z) is an upper limit and Q̲_{s−1}(Z) is a lower limit, and we allow the limits to depend on Z, we essentially integrate out ε̃_W and obtain
Pr(M^c ≤ m^c, Ũ^d_m ≤ −µ̃^d_m(X), U^c_s ≤ y^c_s − µ^c_s(X), Ũ^d_s ≤ −µ̃^d_s(X)).
Then, proceeding as in the proof of Theorem 1, we can identify µ̃^d_s(X) coordinate by coordinate, and we obtain the constants in µ^c_{s,l}(X), l = 1, ..., N_{c,s}, as well as the constants in µ̃^d_s(X), from the assumption of mean or median zero of the unobservables. In this exercise we use the full rank condition on X, which is part of assumption (A-11).
With these functions in hand, under the full conditions of assumption (A-10), we can fix y^c_s, m^c, µ̃^d_s, µ̃^d_m, (c_s(Q_s) − ϕ(Z))/σ_W and (c_{s−1}(Q_{s−1}) − ϕ(Z))/σ_W at different values to trace out the joint distribution of the unobservables.
Having identified these joint distributions for each state s, we factor analyze them:
U_s = α_s′θ + ε_s, s = 1, ..., S
U_m = α_m′θ + ε_m, m = 1, ..., N_m
Theorem 4 Under normalizations on the factor loadings of the type in (21) for one system s, under the conditions of Theorems 1-3, given the normalizations for the unobservables of the discrete components, and given at least 2K + 1 measurements (Y, M, V), the unrestricted factor loadings and the variances of the factors (σ²_{θ_i}, i = 1, ..., K) are identified over all systems.
Nonparametric Identification of Distributions
({U_m}_{m=1}^{N_m}, {U_{s,a}}_{a=1}^{Ā}, ε̃_V) = T^s
The first B_1 elements depend only on θ_1; the next B_2 − B_1 elements depend on (θ_1, θ_2), and so forth.
Theorem 5 If
T^s_1 = θ_1 + v_1 and
T^s_2 = θ_1 + v_2,
and θ_1 ⊥⊥ v_1 ⊥⊥ v_2, the means of all three generating random variables are finite, E(v_1) = E(v_2) = 0, the conditions of Fubini's theorem are satisfied for each random variable, and the random variables possess nonvanishing (a.e.) characteristic functions, then the densities of (θ_1, v_1, v_2), g(θ_1), g_1(v_1), g_2(v_2), respectively, are identified.
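A moment-level implication of this theorem is easy to check by simulation: with two measurements sharing one factor and mutually independent errors, Cov(T1, T2) = Var(θ1), whatever the (mean-zero) error distributions. The distributions below are illustrative; recovering the full densities requires the characteristic-function argument of the theorem itself:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# Deliberately non-normal factor and errors, mutually independent, mean zero
theta1 = rng.gamma(2.0, 1.0, n) - 2.0    # Var(theta1) = 2.0
v1 = rng.normal(0.0, 0.8, n)
v2 = rng.laplace(0.0, 0.5, n)            # Var = 0.5

T1, T2 = theta1 + v1, theta1 + v2

# Cross-covariance identifies the factor variance
var_theta1_hat = np.cov(T1, T2)[0, 1]    # approx Var(theta1) = 2.0
```

The errors never enter the cross-covariance because they are independent of each other and of θ1.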
T^s_1 = λ^s_{11} θ_1 + ε^s_1, where λ^s_{11} = 1;  T^s_2 = λ^s_{21} θ_1 + ε^s_2, where λ^s_{21} ≠ 0.
Then
T^s_1 = θ_1 + ε^s_1,  T^s_2 / λ^s_{21} = θ_1 + ε^{*,s}_2.
Proceeding to equations B_1 + 1 and B_1 + 2:
T^s_{B_1+1} = λ^s_{B_1+1,1} θ_1 + θ_2 + ε^s_{B_1+1}
T^s_{B_1+2} = λ^s_{B_1+2,1} θ_1 + λ^s_{B_1+2,2} θ_2 + ε^s_{B_1+2},
so
T^s_{B_1+1} − λ^s_{B_1+1,1} θ_1 = θ_2 + ε^s_{B_1+1},
(T^s_{B_1+2} − λ^s_{B_1+2,1} θ_1)/λ^s_{B_1+2,2} = θ_2 + ε^{*,s}_{B_1+2},
where ε^{*,s}_{B_1+2} = ε^s_{B_1+2} / λ^s_{B_1+2,2}.
Example:
Two potential outcomes at each age a: (Y_{0a}, Y_{1a}). Set Z large, so that D_s = 1.
Y_{0a} = α_{0a} θ + ε_{0a}, a = 1, ..., A;  Y_{1a} = α_{1a} θ + ε_{1a}, a = 1, ..., A,
(ε_{0a} ⊥⊥ ε_{1a′}) ∀ a, a′,
(ε_{0a}, ε_{1a}) ⊥⊥ θ ∀ a.
Suppose that Pr(D_s = 1 | Z) = 1 for Z ∈ Z_1 and Pr(D_s = 1 | Z) = 0 for Z ∈ Z_0.
Y_{0a} = µ_{0a} + α_{0a} θ + ε_{0a}, a = 1, ..., A;  Y_{1a} = µ_{1a} + α_{1a} θ + ε_{1a}, a = 1, ..., A.
A = 3, so if we have three panel observations, we obtain
COV(Y_{01}, Y_{02}) = α_{01} α_{02} σ²_θ,
COV(Y_{01}, Y_{03}) = α_{01} α_{03} σ²_θ,   for Z ∈ Z_0,
COV(Y_{02}, Y_{03}) = α_{02} α_{03} σ²_θ,
COV(Y_{11}, Y_{12}) = α_{11} α_{12} σ²_θ,
COV(Y_{11}, Y_{13}) = α_{11} α_{13} σ²_θ,   for Z ∈ Z_1,
COV(Y_{12}, Y_{13}) = α_{12} α_{13} σ²_θ.
Assuming α_{01} = 1 or σ²_θ = 1, and α_{03} ≠ 0:
COV(Y_{01}, Y_{02}) / COV(Y_{01}, Y_{03}) = α_{02} / α_{03}.
Given σ²_θ = 1 and COV(Y_{02}, Y_{03}) = α_{02} α_{03}, we obtain
(α_{03})² = COV(Y_{02}, Y_{03}) COV(Y_{01}, Y_{03}) / COV(Y_{01}, Y_{02}).
The variances of the uniquenesses:
Var(ε_{0a}) = Var(Y_{0a}) − α²_{0a} σ²_θ, a = 1, ..., A,
Var(ε_{1a}) = Var(Y_{1a}) − α²_{1a} σ²_θ, a = 1, ..., A.
For a measurement T with factor loading β (e.g., a test score T = µ_T + βθ + ε_T, ε_T ⊥⊥ θ):
COV(T, Y_{0a}) = β α_{0a} σ²_θ, a = 1, ..., A;  COV(T, Y_{1a′}) = β α_{1a′} σ²_θ, a′ = 1, ..., A.
Assume β ≠ 0 and α_{0a}, α_{1a′} ≠ 0:
COV(T, Y_{1a′}) / COV(T, Y_{0a}) = α_{1a′} / α_{0a},
a = 1, ..., A, a′ = 1, ..., A, a ≠ a′.
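For the Z ∈ Z_0 subsample, the steps above can be carried out mechanically. The loadings below are illustrative, with the normalizations α01 = 1 and σ²_θ = 1:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

alpha = np.array([1.0, 0.6, 1.4])   # alpha_01 = 1 (normalization)
theta = rng.normal(0.0, 1.0, n)     # sigma^2_theta = 1 (normalization)
Y0 = alpha[:, None] * theta + rng.normal(0.0, 1.0, (3, n))  # uniquenesses

C = np.cov(Y0)

# (alpha_03)^2 = COV(Y02,Y03) * COV(Y01,Y03) / COV(Y01,Y02)
alpha03_hat = np.sqrt(C[1, 2] * C[0, 2] / C[0, 1])
# COV(Y01,Y02) / COV(Y01,Y03) = alpha_02 / alpha_03
alpha02_hat = (C[0, 1] / C[0, 2]) * alpha03_hat

# Variances of the uniquenesses
var_eps_hat = np.diag(C) - np.array([1.0, alpha02_hat, alpha03_hat]) ** 2
```

Three panel observations thus pin down both loadings and all three uniqueness variances in this one-factor sketch.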
Factor Representation of Stationary AR(1) Processes
Theorem Let {y_t}_{t=1}^T be a stationary AR(1) process with an odd number T of observations, where |ρ| < 1 and ε_t iid (0, σ²_ε). Then we can always find M = (T + 1)/2 orthogonal factors, corresponding loadings (which may be normalized to zero or one) and mutually independent uniquenesses u_t ∼ (0, σ²_t) such that y_t = ρ y_{t−1} + ε_t can be represented by
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + u_t.
Proof By induction. The theorem obviously holds for T = 1 observations of the AR(1) process. Assume it holds for any odd number T of observations. We have to show that it also holds for T + 2. Indeed, let
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + u_t,  t = 1, ..., T,
be the factor representation of the stationary AR(1) process {y_t}_{t=1}^T. If we add two more observations, one cannot represent {y_t}_{t=1}^{T+2} with the original M factors, because the extra number of parameters in the correlation matrix to be reproduced far exceeds the number of loadings that we get from the two extra observations. Thus we need more factors. We show that only one more factor is necessary to represent the stationary AR(1) process {y_t}_{t=1}^{T+2} after the addition of the two extra observations. Indeed, let
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + u_t,  t = 1, ..., T − 1,
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + λ_{M+1,t} θ_{M+1} + u_t,  t = T, T + 1, T + 2,
be its factor representation. Notice that (writing σ²_i ≡ Var(θ_i))
Corr(y_t, y_{T+1}) = Σ_{i=1}^M λ_{it} λ_{i,T+1} σ²_i,  t = 1, ..., T − 1,
Corr(y_t, y_{T+2}) = Σ_{i=1}^M λ_{it} λ_{i,T+2} σ²_i,  t = 1, ..., T − 1.
But this is a system of 2(T − 1) equations with 2M = T + 1 unknowns. Thus we can solve the system above to recover {λ_{i,T+1}, λ_{i,T+2}}_{i=1}^M. Now,
Corr(y_T, y_{T+1}) = Σ_{i=1}^M λ_{iT} λ_{i,T+1} σ²_i + λ_{M+1,T} λ_{M+1,T+1} σ²_{M+1},
Corr(y_T, y_{T+2}) = Σ_{i=1}^M λ_{iT} λ_{i,T+2} σ²_i + λ_{M+1,T} λ_{M+1,T+2} σ²_{M+1},
Corr(y_{T+1}, y_{T+2}) = Σ_{i=1}^M λ_{i,T+1} λ_{i,T+2} σ²_i + λ_{M+1,T+1} λ_{M+1,T+2} σ²_{M+1}.
Define
A_1 = Corr(y_T, y_{T+1}) − Σ_{i=1}^M λ_{iT} λ_{i,T+1} σ²_i,
A_2 = Corr(y_T, y_{T+2}) − Σ_{i=1}^M λ_{iT} λ_{i,T+2} σ²_i,
A_3 = Corr(y_{T+1}, y_{T+2}) − Σ_{i=1}^M λ_{i,T+1} λ_{i,T+2} σ²_i.
Then, with the normalization λ_{M+1,T} = 1, the following system must be solved for λ_{M+1,T+1}, λ_{M+1,T+2} and σ²_{M+1}:
λ_{M+1,T+1} σ²_{M+1} = A_1,
λ_{M+1,T+2} σ²_{M+1} = A_2,
λ_{M+1,T+1} λ_{M+1,T+2} σ²_{M+1} = A_3,
and the solution is:
λ_{M+1,T+1} = A_3 / A_2,
λ_{M+1,T+2} = A_3 / A_1,
σ²_{M+1} = A_1 A_2 / A_3.
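The closed-form solution of the final three-equation system (under the normalization λ_{M+1,T} = 1) is easy to verify. The values of A1, A2, A3 below are arbitrary positive numbers chosen for illustration:

```python
# Residual correlations after removing the first M factors (illustrative)
A1, A2, A3 = 0.48, 0.36, 0.27

lam_T1 = A3 / A2        # lambda_{M+1, T+1}
lam_T2 = A3 / A1        # lambda_{M+1, T+2}
var_new = A1 * A2 / A3  # sigma^2_{M+1}

# The solution reproduces all three covariance restrictions
check1 = lam_T1 * var_new            # should equal A1
check2 = lam_T2 * var_new            # should equal A2
check3 = lam_T1 * lam_T2 * var_new   # should equal A3
```

Substituting the formulas confirms the algebra: (A3/A2)(A1 A2/A3) = A1, (A3/A1)(A1 A2/A3) = A2, and (A3/A2)(A3/A1)(A1 A2/A3) = A3.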