Quantifying The Roles of Uncertainty and Preferences In Understanding Schooling Choices and Earnings Inequality
Flavio Cunha, James J. Heckman and Salvador Navarro
Escola de Pós-Graduação em Economia, Fundação Getulio Vargas
Goal of this research
(1) For decades, economists and others have attempted to distinguish "luck" or chance from other factors in determining earnings.
(2) A crude description of the decomposition: look at the Mincer earnings equation
(∗) ln y_it = x_it β + θ_i + ε_it
where y_it = earnings of person i at time t, x_it = observables,
θ_i = person-specific fixed effect, ε_it = G(L)ν_t,
ν_t iid mean zero.
(7) Example: MaCurdy, Moffitt and Gottschalk, Abowd and Card all estimate earnings models in which versions of (∗) are fit
(8) These statistical decompositions are interesting but tell us little about components unforecastable by agents
(9) Thus Gottschalk and Moffitt show rising variances in (∗) over the 1980s. But this says nothing about increasing uncertainty, although some (e.g. Ljungqvist and Sargent) interpret their evidence this way
(10) Goal of this paper is to identify these components. We need a theory, and a link of (∗) to that behavioral theory, to extract forecastable from unforecastable components
(11) Precedent: work by Flavin. Using the PIH, innovations in consumption are linked to innovations in income
(12) This paper uses schooling.
(a) What future components of income are forecastable at the date schooling decisions are being made?
(c) Can uncertainty, and increasing uncertainty, explain sluggish responses to the increase in the returns to schooling over the past 30 years?
(d) What is the role of uncertainty and what is the role of nonpecuniary factors?
Definitions:
(a) Heterogeneity: all sources of variability among persons
(i) observable (ii) unobservable.
(b) Uncertainty: unforecastable components of heterogeneity.
Goal: Estimate components in agent information sets at age t. See the evolution over time.
By-products: Unite and Extend Three Literatures
1. Treatment Effect and Policy Evaluation Literature (Derive Treatment Effects from Well Developed Economic Model)
2. Inequality Measurement Literature (Lift anonymity postulates)
3. Earnings Dynamics Literature
1. Treatment Effect Literature
a. Traditional focus is on means (different means emphasized in different subfields)
b. Our focus is on Distributions of Treatment Effects
c. We link the Treatment Effect Literature to the Choice-Theoretic Structural Econometric Literature using Robust Semiparametric Methods
d. We identify and estimate both “objective” outcomes of policies and “subjective” evaluations of policies as perceived by persons who choose (self select into) treatments.
2. Inequality Measurement Literature
a. Traditional focus is on overall measures of inequality (e.g., Gini coefficients; variance) and subgroup decompositions (emphasis on statistical accounting).
b. Link to policy analysis is obscure. Decompositions do not correspond to well-defined policies
c. Anonymity Postulate: Makes a virtue of a cross sectional necessity by comparing distributions using second order stochastic dominance
e. Our approach allows us to make such comparisons and informs us about which groups are affected and which are not and how policy shifts people within an aggregate distribution.
f. Allows us to examine how specific policies affect choices (e.g., schooling) and how this affects distributions and movements across distributions
3. Earnings Dynamics Literature.
(a) Does not correct for self selection into schooling.
(b) Silent on components of uncertainty.
Basic Framework Used in Our Analysis: Roy (1951) economy
S = 1 college, S = 0 high school. Two potential outcomes (Y0, Y1) (present value of earnings).
Decision rule governing sectoral choices:
S = 1 if I = Y1 − Y0 − C ≥ 0; S = 0 otherwise.
Decompose into means (implicitly conditioning on X variables):
Y1 = µ1 + U1, E(U1) = 0
Y0 = µ0 + U0, E(U0) = 0
C = µC + UC.
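As a concrete illustration, the decision rule above is easy to simulate. The parameter values and normal shocks below are purely illustrative, not estimates from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative means for outcomes and cost (hypothetical values)
mu0, mu1, muC = 10.0, 12.0, 1.5

# Correlated outcome shocks, independent cost shock
U0, U1, UC = rng.multivariate_normal(
    [0.0, 0.0, 0.0],
    [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 0.5]], size=n).T

Y0, Y1, C = mu0 + U0, mu1 + U1, muC + UC

I = Y1 - Y0 - C            # net gain from college
S = (I >= 0).astype(int)   # decision rule: S = 1 iff I >= 0

# Self-selection on gains: college-goers have larger average Y1 - Y0
gain_college = (Y1 - Y0)[S == 1].mean()
gain_hs = (Y1 - Y0)[S == 0].mean()
```

Because the gross gain Y1 − Y0 enters the index I, agents with larger gains select into college, so `gain_college` exceeds `gain_hs` in this sketch.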
Two Kinds of Policies:
(a) Those that affect (Y0, Y1, C) (GE policies)
(b) Those that affect only C and hence S (treatment effect policies)
General Case
Under policy A : (Y0A, Y1A, CA)
Treatment Effect Case:
Policy A Policy B
(Y0, Y1, CA) and (Y0, Y1, CB)
Case I Y1 − Y0 is the same for everyone
Case II
Y1 − Y0 differs among people of the same X but Pr(S = 1|Y1 − Y0) = Pr(S = 1)
Case III
Y1 − Y0 differs among people and people act on these differences
MTE: E(Y1 − Y0 | I = 0)
Why Useful to Estimate Joint Distributions of Counterfactuals?
1. The proportion of people who benefit (in terms of gross gains) from participation in the program:
Pr(Y1 ≥ Y0 | X).
2. Gains to participants at selected levels of the no-treatment state,
F(Y1 − Y0 | Y0 = y0, X), or of the treatment state,
F(Y1 − Y0 | Y1 = y1, X).
3. The option value of social programs
E(Yi − Yj | X) for all i, j and E(Yi − Yj | X, D = 1) for all i, j
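Given a joint distribution of counterfactuals, functionals 1 and 2 are direct to compute. A minimal sketch, assuming a hypothetical bivariate normal joint for (Y0, Y1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical joint distribution of counterfactuals (illustration only)
Y0, Y1 = rng.multivariate_normal([10.0, 11.0],
                                 [[1.0, 0.6],
                                  [0.6, 1.0]], size=n).T

# 1. Proportion who benefit (gross) from treatment
p_benefit = (Y1 >= Y0).mean()

# 2. Gains at a selected level of the no-treatment state:
#    here, Y0 in its bottom decile
low = Y0 <= np.quantile(Y0, 0.10)
gain_low_Y0 = (Y1 - Y0)[low].mean()
gain_overall = (Y1 - Y0).mean()
```

With these illustrative parameters Cov(Y1 − Y0, Y0) < 0, so average gains are larger for people with low Y0; the marginal distributions alone could not reveal this.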
1 Estimating Distributions of Counterfactual Outcomes
Two counterfactual states (Y0, Y1), with (Y0, Y1) not independent of X.
S = 1 if Y1 is observed; S = 0 otherwise (so Y0 is observed).
Y = S Y1 + (1 − S) Y0.
Allow for the possibility of an instrument Z:
(Y0, Y1) ⊥⊥ Z | X, with Z shifting Pr(S = 1 | Z, X). But an instrument is not essential.
Data Available To Analyst:
Observe Y0 if S = 0 and Y1 if S = 1, but we do not observe (Y0, Y1) for anyone.
Three separate problems.
Selection Problem:
Know: F(Y1 | S = 1, X), F(Y0 | S = 0, X).
If solved we get: F(Y1 | X), F(Y0 | X) —
but not the joint distribution F(Y0, Y1 | X).
Second Problem: Recovering Joint Distributions
Traditional approach: conditional on X, Y1 and Y0 are deterministically related:
Y1 ≡ Y0 + ∆(X) (1)
("common effect"), with ∆(X) obtained from the means of
F(Y0 | S = 0, X) and F(Y1 | S = 1, X).
Generalization
Quantiles perfectly ranked: Y1 = F_{1,X}^{-1}(F_{0,X}(Y0)),
where F_{1,X} = F(y1 | X), F_{0,X} = F(y0 | X);
or perfectly inversely ranked: Y1 = F_{1,X}^{-1}(1 − F_{0,X}(Y0)).
To see why, consider the following Markov kernels M(y1, y0 | X) and M̃(y0, y1 | X):
F(y1 | X) = ∫ M(y1, y0 | X) dF0(y0 | X),
F(y0 | X) = ∫ M̃(y0, y1 | X) dF1(y1 | X),
so
F(y1 | X) = ∫ M(y1, y0 | X) M̃(y0, y1 | X) dF(y1 | X).
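The perfect-ranking construction Y1 = F1^{-1}(F0(Y0)) can be carried out with empirical distributions. A sketch with hypothetical normal marginals:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical marginals: draws characterizing F0 and F1
Y0 = rng.normal(10.0, 1.0, n)
Y1_draws = rng.normal(11.0, 1.5, n)

# Empirical F0 evaluated at each Y0 (ranks scaled into (0, 1))
p = (np.argsort(np.argsort(Y0)) + 1) / (n + 1)

# Perfect-ranking imputation: Y1 = F1^{-1}(F0(Y0))
Y1_imputed = np.quantile(Y1_draws, p)

# The imputation preserves ranks: sorting by Y0 sorts Y1_imputed too
order = np.argsort(Y0)
rank_preserved = bool(np.all(np.diff(Y1_imputed[order]) >= 0))
```

The inversely ranked case would replace `p` with `1 - p`. Either way the construction pins down a degenerate joint distribution from the two marginals alone.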
Approach Using Choice Theory
S = 1(µs(Z) ≥ es) , Z ⊥⊥ es (2)
Y1 = µ1(X) + U1, E(U1) = 0
Y0 = µ0(X) + U0, E(U0) = 0
(U1, U0) ⊥⊥ X, Z.
where S = 1( Y1 > Y0) (Roy model).
For more general decision rules, this method breaks down. Heckman (1990) and Heckman and Smith (1998): assume
(Z, X) ⊥⊥ (U0, U1, e_s),
(i) µ_s(Z) is a nontrivial function of Z conditional on X, and
(ii) full support assumptions on µ1(X), µ0(X) and µ_s(Z).
Then F(U0, e_s), F(U1, e_s), and µ1(X), µ0(X), µ_s(Z) are identified.
An alternative: Factor Models
Aakvik, Heckman and Vytlacil (1999, 2003):
U0 = α0 θ + ε0, U1 = α1 θ + ε1, e_s = α_s θ + ε_s,
θ ⊥⊥ (ε0, ε1, ε_s), ε0 ⊥⊥ ε1 ⊥⊥ ε_s.
Can identify α0, α1, α_s, F(U0, e_s) and F(U1, e_s):
COV(U0, e_s) = α0 α_s σ²_θ, COV(U1, e_s) = α1 α_s σ²_θ.
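A quick simulation check of these covariance restrictions. The loadings, factor variance, and normal distributions below are illustrative choices, not estimates:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Illustrative one-factor structure
alpha0, alpha1, alpha_s, var_theta = 1.0, 0.8, 0.5, 2.0

theta = rng.normal(0.0, np.sqrt(var_theta), n)
U0 = alpha0 * theta + rng.normal(0.0, 1.0, n)   # eps0
U1 = alpha1 * theta + rng.normal(0.0, 1.0, n)   # eps1
es = alpha_s * theta + rng.normal(0.0, 1.0, n)  # eps_s

# The single factor generates all cross-equation covariance:
cov_U0_es = np.cov(U0, es)[0, 1]   # approx alpha0 * alpha_s * var_theta
cov_U1_es = np.cov(U1, es)[0, 1]   # approx alpha1 * alpha_s * var_theta
```

With these values the two sample covariances are close to 1.0 and 0.8 respectively, matching the formulas above.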
Third Problem: What is in the Information Set of Agents at the Date Schooling Decisions are Being Made?
Let I_e be the information set at the date of schooling decisions. What is Pr(S = 1 | I_e)? For instance, is I_e = (Y1, …)?
2 Counterfactuals for the Multiple Outcome Case
State s, age a for person ω ∈ Ω:
Y_{s,a}(ω), s = 1, ..., S̄, a = 1, ..., Ā. (3)
S̄ states and Ā ages.
Ceteris paribus effect (or individual treatment effect) of (s, a) versus (s′, a″):
∆((s, a), (s′, a″)) = Y_{s,a}(ω) − Y_{s′,a″}(ω). (4)
Average Treatment Effect:
ATE((s, a), (s′, a), x) = E(Y_{s,a} − Y_{s′,a} | X = x)
Lifetime utility: V_s(ω), s = 1, ..., S̄.
s̃ = argmax_{s = 1, ..., S̄} {V_s(ω)}. (5)
D_s = 1 if s̃ = s, D_s = 0 otherwise, so Σ_{s=1}^{S̄} D_s = 1.
Marginal treatment effect:
limMTE(a, V̄_{s,s′}) = E(Y_{s,a} − Y_{s′,a} | V_s = V_{s′} = V̄_{s,s′} ≥ V_j, j ≠ s, s′), (6)
s′ = 1, ..., S̄; s′ ≠ s.
Averaging over the margins:
MTE_s(a, {V̄_{s,s′}}_{s′=1, s′≠s}^{S̄}) = Σ_{s′=1, s′≠s}^{S̄} limMTE(a, V̄_{s,s′}) [ f(V_s, V_{s′} | V_s = V_{s′} = V̄_{s,s′} ≥ V_j, j ≠ s, s′) / ψ(a, {V̄_{s,s′}}_{s′=1, s′≠s}^{S̄}) ], (7)
where
ψ(a, {V̄_{s,s′}}_{s′=1, s′≠s}^{S̄}) = Σ_{s′=1, s′≠s}^{S̄} f(V_s, V_{s′} | V_s = V_{s′} = V̄_{s,s′} ≥ V_j, j ≠ s, s′).
3 Factor Structure Models
V_s = µ_s(Z) − e_s, s = 1, ..., S̄. (8)
Example: V_s = Z′β_s − e_s, s = 1, ..., S̄.
e_s = α_s′θ + ε_s (9)
θ is an L × 1 vector (θ_ℓ ⊥⊥ θ_{ℓ′}, ℓ ≠ ℓ′),
θ ⊥⊥ ε_s, s = 1, ..., S̄; ε_s ⊥⊥ ε_{s′} ∀ s, s′ = 1, ..., S̄, s ≠ s′. (10)
Potential outcomes Y*_{s,a}:
Y*_{s,a} = µ_{s,a}(X) + α_{s,a}′θ + ε_{s,a}, E(ε_{s,a}) = 0, (11)
θ ⊥⊥ ε_{s,a}, s = 1, ..., S̄, a = 1, ..., Ā, (12)
ε_{s,a} ⊥⊥ ε_{s′,a″}, ∀ s ≠ s′, ∀ a, a″, (13)
ε_{s,a} ⊥⊥ ε_{s′}, ∀ s′, s = 1, ..., S̄, a = 1, ..., Ā, (14)
X_{s,a} ⊥⊥ (θ, ε_{s′,a″}), ∀ s, a, s′, a″, s = 1, ..., S̄, a = 1, ..., Ā. (15)
Y*_{s,a} may be vector valued; treat Y*_{s,a} in (11) as a latent variable.
Relationship To Matching
Matching
Ys,a⊥⊥Ds | X = x, Z = z, Θ = θ for all s
Vs⊥⊥Ds | X = x, Z = z, Θ = θ for all s.
ATE:
E(Y_{s,a} − Y_{s′,a} | X, θ) = µ_{s,a}(X) − µ_{s′,a}(X) + (α_{s,a} − α_{s′,a})′θ.
Measurement System
M1 = µ1(x) + β11 θ1 + ... + β1K θK + ε1
⋮ (16)
ML = µL(x) + βL1 θ1 + ... + βLK θK + εL
εM = (ε1, ..., εL), E(εM) = 0 (17)
θ = (θ1, ..., θK) ⊥⊥ (ε1, ..., εL), θ ⊥⊥ ε_s,
θ ⊥⊥ ε_{s,a}, ε_{s,a} ⊥⊥ εM ⊥⊥ ε_s, εi ⊥⊥ εj, i ≠ j.
Choice Equations
V(s) = µ_s(Z, X) + γ_s′θ + ε_s (18)
V = {V(s)}_{s=1}^{S̄}
4 Identification of Semiparametric Factor Models
G = µ + Λθ + ε (19)
G: L × 1, µ: L × 1, θ: K × 1, ε: L × 1, Λ: L × K;
θ ⊥⊥ ε, εi ⊥⊥ εj, i ≠ j, i, j = 1, ..., L.
Even if θi ⊥⊥ θj, the model is underidentified.
COV(G) = Λ Σ_θ Λ′ + D_ε (20)
We require that
L(L − 1)/2 [number of off-diagonal covariance elements]
≥ (L × K − K) [number of unrestricted elements of Λ]
+ K [variances of θ].
A necessary condition:
Λ =
[ 1      0      0      0     ...  0    ]
[ λ21    0      0      0     ...  0    ]
[ λ31    1      0      0     ...  0    ]
[ λ41    λ42    0      0     ...  0    ]
[ λ51    λ52    1      0     ...  0    ]
[ λ61    λ62    λ63    0     ...  0    ]
[ λ71    λ72    λ73    1     ...  0    ]
[ λ81    λ82    λ83    λ84   ...  0    ]
[ ...    ...    ...    ...   ...  ...  ]
[ λL,1   λL,2   λL,3   ...   ...  λL,K ]
COV(gj, gl) = λj1 λl1 σ²_θ1, l = 1, 2; j = 1, ..., L; j ≠ l.
In particular,
COV(g1, gℓ) = λℓ1 σ²_θ1,
COV(g2, gℓ) = λℓ1 λ21 σ²_θ1.
Assuming λℓ1 ≠ 0, we obtain
COV(g2, gℓ) / COV(g1, gℓ) = λ21.
Hence, from COV(g1, g2) = λ21 σ²_θ1, we obtain σ²_θ1, and hence λℓ1, ℓ = 1, ..., L. We can proceed to the next set of two measurements and identify
COV(gl, gj) = λl1 λj1 σ²_θ1 + λl2 λj2 σ²_θ2.
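The covariance-ratio argument above can be verified numerically. The loadings below are illustrative, with λ11 = 1 as the normalization:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

lam = np.array([1.0, 0.7, 1.3, -0.4])   # lambda_11 normalized to 1
var_theta1 = 1.5

theta1 = rng.normal(0.0, np.sqrt(var_theta1), n)
G = lam[:, None] * theta1 + rng.normal(0.0, 1.0, (4, n))  # plus uniquenesses

C = np.cov(G)

# lambda_21 from the ratio COV(g2, g3) / COV(g1, g3)
lam21_hat = C[1, 2] / C[0, 2]
# sigma^2_theta1 from COV(g1, g2) = lambda_21 * sigma^2_theta1
var_theta1_hat = C[0, 1] / lam21_hat
# remaining loadings from COV(g1, g_l) = lambda_l1 * sigma^2_theta1
lam_rest_hat = C[0, 1:] / var_theta1_hat
```

Only off-diagonal covariances are used, since the diagonal entries also contain the uniqueness variances.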
Our System:
G_s = (M, Y_s, D_s). The components of G_s are of two types: (a) continuous variables and (b) discrete or censored random variables, including binary strings.
Table 1: Components of G_s
                                Continuous   Discrete
Variables defined for all s     M^c          M^d
Variables defined only for s    Y^c_s        Y^d_s
Indicator of state              −            D_s
Continuous variables:
M^c = µ^c_m(X) + U^c_m, Y^c_s = µ^c_s(X) + U^c_s.
Discrete variables (via latent indices):
M^{*d} = µ^d_m(X) + U^d_m, Y^{*d}_s = µ^d_s(X) + U^d_s.
Data used for factor analysis:
M^c, M^{*d}, Y^c_s, Y^{*d}_s and V(s), s = 1, ..., S.
Y = Σ_{s=1}^S D_s Y_s.
M, Y, and D_s, s = 1, ..., S, all contain information on θ.
(A-1) (U^c_m, U^d_m, U^c_s, U^d_s, ε_W) have distribution functions that are absolutely continuous with respect to Lebesgue measure, with means zero, and with support U^c_m × U^d_m × U^c_s × U^d_s × E_W, whose upper and lower limits may be bounded or infinite. Thus the joint system is measurably separable (variation free). We assume finite variances. The cumulative distribution function of ε_W is assumed to be strictly increasing over its full support.
(A-2) (X, Z, Q) ⊥⊥ (U, ε_W), where U = (U^c_m, U^d_m, U^c_s, U^d_s).
Theorem 1 From data on F(M | X), one can identify the joint distribution of (U^c_m, U^d_m) (the latter component only up to scale), and the functions µ^d_m(X) (up to scale) and µ^c_m(X) are identified over the support of X, provided that the following assumptions, in addition to the relevant components of (A-1) and (A-2), are invoked.
(A-3) Order the discrete measurement components first. Suppose that there are N_{m,d} discrete components, followed by N_{m,c} continuous components. Assume
Support(µ^d_{1,m}(X), ..., µ^d_{N_{m,d},m}(X)) ⊇ Support(U^d_{1,m}, ..., U^d_{N_{m,d},m}).
(A-4) For each l = 1, ..., N_{m,d}: µ^d_{l,m}(X) = X β^d_{l,m}.
(A-5) X lives in a subset of R^{N_X}. There exists no proper linear subspace of R^{N_X} having probability 1 under F_X.
Proof of Theorem 1: The case where M consists of purely continuous components is trivial. We observe M^c for each X and can recover the marginal distribution for each component. Recall that M is not state dependent.
For the purely discrete case, we encounter the usual problem that there is no direct observable counterpart for µ^d_m(X). Under (A-1)-(A-5), we can use the analysis of Manski (1988) to identify the slope coefficients β^d_{l,m} up to scale, and the marginal distribution of U^d_{l,m}. From the assumption that the mean (or median) of U^d_{l,m} is zero, we can identify the intercept in β^d_{l,m}. We can repeat this for all discrete components. Thus, coordinate by coordinate, we can identify the marginal distributions of U^c_m and Ũ^d_m, and the functions µ^c_m(X) and µ̃^d_m(X), the latter up to scale ("~" means identified up to scale).
To recover the joint distribution, write
Pr(M^c ≤ m^c, M^d = (0, ..., 0) | X) = F_{U^c_m, Ũ^d_m}(m^c − µ^c_m(X), −µ̃^d_m(X))
by assumption (A-2). To identify F_{U^c_m, Ũ^d_m}(t^1, t^2) at any given evaluation points in the support of (U^c_m, Ũ^d_m), note that we know the function µ̃^d_m(X), and using (A-3) we can find an X = x_b where −µ̃^d_m(x_b) = t^2. Throughout this proof, t^1 and t^2 may be vectors. Thus
Pr(M^c ≤ m^c, M^d = (0, ..., 0) | X = x_b) = F_{U^c_m, Ũ^d_m}(m^c − µ^c_m(x_b), t^2).
Let m̂^c = t^1 + µ^c_m(x_b) to obtain
Pr(M^c ≤ m̂^c, M^d = (0, ..., 0) | X = x_b) = F_{U^c_m, Ũ^d_m}(t^1, t^2).
We know the left-hand side and thus identify F_{U^c_m, Ũ^d_m} at the point (t^1, t^2).
Theorem 2 For the relevant subsets of the conditions (A-1) and (A-2) (specifically, assuming absolute continuity of the distribution of ε_W with respect to Lebesgue measure and ε_W ⊥⊥ (Z, Q)), and under the additional assumptions:
(A-6) c_s(Q_s) = Q_s η_s, s = 1, ..., S; ϕ(Z) = Z′β.
(A-7) (Q_1, Z) is full rank (there is no proper linear subspace of the support of (Q_1, Z) with probability 1). Z contains no intercept.
(A-8) Q_s for s = 2, ..., S is full rank (there is no proper linear subspace of the support of Q_s with probability 1).
(A-9) Support(c_1(Q_1) − ϕ(Z)) ⊇ Support(ε_W),
the distribution function F_{ε_W} is known up to a scale normalization on ε_W, and c_s(Q_s), s = 1, ..., S, and ϕ(Z) are identified up to the same scale.
Proof of Theorem 2:
Pr(D_1 = 1 | Z, Q_1) = Pr((c_1(Q_1) − ϕ(Z))/σ_W > ε_W/σ_W).
Under (A-1), (A-2), (A-6), (A-7) and (A-9), it follows that (c_1(Q_1) − ϕ(Z))/σ_W and F_{ε̃_W} (where ε̃_W = ε_W/σ_W) are identified (see Manski, 1988, or Matzkin, 1992, 1993). Under rank condition (A-7), identification of (c_1(Q_1) − ϕ(Z))/σ_W implies identification of c_1(Q_1)/σ_W and ϕ(Z)/σ_W separately. Write
Pr(D_2 = 1 | Z, Q_1, Q_2) = F_{ε̃_W}((c_2(Q_2) − ϕ(Z))/σ_W) − F_{ε̃_W}((c_1(Q_1) − ϕ(Z))/σ_W).
From the absolute continuity of ε̃_W and the assumption that the distribution function of ε̃_W is strictly increasing, we can write
c_2(Q_2)/σ_W = F_{ε̃_W}^{−1}[ Pr(D_2 = 1 | Z, Q_1, Q_2) + F_{ε̃_W}((c_1(Q_1) − ϕ(Z))/σ_W) ] + ϕ(Z)/σ_W.
Thus we can identify c_2(Q_2)/σ_W.
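The inversion step in this proof can be illustrated numerically. For the illustration, ε̃_W is taken to be standard normal and the index values are hypothetical:

```python
from statistics import NormalDist

N = NormalDist()  # assumed distribution of eps_W / sigma_W (illustration only)

# Hypothetical cutoffs and index, already expressed in sigma_W units
c1, c2, phi_z = 0.2, 1.1, 0.5

# Choice probabilities implied by the ordered structure
p1 = N.cdf(c1 - phi_z)                       # Pr(D1 = 1 | Z, Q1)
p2 = N.cdf(c2 - phi_z) - N.cdf(c1 - phi_z)   # Pr(D2 = 1 | Z, Q1, Q2)

# Inversion from the proof: recover c2 from the observed probabilities
c2_recovered = N.inv_cdf(p2 + N.cdf(c1 - phi_z)) + phi_z
```

Adding back F(c1 − ϕ(Z)) before applying the inverse CDF undoes the differencing in the D2 probability, so `c2_recovered` equals `c2`.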
Pr(M^c ≤ m^c, M^{*d} ≤ 0, Y^c_s ≤ y^c_s, Y^{*d}_s ≤ 0 | D_s = 1, X, Z, Q_s, Q_{s−1}) × Pr(D_s = 1 | Z, Q_s, Q_{s−1}) (22)
= ∫∫∫∫∫ f(U^c_m, Ũ^d_m, U^c_s, Ũ^d_s, ε̃_W) dU^c_m dŨ^d_m dU^c_s dŨ^d_s dε̃_W,
where the region of integration is
U^c_m ≤ m^c − µ^c_m(X), Ũ^d_m ≤ −µ̃^d_m(X), U^c_s ≤ y^c_s − µ^c_s(X), Ũ^d_s ≤ −µ̃^d_s(X),
(c_{s−1}(Q_{s−1}) − ϕ(Z))/σ_W ≤ ε̃_W ≤ (c_s(Q_s) − ϕ(Z))/σ_W.
(A-10)
Support(−µ̃^d_m(X), −µ̃^d_s(X), ((c_s(Q_s) − ϕ(Z))/σ_W, (c_{s−1}(Q_{s−1}) − ϕ(Z))/σ_W))
⊇ Support(U^d_m, U^d_s, ε̃_W) = U^d_m × U^d_s × Ẽ_W.
(A-11) There is no proper linear subspace of (X, Z, Q_s, Q_{s−1}) with probability one, so the model is full rank.
As a consequence of (A-6) and (A-10), we can find values Q̄_s and Q̲_{s−1} of Q_s, Q_{s−1}, respectively, so that
lim_{Q_s → Q̄_s, Q_{s−1} → Q̲_{s−1}} Pr(D_s = 1 | Z, Q_s, Q_{s−1}) = 1.
Theorem 3 Under assumptions (A-1), (A-2), (A-4), (A-6), (A-7), (A-8), (A-9), (A-10) and (A-11), µ^c_m(X), µ^c_s(X), µ̃^d_m(X), µ̃^d_s(X), ϕ̃(Z) and c_s(Q_s), s = 1, ..., S − 1, are identified, as is the joint distribution of (U^c_m, Ũ^d_m, U^c_s, Ũ^d_s, ε̃_W).
Proof of Theorem 3: From (A-2), the unobservables are jointly independent of (X, Z, Q). For fixed values of (Z, Q_s, Q_{s−1}), we may vary the points of evaluation for the continuous coordinates (y^c_s) and pick alternative values of X = x_b to trace out the vector µ^c_s(X) up to intercept terms. Thus we can identify µ^c_{s,l}(X) up to a constant for all l = 1, ..., N_{c,s} (Heckman and Honoré, 1990). Under (A-2), we recover the same functions for whatever values of (Z, Q_s, Q_{s−1}) are prespecified, as long as c_s(Q_s) > c_{s−1}(Q_{s−1}), so that there is an interval of ε_W bounded above and below with positive probability. This identification result does not require any passage to a limit argument.
For values of (Z, Q_s, Q_{s−1}) such that
lim_{Q_s → Q̄_s(Z), Q_{s−1} → Q̲_{s−1}(Z)} Pr(D_s = 1 | Z, Q_s, Q_{s−1}) = 1,
where Q̄_s(Z) is an upper limit and Q̲_{s−1}(Z) is a lower limit, and we allow the limits to depend on Z, we essentially integrate out ε̃_W and obtain
Pr(M^c ≤ m^c, Ũ^d_m ≤ −µ̃^d_m(X), U^c_s ≤ y^c_s − µ^c_s(X), Ũ^d_s ≤ −µ̃^d_s(X)).
Then, proceeding as in the proof of Theorem 1, we can identify µ̃^d_s(X) coordinate by coordinate, and we obtain the constants in µ^c_{s,l}(X), l = 1, ..., N_{c,s}, as well as the constants in µ̃^d_s(X), from the assumption of mean or median zero of the unobservables. In this exercise we use the full rank condition on X, which is part of assumption (A-11).
With these functions in hand, under the full conditions of assumption (A-10), we can fix y^c_s, m^c, µ̃^d_s, µ̃^d_m, (c_s(Q_s) − ϕ(Z))/σ_W and (c_{s−1}(Q_{s−1}) − ϕ(Z))/σ_W at different values to trace out the joint distribution of the unobservables.
Having identified these joint distributions for each state s, we factor analyze them:
U_s = α_s′θ + ε_s, s = 1, ..., S
U_m = α_m′θ + ε_m, m = 1, ..., N_m
Theorem 4 Under normalizations on the factor loadings of the type in (21) for one system s, under the conditions of Theorems 1-3, given the normalizations for the unobservables of the discrete components, and given at least 2K + 1 measurements (Y, M, V), the unrestricted factor loadings and the variances of the factors (σ²_{θ_i}, i = 1, ..., K) are identified over all systems.
Nonparametric Identification of Distributions
({U_m}_{m=1}^{N_m}, {U_{s,a}}_{a=1}^{Ā}, ε̃_V) = T^s
The first B_1 elements depend only on θ_1; the next B_2 − B_1 elements depend on (θ_1, θ_2), and so forth.
Theorem 5 If
T^s_1 = θ_1 + v_1 and
T^s_2 = θ_1 + v_2,
and θ_1 ⊥⊥ v_1 ⊥⊥ v_2, the means of all three generating random variables are finite, E(v_1) = E(v_2) = 0, the conditions of Fubini's theorem are satisfied for each random variable, and the random variables possess nonvanishing (a.e.) characteristic functions, then the densities of (θ_1, v_1, v_2), g(θ_1), g_1(v_1), g_2(v_2), respectively, are identified.
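A moment-level implication of this theorem is easy to check by simulation: with two measurements sharing one factor and mutually independent errors, Cov(T1, T2) = Var(θ1), whatever the (mean-zero) error distributions. The distributions below are illustrative; recovering the full densities requires the characteristic-function argument of the theorem itself:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# Deliberately non-normal factor and errors, mutually independent, mean zero
theta1 = rng.gamma(2.0, 1.0, n) - 2.0    # Var(theta1) = 2.0
v1 = rng.normal(0.0, 0.8, n)
v2 = rng.laplace(0.0, 0.5, n)            # Var = 0.5

T1, T2 = theta1 + v1, theta1 + v2

# Cross-covariance identifies the factor variance
var_theta1_hat = np.cov(T1, T2)[0, 1]    # approx Var(theta1) = 2.0
```

The errors never enter the cross-covariance because they are independent of each other and of θ1.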
T^s_1 = λ^s_{11} θ_1 + ε^s_1, where λ^s_{11} = 1;  T^s_2 = λ^s_{21} θ_1 + ε^s_2, where λ^s_{21} ≠ 0.
Then
T^s_1 = θ_1 + ε^s_1,  T^s_2 / λ^s_{21} = θ_1 + ε^{*,s}_2.
Proceeding to equations B_1 + 1 and B_1 + 2:
T^s_{B_1+1} = λ^s_{B_1+1,1} θ_1 + θ_2 + ε^s_{B_1+1}
T^s_{B_1+2} = λ^s_{B_1+2,1} θ_1 + λ^s_{B_1+2,2} θ_2 + ε^s_{B_1+2},
so
T^s_{B_1+1} − λ^s_{B_1+1,1} θ_1 = θ_2 + ε^s_{B_1+1},
(T^s_{B_1+2} − λ^s_{B_1+2,1} θ_1)/λ^s_{B_1+2,2} = θ_2 + ε^{*,s}_{B_1+2},
where ε^{*,s}_{B_1+2} = ε^s_{B_1+2} / λ^s_{B_1+2,2}.
Example:
Two potential outcomes at each age a: (Y_{0a}, Y_{1a}). Set Z large, so that D_s = 1.
Y_{0a} = α_{0a} θ + ε_{0a}, a = 1, ..., A;  Y_{1a} = α_{1a} θ + ε_{1a}, a = 1, ..., A,
(ε_{0a} ⊥⊥ ε_{1a′}) ∀ a, a′,
(ε_{0a}, ε_{1a}) ⊥⊥ θ ∀ a.
Suppose that Pr(D_s = 1 | Z) = 1 for Z ∈ Z_1 and Pr(D_s = 1 | Z) = 0 for Z ∈ Z_0.
Y_{0a} = µ_{0a} + α_{0a} θ + ε_{0a}, a = 1, ..., A;  Y_{1a} = µ_{1a} + α_{1a} θ + ε_{1a}, a = 1, ..., A.
A = 3, so if we have three panel observations, we obtain
COV(Y_{01}, Y_{02}) = α_{01} α_{02} σ²_θ,
COV(Y_{01}, Y_{03}) = α_{01} α_{03} σ²_θ,   for Z ∈ Z_0,
COV(Y_{02}, Y_{03}) = α_{02} α_{03} σ²_θ,
COV(Y_{11}, Y_{12}) = α_{11} α_{12} σ²_θ,
COV(Y_{11}, Y_{13}) = α_{11} α_{13} σ²_θ,   for Z ∈ Z_1,
COV(Y_{12}, Y_{13}) = α_{12} α_{13} σ²_θ.
Assuming α_{01} = 1 or σ²_θ = 1, and α_{03} ≠ 0:
COV(Y_{01}, Y_{02}) / COV(Y_{01}, Y_{03}) = α_{02} / α_{03}.
Given σ²_θ = 1 and COV(Y_{02}, Y_{03}) = α_{02} α_{03}, we obtain
(α_{03})² = COV(Y_{02}, Y_{03}) COV(Y_{01}, Y_{03}) / COV(Y_{01}, Y_{02}).
The variances of the uniquenesses:
Var(ε_{0a}) = Var(Y_{0a}) − α²_{0a} σ²_θ, a = 1, ..., A,
Var(ε_{1a}) = Var(Y_{1a}) − α²_{1a} σ²_θ, a = 1, ..., A.
For a measurement T with factor loading β (e.g., a test score T = µ_T + βθ + ε_T, ε_T ⊥⊥ θ):
COV(T, Y_{0a}) = β α_{0a} σ²_θ, a = 1, ..., A;  COV(T, Y_{1a′}) = β α_{1a′} σ²_θ, a′ = 1, ..., A.
Assume β ≠ 0 and α_{0a}, α_{1a′} ≠ 0:
COV(T, Y_{1a′}) / COV(T, Y_{0a}) = α_{1a′} / α_{0a},
a = 1, ..., A, a′ = 1, ..., A, a ≠ a′.
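For the Z ∈ Z_0 subsample, the steps above can be carried out mechanically. The loadings below are illustrative, with the normalizations α01 = 1 and σ²_θ = 1:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

alpha = np.array([1.0, 0.6, 1.4])   # alpha_01 = 1 (normalization)
theta = rng.normal(0.0, 1.0, n)     # sigma^2_theta = 1 (normalization)
Y0 = alpha[:, None] * theta + rng.normal(0.0, 1.0, (3, n))  # uniquenesses

C = np.cov(Y0)

# (alpha_03)^2 = COV(Y02,Y03) * COV(Y01,Y03) / COV(Y01,Y02)
alpha03_hat = np.sqrt(C[1, 2] * C[0, 2] / C[0, 1])
# COV(Y01,Y02) / COV(Y01,Y03) = alpha_02 / alpha_03
alpha02_hat = (C[0, 1] / C[0, 2]) * alpha03_hat

# Variances of the uniquenesses
var_eps_hat = np.diag(C) - np.array([1.0, alpha02_hat, alpha03_hat]) ** 2
```

Three panel observations thus pin down both loadings and all three uniqueness variances in this one-factor sketch.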
Factor Representation of Stationary AR(1) Processes
Theorem Let {y_t}_{t=1}^T be a stationary AR(1) process with an odd number T of observations, where |ρ| < 1 and ε_t iid (0, σ²_ε). Then we can always find M = (T + 1)/2 orthogonal factors, corresponding loadings (which may be normalized to zero or one) and mutually independent uniquenesses u_t ∼ (0, σ²_t) such that y_t = ρ y_{t−1} + ε_t can be represented by
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + u_t.
Proof By induction. The theorem obviously holds for T = 1 observations of the AR(1) process. Assume it holds for any odd number T of observations. We have to show that it also holds for T + 2. Indeed, let
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + u_t,  t = 1, ..., T,
be the factor representation of the stationary AR(1) process {y_t}_{t=1}^T. If we add two more observations, one cannot represent {y_t}_{t=1}^{T+2} with the original M factors, because the extra number of parameters in the correlation matrix to be reproduced far exceeds the number of loadings that we get from the two extra observations. Thus we need more factors. We show that only one more factor is necessary to represent the stationary AR(1) process {y_t}_{t=1}^{T+2} after the addition of the two extra observations. Indeed, let
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + u_t,  t = 1, ..., T − 1,
y_t = λ_{1t} θ_1 + ... + λ_{Mt} θ_M + λ_{M+1,t} θ_{M+1} + u_t,  t = T, T + 1, T + 2,
be its factor representation. Notice that (writing σ²_i ≡ Var(θ_i))
Corr(y_t, y_{T+1}) = Σ_{i=1}^M λ_{it} λ_{i,T+1} σ²_i,  t = 1, ..., T − 1,
Corr(y_t, y_{T+2}) = Σ_{i=1}^M λ_{it} λ_{i,T+2} σ²_i,  t = 1, ..., T − 1.
But this is a system of 2(T − 1) equations with 2M = T + 1 unknowns. Thus we can solve the system above to recover {λ_{i,T+1}, λ_{i,T+2}}_{i=1}^M. Now,
Corr(y_T, y_{T+1}) = Σ_{i=1}^M λ_{iT} λ_{i,T+1} σ²_i + λ_{M+1,T} λ_{M+1,T+1} σ²_{M+1},
Corr(y_T, y_{T+2}) = Σ_{i=1}^M λ_{iT} λ_{i,T+2} σ²_i + λ_{M+1,T} λ_{M+1,T+2} σ²_{M+1},
Corr(y_{T+1}, y_{T+2}) = Σ_{i=1}^M λ_{i,T+1} λ_{i,T+2} σ²_i + λ_{M+1,T+1} λ_{M+1,T+2} σ²_{M+1}.
Define
A_1 = Corr(y_T, y_{T+1}) − Σ_{i=1}^M λ_{iT} λ_{i,T+1} σ²_i,
A_2 = Corr(y_T, y_{T+2}) − Σ_{i=1}^M λ_{iT} λ_{i,T+2} σ²_i,
A_3 = Corr(y_{T+1}, y_{T+2}) − Σ_{i=1}^M λ_{i,T+1} λ_{i,T+2} σ²_i.
Then, with the normalization λ_{M+1,T} = 1, the following system must be solved for λ_{M+1,T+1}, λ_{M+1,T+2} and σ²_{M+1}:
λ_{M+1,T+1} σ²_{M+1} = A_1,
λ_{M+1,T+2} σ²_{M+1} = A_2,
λ_{M+1,T+1} λ_{M+1,T+2} σ²_{M+1} = A_3,
and the solution is:
λ_{M+1,T+1} = A_3 / A_2,
λ_{M+1,T+2} = A_3 / A_1,
σ²_{M+1} = A_1 A_2 / A_3.
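The closed-form solution of the final three-equation system (under the normalization λ_{M+1,T} = 1) is easy to verify. The values of A1, A2, A3 below are arbitrary positive numbers chosen for illustration:

```python
# Residual correlations after removing the first M factors (illustrative)
A1, A2, A3 = 0.48, 0.36, 0.27

lam_T1 = A3 / A2        # lambda_{M+1, T+1}
lam_T2 = A3 / A1        # lambda_{M+1, T+2}
var_new = A1 * A2 / A3  # sigma^2_{M+1}

# The solution reproduces all three covariance restrictions
check1 = lam_T1 * var_new            # should equal A1
check2 = lam_T2 * var_new            # should equal A2
check3 = lam_T1 * lam_T2 * var_new   # should equal A3
```

Substituting the formulas confirms the algebra: (A3/A2)(A1 A2/A3) = A1, (A3/A1)(A1 A2/A3) = A2, and (A3/A2)(A3/A1)(A1 A2/A3) = A3.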