Asymptotic efficiency in an instrumental variable model

FUNDAÇÃO GETULIO VARGAS

ESCOLA de PÓS-GRADUAÇÃO em ECONOMIA

Leonardo Salim Saker Chaves

Asymptotic Efficiency in an

Instrumental Variable Model


Dissertation presented to the Escola de Pós-Graduação em Economia to obtain the degree of Master

Area of concentration: Econometrics

Advisor: Marcelo J. Moreira

Catalog record prepared by the Biblioteca Mario Henrique Simonsen/FGV

Chaves, Leonardo Salim Saker

Asymptotic efficiency in an instrumental variable model / Leonardo Salim Saker Chaves. – 2015.

22 leaves.

Dissertation (Master's) - Fundação Getulio Vargas, Escola de Pós-Graduação em Economia.

Advisor: Marcelo J. Moreira. Includes bibliography.

1. Method of moments (Statistics). 2. Econometrics – Asymptotic theory. 3. Instrumental variables (Statistics). I. Moreira, Marcelo J. II. Fundação Getulio Vargas. Escola de Pós-Graduação em Economia. III. Title.

Acknowledgements

I would like to express my gratitude to my advisor, Marcelo Moreira, for his guidance and support in my dissertation. I am also very thankful to Lucas Villela and Gustavo Rabello de Castro for their suggestions to improve this work. And last but not least important, I would

Resumo

This dissertation studies inference based on generalized method of moments (GMM) estimation using instruments. The study is motivated by the fact that, under weak identification of the parameters, traditional inference can lead to misleading results. Accordingly, it reviews the tests most commonly used to overcome this problem and presents the frameworks proposed by Moreira (2002), Moreira and Moreira (2013), and Kleibergen (2005). The work then reconciles the statistics they use for inference and rewrites the score test proposed in Kleibergen (2005) using the statistics of Moreira and Moreira (2013); the optimal score test statistic is obtained using the asymptotic theory in Newey and McFadden (1984). In addition, it shows the equivalence between the GMM approach and the approach that uses a system of equations and likelihood to address the weak identification problem.

Abstract

This work studies hypothesis testing based on generalized method of moments (GMM) estimation with moment conditions given by instruments. Its importance for the development of Economics lies in the fact that, when identification is weak, the standard test can be misleading. Therefore, it reviews the tests proposed to overcome this problem and presents two useful frameworks of study, from Moreira (2002) and Moreira and Moreira (2013), and from Kleibergen (2005). The work then reconciles these frameworks, writes the score test initially proposed in Kleibergen (2005) using the statistics of Moreira and Moreira (2013), and presents the optimal score test based on the asymptotic theory of Newey and McFadden (1984). Moreover, the study shows the equivalence between GMM and maximum likelihood estimation in dealing with the weak instruments problem.

Introduction

This study focuses on hypothesis testing for a parameter. To obtain moment conditions and correctly identify the parameter, instrumental variables are used, i.e., variables that are correlated with the endogenous variables but not with the errors. As evidenced in Bound et al. (1995), Dufour (1997), and Staiger and Stock (1997), results obtained from the traditional t-test can be misleading. This occurs because the normal approximation is incorrect when the instruments are not highly correlated with the endogenous variable.

To improve inference in empirical work, some studies assume homoskedastic errors and propose tests whose asymptotic distribution is robust to the strength of identification. The first such test appeared in Anderson and Rubin (1949) and was based on a pivotal statistic whose asymptotic distribution is chi-square with degrees of freedom equal to the number of instruments. Despite being optimal in the just-identified model, it loses power as the number of instrumental variables increases.

The literature has evolved to overcome this loss of power. First, Kleibergen (2002) and Moreira (2002) showed that a score test statistic has an asymptotic distribution that depends neither on the strength of identification nor on the number of instruments. Subsequently, Moreira (2003) proposed using critical values constructed from conditional quantiles of test statistics. As a result, he developed the conditional likelihood ratio (CLR) test, which has correct size because it is similar by construction. Moreover, Andrews et al. (2006) show that the CLR test satisfies orthogonal invariance conditions and is nearly optimal, performing better than the score test.

Motivated by empirical research, the previous tests were extended to generalized method of moments (GMM) estimation, which allows for heteroskedastic and autocorrelated (HAC) errors. These developments produced the generalization of the Anderson-Rubin (AR) test in Stock and Wright (2000), and of the score test and the CLR test in Kleibergen (2005) under that author's framework. Furthermore, Moreira and Moreira (2013) presented statistics equivalent to those of Moreira (2003) without assuming homoskedastic errors. Given the importance of the robust frameworks of Kleibergen (2005) and Moreira and Moreira (2013) in the literature, Andrews (2014) recently developed a method that contains these frameworks, the conditional linear combination (CLC) tests.

The goal of this study is to explain a method for reconciling the statistics proposed in Moreira and Moreira (2013) with those of Kleibergen (2005). This step makes it possible to rewrite the score test from Kleibergen (2005) as a combination of the two statistics shown in Moreira and Moreira (2013). Subsequently, an optimality result is obtained under the standard asymptotic conditions of Newey and McFadden (1984).

The thesis is structured as follows. To provide a better perspective of the subject, Section 1 presents the theoretical framework for estimation and inference based on instrumental variables. Section 2 presents and reconciles the frameworks of Moreira and Moreira (2013) and Kleibergen (2005). Section 3 rewrites the score statistic under the framework of Moreira and Moreira (2013) and derives the desirable properties of the score test under strong instruments.

1 Standard Inference with Instrumental Variables on GMM

There is a sample of $n$ observations on the variables $y_1$ and $y_2$. Moreover, their relation is assumed to be linear:

$$y_1 = y_2\beta + u \tag{1}$$

where $u$ is an $n \times 1$ unobserved error.

The case of inference examined herein comprises the following complementary hypotheses:

$$H_0: g(\beta) = \beta - \beta_0 = 0 \qquad H_1: g(\beta) = \beta - \beta_0 \neq 0$$

There are no distributional assumptions on $u$ or restrictions on the nuisance parameters governing the relationship between the error and $y_2$.

A test is considered optimal if it has the correct size and provides the highest possible power against the alternatives. Under the standard asymptotic theory in Newey and McFadden (1984), optimality results can be provided for the test to be developed. As the test is based on GMM estimation, it is essential to introduce the estimator and establish its important properties.

1.1 Generalized Method of Moments and Instrumental Variables

For the scalar parameter $\beta$ to be identified, $y_2$ must be uncorrelated with the error. If this is not the case, then the appropriate approach is to use exogenous variables that are uncorrelated with the error and correlated with $y_2$, i.e., instrumental variables. Therefore, the existence of $k$ instruments forming $Z$, an $n \times k$ matrix of variables with full column rank, is assumed. Then, $k$ moment conditions can be constructed based on the orthogonality between $Z$ and $u$. For expositional purposes, they are defined by a vector-valued function of the data $h: \mathbb{R} \times \mathbb{R}^{2n} \to \mathbb{R}^{k}$:

$$E[h(\beta, Y)] = 0 \iff \beta = \beta_0 \tag{2}$$

where $Y = [y_1, y_2]$ is an $n \times 2$ matrix of observations.

The estimation is built on sample averages of $h(\beta, Y)$. The GMM estimator, $\hat\beta_{GMM}$, is the value that maximizes the scalar function:

$$Q_n(\beta) = -\frac{1}{2n^2}\left(\sum_{i=1}^{n} h(\beta, Y_i)\right)' W_n \left(\sum_{i=1}^{n} h(\beta, Y_i)\right)$$

where $\{W_n\}_{n\in\mathbb{N}}$ is a sequence of positive semi-definite weighting matrices.
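As a minimal sketch of the objective above, assume the linear moment function $h(\beta, Y_i) = Z_i(y_{1i} - y_{2i}\beta)$ implied by the orthogonality between $Z$ and $u$, and a fixed symmetric weighting matrix. The function names and the noiseless toy data are illustrative assumptions, not part of the original text. Because the moments are linear in $\beta$, the maximizer has a closed form.

```python
def moment_sum(beta, y1, y2, Z):
    """Sum over i of h(beta, Y_i) = Z_i * (y1_i - y2_i * beta); returns a length-k list."""
    k = len(Z[0])
    total = [0.0] * k
    for y1_i, y2_i, z_i in zip(y1, y2, Z):
        resid = y1_i - y2_i * beta
        for j in range(k):
            total[j] += z_i[j] * resid
    return total


def quad_form(u, W, v):
    """Compute u' W v for lists u, v and a list-of-lists matrix W."""
    return sum(u[r] * W[r][c] * v[c] for r in range(len(u)) for c in range(len(v)))


def gmm_objective(beta, y1, y2, Z, W):
    """Q_n(beta) = -(1 / (2 n^2)) * g(beta)' W g(beta), to be maximized."""
    n = len(y1)
    g = moment_sum(beta, y1, y2, Z)
    return -quad_form(g, W, g) / (2.0 * n ** 2)


def gmm_estimate(y1, y2, Z, W):
    """Closed-form maximizer for symmetric W, since g(beta) = Z'y1 - beta * Z'y2."""
    zy1 = moment_sum(0.0, y1, y2, Z)                              # Z'y1
    zy2 = [a - g for a, g in zip(zy1, moment_sum(1.0, y1, y2, Z))]  # Z'y2
    return quad_form(zy2, W, zy1) / quad_form(zy2, W, zy2)


# Toy data (assumption): two instruments, no noise, true beta = 0.5.
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y2 = [1.0, 2.0, 3.0, 4.0]
y1 = [0.5 * v for v in y2]
W = [[1.0, 0.0], [0.0, 1.0]]
beta_hat = gmm_estimate(y1, y2, Z, W)
```

With noiseless data the sample moments vanish at the true value, so the objective reaches its maximum of zero there and is strictly negative elsewhere.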

The next proposition from Newey and McFadden (1984) characterizes the weighting matrix that yields an efficient GMM estimator.

Proposition 1.1 (Newey and McFadden (1984), Theorem 5.2): Suppose that (i) $\hat\beta_{GMM} \xrightarrow{p} \beta_0$, (ii) $\beta_0$ belongs to the interior of the parameter space $\Theta$, (iii) the function $Q_n(\beta)$ is twice continuously differentiable in a neighborhood $\mathcal{N}$ of $\beta_0$, (iv) $\sqrt{n}\,\nabla_{\beta} Q_n(\beta_0) \xrightarrow{d} N(0, \Upsilon)$, and (v) there exists a nonsingular matrix $H(\beta)$ that is continuous at $\beta_0$ and satisfies $\sup_{\beta\in\mathcal{N}} \|\nabla_{\beta\beta} Q_n(\beta) - H(\beta)\| \xrightarrow{p} 0$. Then, the optimal weighting matrix is given by $W^{*} = E[h(\beta, Y_i)h(\beta, Y_i)']^{-1} = V_0^{-1}$.

In addition, it is possible to use a consistent robust estimator for $W^{*}$, as presented in Newey and West (1987), and still achieve an efficient estimator.

1.2 Score or LM Test

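To construct the test statistic, let $\tilde\beta = \arg\max_{\text{s.t. } H_0} Q_n(\beta)$, $G = \frac{\partial g(\beta)}{\partial \beta'}$, and $Sc_n(\beta^{*}) = \left.\frac{\partial Q_n(\beta)}{\partial\beta}\right|_{\beta=\beta^{*}}$. In addition, consider $\Upsilon(W) = \mathrm{Avar}(\sqrt{n}\,Sc_n(\beta_0))$, where Avar denotes the asymptotic variance, and $B(W) = \frac{\partial^2 Q_n(\beta)}{\partial\beta\,\partial\beta'} + o_p(1)$, which are both positive semi-definite matrices. Newey and McFadden (1984) derive the following score statistic:

$$Score = n\,Sc_n(\tilde\beta)'B(W)^{-1}G'\left(G\,B(W)^{-1}\Upsilon(W)B(W)^{-1}G'\right)^{-1}G\,B(W)^{-1}Sc_n(\tilde\beta) \tag{3}$$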

If $\tilde\beta \xrightarrow{p} \beta_0$, then $Score \xrightarrow{d} \chi^2_{(1)}$ under the null hypothesis. Consequently, the test with asymptotic significance level $\alpha$ rejects the null hypothesis if:

$$Score > c_\alpha$$

where $c_\alpha$ is the $(1-\alpha)$ quantile of a central $\chi^2_{(1)}$ distribution.
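The rejection rule above can be sketched numerically. The snippet below is only an illustration: it uses the identity $P(\chi^2_{(1)} \le x) = \mathrm{erf}(\sqrt{x/2})$ and a bisection search for the $(1-\alpha)$ quantile, and the function names are assumptions introduced here.

```python
import math


def chi2_1_cdf(x):
    """CDF of a central chi-square with one degree of freedom: P(Z^2 <= x)."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0


def chi2_1_critical_value(alpha, tol=1e-12):
    """(1 - alpha) quantile of the central chi-square(1), found by bisection."""
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_1_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def score_test_rejects(score, alpha=0.05):
    """Reject H0 when the score statistic exceeds the critical value c_alpha."""
    return score > chi2_1_critical_value(alpha)
```

For $\alpha = 0.05$ the critical value is the square of the usual two-sided normal quantile, roughly 3.84.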

Concerning the power of this test, the following result describes its asymptotic behavior under alternatives close to the null, with $H_1: g(\beta_0) = c/\sqrt{n}$ for $c \in \mathbb{R}$.

Proposition 1.2 (Newey and McFadden (1984), Theorem 9.2): If hypotheses (i) through (v) from Proposition 1.1 are valid and $\tilde\beta \xrightarrow{p} \beta_0$, then under local alternatives,

$$Score \xrightarrow{d} \chi^2\left(c'\left(G\,B(W)^{-1}\Upsilon(W)B(W)^{-1}G'\right)^{-1}c;\ d\right)$$

Moreover, the use of $W = V_0^{-1}$ maximizes the non-centrality parameter of the distribution, which becomes $c'\left(G\,B(W^{*})^{-1}G'\right)^{-1}c$.

Therefore, to achieve a higher probability of rejection when the null is false and the truth is a point in the alternative hypothesis, it is necessary to use the optimal weighting matrix, or a plug-in consistent estimator of it, in the GMM objective function.
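To illustrate why a larger non-centrality parameter matters, the sketch below computes the rejection probability of a test based on a non-central chi-square with one degree of freedom, using $P(\chi^2_{(1)}(\lambda) > c) = P(|Z + \sqrt{\lambda}| > \sqrt{c})$ for a standard normal $Z$. The function names are assumptions, and the default critical value is hard-coded as the 5% critical value of a central chi-square(1).

```python
import math


def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def local_power(noncentrality, critical_value=3.841458820694124):
    """Rejection probability when the statistic is chi-square(1) with the given non-centrality.

    The default critical value is the 5% critical value of a central chi-square(1).
    """
    root_c = math.sqrt(critical_value)
    shift = math.sqrt(noncentrality)
    return 1.0 - normal_cdf(root_c - shift) + normal_cdf(-root_c - shift)


# Power is increasing in the non-centrality parameter.
powers = [local_power(lam) for lam in (0.0, 1.0, 4.0, 9.0)]
```

At zero non-centrality the rejection probability equals the nominal size, and it rises monotonically as the non-centrality parameter grows.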

2 Framework and Equivalence

The extant literature revealed how misleading the standard t-test can be when instruments are weak. Hence, the need to develop new tools to overcome this inaccuracy became apparent. The first alternative was to revisit Anderson and Rubin (1949), which produced a pivotal statistic whose asymptotic distribution depends only on the number of instruments. A test based on this statistic controls size; however, as the number of instruments increases, the power curve becomes flatter. Researchers therefore had to identify a test that could not only maintain size but also improve power.

Kleibergen (2002) and Moreira (2003) had an immediate impact, as they represented an improvement over the AR test and provided a new perspective. The former proposed projecting onto a smaller space while keeping the statistic pivotal, as in Anderson and Rubin (1949), whereas the latter presented, through arguments of invariance and sufficiency, a test that conditions on a sufficient statistic for the nuisance parameters.

Recent research has led to the development of more robust versions of the previous frameworks, requiring fewer assumptions on the errors. Kleibergen (2005) used his framework to generalize the score test from Kleibergen (2002) and the CLR test from Moreira (2003) to the GMM approach. Subsequently, Moreira and Moreira (2013) presented statistics that work with HAC errors. Nevertheless, there is a need to reconcile the framework from Kleibergen (2005) with that of Moreira and Moreira (2013); before presenting the results, an introduction to each framework is required.

2.1 GMM framework

To obtain an estimator for $\beta$ using GMM, Kleibergen (2005) used the following condition:

$$E[Z'(y_1 - y_2\beta)] = E[Z'Yb] = 0$$

where $b = (1, -\beta)'$. This means that $Z$ is exogenous to the system of equations and, in order to develop finite sample properties, the matrix $Z$ will be considered fixed. Therefore, the proposed objective function is the following:

$$Q(\beta) = (Yb)'Z\,(W(\beta))^{-1}Z'(Yb) \tag{4}$$

where the weight function depends on $\beta$:

$$W(\beta) = Z'E[uu']Z = Z'\,\mathrm{Var}[Yb]\,Z = (b'\otimes Z')\Lambda(b\otimes Z)$$
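A minimal sketch of an objective whose weight depends on $\beta$ is given below. It is a special case chosen purely for illustration and is not the general setting of the text: a single instrument ($k = 1$) and a Kronecker covariance $\Lambda = \Omega \otimes I_n$, for which $(b'\otimes Z')\Lambda(b\otimes Z) = (b'\Omega b)\,Z'Z$. The names `cue_objective` and `grid_minimizer` and the toy data are assumptions.

```python
def cue_objective(beta, y1, y2, z, omega):
    """Q(beta) = (Z'Yb)^2 / ((b' Omega b) * Z'Z) for k = 1 and Lambda = Omega kron I_n."""
    b = (1.0, -beta)
    # Z'(Yb), with Yb = y1 - y2 * beta
    zyb = sum(z_i * (y1_i - y2_i * beta) for z_i, y1_i, y2_i in zip(z, y1, y2))
    zz = sum(z_i ** 2 for z_i in z)
    b_omega_b = sum(b[r] * omega[r][c] * b[c] for r in range(2) for c in range(2))
    return zyb ** 2 / (b_omega_b * zz)


def grid_minimizer(y1, y2, z, omega, grid):
    """Return the grid point with the smallest objective value."""
    return min(grid, key=lambda beta: cue_objective(beta, y1, y2, z, omega))


# Toy data (assumption): no noise, true beta = 0.5, Omega = I_2.
z = [1.0, 2.0]
y2 = [1.0, 2.0]
y1 = [0.5, 1.0]
omega = [[1.0, 0.0], [0.0, 1.0]]
grid = [i / 100.0 for i in range(-200, 201)]
beta_min = grid_minimizer(y1, y2, z, omega, grid)
```

Because the weight is re-evaluated at every candidate $\beta$, the objective is no longer quadratic in $\beta$, which is why a grid search is used here instead of a closed form.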

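In addition, Kleibergen (2005) proposed using the statistics below for inference. Under normality of the errors and the null hypothesis, these statistics are independent and normally distributed as follows:

$$Z'Yb_0 \sim N\left(0,\ Z'\Lambda_{\epsilon\epsilon}(\beta_0)Z\right)$$

$$D(\beta) = Z'\left(y_2 - \Lambda_{v\epsilon}(\beta)Z(Z'\Lambda_{\epsilon\epsilon}(\beta)Z)^{-1}Z'Yb\right) \sim N\left(\mathrm{vec}(Z'Z\pi),\ \Lambda_D(\beta_0)\right)$$

where $b_0 = (1,-\beta_0)'$, $\Lambda_D(\beta) = Z'[\Lambda_{vv}(\beta) - \Lambda_{v\epsilon}(\beta)Z(Z'\Lambda_{\epsilon\epsilon}(\beta)Z)^{-1}Z'\Lambda_{\epsilon v}(\beta)]Z$, $e_2 = (0,1)'$, and

$$\begin{pmatrix} (b_0'\otimes I_n)\Lambda(b_0\otimes I_n) & (b_0'\otimes I_n)\Lambda(e_2\otimes I_n)\\ (e_2'\otimes I_n)\Lambda(b_0\otimes I_n) & (e_2'\otimes I_n)\Lambda(e_2\otimes I_n)\end{pmatrix} = \begin{pmatrix}\Lambda_{\epsilon\epsilon}(\beta) & \Lambda_{\epsilon v}(\beta)\\ \Lambda_{v\epsilon}(\beta) & \Lambda_{vv}(\beta)\end{pmatrix}$$

Following this, the score test from Kleibergen (2005) can be written as

$$LM = (Z'Yb_0)'\,\mathrm{Var}(Z'Yb_0)^{-1/2}\,N_{\mathrm{Var}(Z'Yb_0)^{-1/2}D(\beta_0)}\,\mathrm{Var}(Z'Yb_0)^{-1/2}\,(Z'Yb_0)$$

where $N_{\mathrm{Var}(Z'Yb_0)^{-1/2}D(\beta_0)}$ represents the projection matrix onto the space generated by the vector $\mathrm{Var}(Z'Yb_0)^{-1/2}D(\beta_0)$.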

Moreover, the test based on this statistic controls size and thus represents an improvement over the AR test. Its asymptotic distribution is chi-square with one degree of freedom, which means that, unlike the AR test, its power does not decrease with the number of instruments.
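The projection structure of the LM statistic can be sketched with plain vectors. The snippet assumes the standardized vectors $s = \mathrm{Var}(Z'Yb_0)^{-1/2}Z'Yb_0$ and $d = \mathrm{Var}(Z'Yb_0)^{-1/2}D(\beta_0)$ have already been computed, so that $LM = s'N_d s = (s'd)^2/(d'd)$. It also assumes the AR statistic is the full quadratic form $s's$ in this normalization. The names and numbers are illustrative only.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def lm_statistic(s, d):
    """s' N_d s, where N_d = d (d'd)^{-1} d' projects onto the span of d."""
    return dot(s, d) ** 2 / dot(d, d)


def ar_statistic(s):
    """Quadratic form s's, which uses all k directions instead of one."""
    return dot(s, s)


# Illustrative standardized vectors for k = 3 instruments (assumption).
s = [1.0, 2.0, 2.0]
d = [0.0, 0.0, 3.0]
lm = lm_statistic(s, d)
ar = ar_statistic(s)
```

Since the LM statistic is the squared length of the projection of $s$ onto a single direction, it can never exceed the AR quadratic form, and it depends on $d$ only through its direction.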

Nevertheless, as Moreira (2003) and Andrews et al. (2006) have shown for the case of homoskedastic errors, the score test does not perform well in some parts of the parameter space and shows a spurious loss of power compared to the CLR test. More recently, Andrews (2014) reported poor behavior in some cases with heteroskedastic errors.

2.2 Linear Instrumental Variables framework

Here, $y_2$ is assumed to have a linear relation with the instruments $Z$. This facilitates the use of the maximum likelihood estimator (MLE) to obtain an estimate of $\beta$, as developed in Moreira and Moreira (2013). Using an invariance argument, Moreira and Moreira (2013) transformed the data by pre-multiplying by $(Z'Z)^{-1/2}Z'$:

$$(Z'Z)^{-1/2}Z'Y = (Z'Z)^{1/2}\pi a' + (Z'Z)^{-1/2}Z'V = \mu a' + W_1$$

where $a = (\beta, 1)'$.

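In addition, assuming $\mathrm{vec}(W_1) \sim N(0,\Sigma)$, the log-likelihood for $\psi = \mathrm{vec}[(Z'Z)^{-1/2}Z'Y]$, concentrated with respect to $\pi$, is:

$$L(\psi,\tilde\pi(\beta);\beta) = -\frac{1}{2}\ln(|\Sigma|) - \frac{1}{2}(\Sigma^{-1}\psi\psi') - \psi'\Sigma^{-1}(a\otimes Z'Z)\tilde\pi(\beta) \tag{5}$$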

Allowing the covariance matrix $\Sigma$ not to be decomposed into a Kronecker product of a $2\times 2$ matrix and an identity matrix of order $n$ means that more diverse variance designs can be taken into account. Thus, this development includes cases such as heteroskedasticity and/or autocorrelation of the errors. To proceed with estimation when the errors have a normal distribution with a known non-Kronecker variance matrix, the following sufficient statistics were presented:

$$S = [(b_0'\otimes I_k)\Sigma(b_0\otimes I_k)]^{-1/2}(Z'Z)^{-1/2}Z'Yb_0 = C_{\beta_0}(Z'Z)^{-1/2}Z'Yb_0$$

$$T = [(a_0'\otimes I_k)\Sigma^{-1}(a_0\otimes I_k)]^{-1/2}(a_0'\otimes I_k)\Sigma^{-1}\mathrm{vec}((Z'Z)^{-1/2}Z'Y) = D_{\beta_0}^{-1}(a_0'\otimes I_k)\Sigma^{-1}\mathrm{vec}((Z'Z)^{-1/2}Z'Y)$$

In addition to sufficiency, Moreira and Moreira (2013) present some other properties. For instance, the statistic $S$ is pivotal, and $T$ is complete and sufficient for the parameter $\mu$ under the null hypothesis. Therefore, by Basu's lemma, the two are independent.

With these statistics, the version of the CLR test from Moreira (2003) for heteroskedastic errors can be written as:

$$CLR = \frac{1}{2}\left[S'S - T'T + \sqrt{[S'S + T'T]^2 - 4\left[S'S \times T'T - (S'T)^2\right]}\right]$$

To obtain its critical value, which is a function of $T = t$, the distribution is constructed through simulation methods, conditioning on $T = t$ and using the fact that $S'S$ has a chi-square distribution with $k$ degrees of freedom. Moreover, the test controls size, as it conditions on a sufficient statistic for the nuisance parameter $\pi$. It also performs better in terms of power than the score test proposed by Kleibergen (2005).
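The simulation of the conditional critical value can be sketched as follows. The snippet assumes that, under the null, $S \sim N(0, I_k)$ independently of $T$ (consistent with $S$ being pivotal and $S'S$ being chi-square with $k$ degrees of freedom), holds $T = t$ fixed, and takes the empirical $(1-\alpha)$ quantile of the simulated CLR values. The function names, the number of draws, and the seed are assumptions.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def clr_statistic(S, T):
    """CLR = 0.5 * (S'S - T'T + sqrt((S'S + T'T)^2 - 4 (S'S T'T - (S'T)^2)))."""
    ss, tt, st = dot(S, S), dot(T, T), dot(S, T)
    disc = (ss + tt) ** 2 - 4.0 * (ss * tt - st ** 2)
    return 0.5 * (ss - tt + math.sqrt(max(disc, 0.0)))


def clr_critical_value(t, alpha=0.05, draws=20000, seed=0):
    """Empirical (1 - alpha) quantile of CLR given T = t, drawing S ~ N(0, I_k)."""
    rng = random.Random(seed)
    k = len(t)
    sims = sorted(
        clr_statistic([rng.gauss(0.0, 1.0) for _ in range(k)], t)
        for _ in range(draws)
    )
    index = min(draws - 1, int((1.0 - alpha) * draws))
    return sims[index]
```

Two limiting cases are a useful sanity check: when $T = 0$ the statistic reduces to $S'S$, and when $T'T$ is very large it approaches $(S'T)^2/T'T$, the projection of $S$ onto the direction of $T$.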

2.3 Equivalence

There is a connection between maximizing Equation (5) and minimizing Equation (4). This will guarantee the same estimator forβ regardless of the approach.

Proposition 2.1 For every type of variance-covariance matrix $\Lambda$, maximizing the equation

$$L(\psi,\tilde\pi(\beta);\beta) = -\frac{1}{2}\ln(|\Sigma|) - \frac{1}{2}(\Sigma^{-1}\psi\psi') - \psi'\Sigma^{-1}(a\otimes Z'Z)\tilde\pi(\beta) \tag{6}$$

with respect to $\beta$ is the same as minimizing the equation

$$(Yb)'Z\Lambda_S^{-1}Z'(Yb) \tag{7}$$

with respect to $\beta$, where $\Lambda_S$ is the optimal weighting matrix (see the Appendix). The latter can be seen as the result of a moment condition.

Proof See Appendix.

To unify the frameworks of Moreira and Moreira (2013) and Kleibergen (2005), the following lemma generalizes the equivalence result presented in Moreira (2009).

Lemma 2.2 In the heteroskedastic autocorrelated case, $\Sigma = (I_2\otimes(Z'Z)^{-1/2}Z')\Lambda(I_2\otimes Z(Z'Z)^{-1/2})$, the following holds:

1. $S = \mathrm{Var}((Z'Z)^{-1/2}Z'Yb_0)^{-1/2}(Z'Z)^{-1/2}(Z'Yb_0)$

2. $T = \mathrm{Var}((Z'Z)^{-1/2}\mathrm{vec}(D(\beta)))^{-1/2}(Z'Z)^{-1/2}\mathrm{vec}(D(\beta))$

Proof See Appendix.

3 Score Test under Strong Instruments and Local Alternatives

The model has so far been developed under the assumption that the errors are normally distributed. However, as the goal is to derive asymptotic properties of the score test, the finite sample assumptions are replaced by:

Assumption 3.1 $\frac{Z'Z}{n} \xrightarrow{p} D_Z$ for some positive definite $k\times k$ matrix $D_Z$.

Assumption 3.2 $\frac{(Z'Z)^{-1/2}Z'V}{\sqrt{n}} \xrightarrow{d} N(0,\Sigma)$, where $\Sigma = \mathrm{Avar}((Z'Z)^{-1/2}Z'V)$.

Assumption 3.1 holds under Birkhoff's ergodic theorem, whereas Assumption 3.2 holds under certain conditions by a central limit theorem.

A counterpart of Equation (4) that accounts for asymptotic normality is:

$$Q_n(\beta) = -\frac{1}{2n^2}(Z'Yb)'\,(W(\beta))^{-1}\,Z'Yb \tag{8}$$

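Further, as only $\Lambda$ is known, a consistent estimator for $\Sigma$ is:

$$\hat\Sigma = n\left[I_2\otimes(Z'Z)^{-1/2}\right](I_2\otimes Z')\Lambda(I_2\otimes Z)\left[I_2\otimes(Z'Z)^{-1/2}\right]$$

Thus, the consistent finite sample counterparts of the statistics in Moreira and Moreira (2013) are denoted by $S_n$, $T_n$, $C_{\beta_0,n}$, and $D_{\beta_0,n}$, in which $\hat\Sigma$ replaces $\Sigma$. Moreover, the next assumption makes it possible to work with strong instruments and local alternatives.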

Assumption 3.3

(a) $\beta = \beta_0 + c/\sqrt{n}$ for some $c \in \mathbb{R}$.

(b) $\pi$ is a fixed non-zero vector $\forall\, n \ge 1$.

(c) $k$ is a fixed positive integer that does not depend on $n$.

Considering the equivalence between the statistics from Kleibergen (2005) and Moreira and Moreira (2013), it is possible to construct the score test statistic as a function of the desired statistics.

Proposition 3.4 Based on objective function (8), if Assumptions 3.1, 3.2, and 3.3 hold, then the score statistic is

$$Score = S_n'\,N_{C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n}\,S_n$$

Proof See Appendix.

Analyzing the asymptotic distribution of the score test, it is possible to confirm the optimality by its behavior at local alternatives.


Proposition 3.5 If Assumptions 3.1, 3.2, and 3.3 hold, then the score test maximizes the non-centrality parameter of the chi-square distribution, and its value is

$$c^2\,\pi'D_Z^{1/2\,\prime}C_{\beta_0}^{2}D_Z^{1/2}\pi$$

Proof See Appendix.
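As a small numerical sketch of the non-centrality parameter in Proposition 3.5, the snippet below evaluates $c^2\pi'D_Z^{1/2\,\prime}C_{\beta_0}^2D_Z^{1/2}\pi$ as $c^2\,\|C_{\beta_0}D_Z^{1/2}\pi\|^2$, which is valid when $C_{\beta_0}$ is symmetric. The inputs are treated as given matrices, and the function names and numbers are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [dot(row, x) for row in A]


def noncentrality(c, pi, dz_half, c_beta0):
    """c^2 * || C_beta0 * D_Z^{1/2} * pi ||^2, assuming a symmetric C_beta0."""
    v = matvec(c_beta0, matvec(dz_half, pi))
    return c ** 2 * dot(v, v)


# Illustrative inputs (assumption): k = 2, D_Z = I_2, diagonal C_beta0.
pi = [1.0, 0.0]
dz_half = [[1.0, 0.0], [0.0, 1.0]]
c_beta0 = [[0.5, 0.0], [0.0, 1.0]]
lam = noncentrality(2.0, pi, dz_half, c_beta0)
```

The parameter grows with the square of the local deviation $c$ and with the strength of the first stage through $\pi$.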

Another way to approach the problem of determining the optimal score test is to consider the score as a projection of the statistic $S_n$ onto the space generated by $A_TT_n$, where $A_T$ is a $k\times k$ matrix, and compare it with the test in Proposition 3.4. This yields a class of weighting matrices $A_T$, besides $C_{\beta_0,n}D_{\beta_0,n}^{-1}$, for which the score test behaves optimally. The following result proves this point:

Lemma 3.6 Under Assumptions 3.1, 3.2, and 3.3, the optimal score test is produced not only by the projection of $S_n$ onto the space generated by $C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n$ but also by the projection onto any space generated by $A_TT_n$, where $A_T$ takes the form:

$$A_T = d\,C_{\beta_0}D_{\beta_0}^{-1} + o_p(1), \quad \text{for } d \neq 0$$

Proof See Appendix.
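The role of the scalar $d$ in Lemma 3.6 can be illustrated numerically: a projection onto the span of a vector depends only on its direction, so rescaling the weighting matrix by any $d \neq 0$ leaves the projected score unchanged. In the sketch below the matrix `A` stands in for $C_{\beta_0}D_{\beta_0}^{-1}$; all names and numbers are assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [dot(row, x) for row in A]


def projected_score(S, direction):
    """S' N_v S for a single direction v, i.e. (S'v)^2 / (v'v)."""
    return dot(S, direction) ** 2 / dot(direction, direction)


def score_with_weight(S, T, A):
    """Score built from the projection of S onto the span of A T."""
    return projected_score(S, matvec(A, T))


def scale(A, d):
    """Return the matrix d * A."""
    return [[d * entry for entry in row] for row in A]


# Illustrative values (assumption).
S = [1.0, 2.0]
T = [3.0, 1.0]
A = [[2.0, 0.0], [1.0, 1.0]]
base = score_with_weight(S, T, A)
rescaled = score_with_weight(S, T, scale(A, -2.5))
```

Both calls return the same value, which is the finite sample counterpart of the statement that the optimal class is indexed by an arbitrary non-zero scalar.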


References

Anderson, T. W. and Rubin, H. (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. The Annals of Mathematical Statistics, 20(1):46–63.

Andrews, D. W., Moreira, M., and Stock, J. H. (2006). Optimal Two-Sided Invariant Similar Tests for Instrumental Variables Regression. Econometrica, 74(3):715–752.

Andrews, I. (2014). Conditional Linear Combination Tests for Weakly Identified Models. Working paper.

Bound, J., Jaeger, D., and Baker, R. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90:443–450.

Dufour, J.-M. (1997). Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica, 65(6):1365–1388.

Kleibergen, F. (2002). Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression. Econometrica, 70(5):1781–1803.

Kleibergen, F. (2005). Testing Parameters in GMM Without Assuming that They Are Identified. Econometrica, 73(4):1103–1123.

Moreira, H. and Moreira, M. J. (2013). Contributions to the Theory of Optimal Tests. Economics Working Papers (Ensaios Economicos da EPGE) 747, FGV/EPGE Escola Brasileira de Economia e Finanças, Getulio Vargas Foundation (Brazil).

Moreira, M. J. (2002). Tests With Correct Size in the Simultaneous Equation Model. PhD thesis, UC Berkeley.

Moreira, M. J. (2003). A Conditional Likelihood Ratio Test for Structural Models. Econometrica, 71(4):1027–1048.

Moreira, M. J. (2009). Tests with correct size when instruments can be arbitrarily weak. Journal of Econometrics, 152(2):131–140.

Newey, W. and West, K. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708.

Newey, W. K. and McFadden, D. (1984). Large sample estimation and hypothesis testing. In Engle, R. F. and McFadden, D., editors, Handbook of Econometrics, volume 4 of Handbook of Econometrics, chapter 36, pages 2111–2245. Elsevier.

Staiger, D. and Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments. Econometrica, 65(3):557–586.

Stock, J. H. and Wright, J. (2000). GMM with Weak Identification. Econometrica, 68(5):1055–1096.

Part I

Appendix

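Proof of Proposition 2.1: Notice the following:

$$\begin{aligned}
\psi'\Sigma^{-1}(a\otimes Z'Z)\tilde\pi(\beta) &= \psi'\Sigma^{-1}(a\otimes Z'Z)\left[(a\otimes Z'Z)'\Sigma^{-1}(a\otimes Z'Z)\right]^{-1}(a\otimes Z'Z)'\Sigma^{-1}\psi \\
&= \psi'\Sigma^{-1}\left(\tfrac{a}{\sqrt{a'a}}\otimes I_k\right)\left[\left(\tfrac{a}{\sqrt{a'a}}\otimes I_k\right)'\Sigma^{-1}\left(\tfrac{a}{\sqrt{a'a}}\otimes I_k\right)\right]^{-1}\left(\tfrac{a}{\sqrt{a'a}}\otimes I_k\right)'\Sigma^{-1}\psi
\end{aligned} \tag{9}$$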

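Now looking at the matrix $X = [X_1, X_2]$:

$$\begin{aligned}
[X_1, X_2] &= \left[\Sigma^{1/2}\left(\tfrac{a}{\sqrt{a'a}}\otimes I_k\right),\ \Sigma^{1/2}\left(\tfrac{b}{\sqrt{b'b}}\otimes I_k\right)\right] \\
&= \Sigma^{1/2}\left[\tilde a\otimes I_k,\ \tilde b\otimes I_k\right] \\
&= \Sigma^{1/2}\left([\tilde a,\tilde b]\otimes I_k\right)
\end{aligned}$$

where $\tilde a$ and $\tilde b$ are normalized versions of the vectors $a$ and $b$, respectively.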

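Taking the inverse of $X'X$:

$$\begin{aligned}
\left([X_1, X_2]'[X_1, X_2]\right)^{-1} &= \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} = \begin{pmatrix} X^{11} & X^{12} \\ X^{21} & X^{22} \end{pmatrix} \\
&= \left[([\tilde a,\tilde b]\otimes I_k)'\,\Sigma\,([\tilde a,\tilde b]\otimes I_k)\right]^{-1} \\
&= ([\tilde a,\tilde b]^{-1}\otimes I_k)\,\Sigma^{-1}\,([\tilde a,\tilde b]'^{-1}\otimes I_k) \\
&= ([\tilde a,\tilde b]'\otimes I_k)\,\Sigma^{-1}\,([\tilde a,\tilde b]\otimes I_k)
\end{aligned}$$

where the last equality holds because $[\tilde a,\tilde b]$ is an orthogonal matrix.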
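The orthogonality used in the last step can be checked numerically. With $a = (\beta, 1)'$ and $b = (1, -\beta)'$, the two vectors are orthogonal and share the norm $\sqrt{\beta^2 + 1}$, so the matrix of normalized columns has an identity Gram matrix. The function names below are assumptions introduced for this check.

```python
import math


def normalized_pair(beta):
    """Return a_tilde and b_tilde for a = (beta, 1)' and b = (1, -beta)'."""
    norm = math.sqrt(beta ** 2 + 1.0)  # a'a = b'b = beta^2 + 1
    a_tilde = (beta / norm, 1.0 / norm)
    b_tilde = (1.0 / norm, -beta / norm)
    return a_tilde, b_tilde


def gram_matrix(beta):
    """P'P for P = [a_tilde, b_tilde]; equals I_2 when P is orthogonal."""
    cols = normalized_pair(beta)
    return [[sum(x * y for x, y in zip(u, v)) for v in cols] for u in cols]


gram = gram_matrix(0.7)
```

The same check passes for any real $\beta$, which is what allows the inverse and the transpose of $[\tilde a, \tilde b]$ to be interchanged.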

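Thus, the following equality holds:

$$X^{11} = (\tilde a'\otimes I_k)\Sigma^{-1}(\tilde a\otimes I_k)$$

$$(X^{11})^{-1} = \left[(\tilde a'\otimes I_k)\Sigma^{-1}(\tilde a\otimes I_k)\right]^{-1}$$

Alternatively,

$$\begin{aligned}
(X^{11})^{-1} &= (\tilde a'\otimes I_k)\Sigma(\tilde a\otimes I_k) - (\tilde a'\otimes I_k)\Sigma(\tilde b\otimes I_k)\left[(\tilde b'\otimes I_k)\Sigma(\tilde b\otimes I_k)\right]^{-1}(\tilde b'\otimes I_k)\Sigma(\tilde a\otimes I_k) \\
&= (\tilde a'\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(\tilde b\otimes I_k)}\Sigma^{1/2}\,(\tilde a\otimes I_k)
\end{aligned} \tag{10}$$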

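Plugging Expression (10) into Equation (9):

$$\begin{aligned}
&\psi'\Sigma^{-1}(\tilde a\tilde a'\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(\tilde b\otimes I_k)}\Sigma^{1/2}\,(\tilde a\tilde a'\otimes I_k)\Sigma^{-1}\psi \\
&= \psi'\Sigma^{-1}((I_2-\tilde b\tilde b')\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(\tilde b\otimes I_k)}\Sigma^{1/2}\,((I_2-\tilde b\tilde b')\otimes I_k)\Sigma^{-1}\psi \\
&= \psi'\Sigma^{-1}(I_2\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(\tilde b\otimes I_k)}\Sigma^{1/2}\,(I_2\otimes I_k)\Sigma^{-1}\psi \\
&= \psi'\Sigma^{-1/2}M_{\Sigma^{1/2}(\tilde b\otimes I_k)}\Sigma^{-1/2}\psi \\
&= \psi'\Sigma^{-1}\psi - \psi'(\tilde b\otimes I_k)\left[(\tilde b'\otimes I_k)\Sigma(\tilde b\otimes I_k)\right]^{-1}(\tilde b'\otimes I_k)\psi
\end{aligned} \tag{11}$$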

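Therefore, maximizing Equation (5) is the same as maximizing Equation (11), and maximizing Equation (11) is equivalent to minimizing

$$\psi'(\tilde b\otimes I_k)\left[(\tilde b'\otimes I_k)\Sigma(\tilde b\otimes I_k)\right]^{-1}(\tilde b'\otimes I_k)\psi \tag{12}$$

Equation (12) can be developed further to resemble the objective function from Kleibergen (2005):

$$\begin{aligned}
&\psi'(\tilde b\otimes I_k)\left[(\tilde b'\otimes I_k)\Sigma(\tilde b\otimes I_k)\right]^{-1}(\tilde b'\otimes I_k)\psi \\
&= \psi'(\tilde b\otimes I_k)\left[(\tilde b'\otimes(Z'Z)^{-1/2}Z')\Lambda(\tilde b\otimes Z(Z'Z)^{-1/2})\right]^{-1}(\tilde b'\otimes I_k)\psi \\
&= ((Z'Z)^{-1/2}Z'Y\tilde b)'\left[(\tilde b'\otimes(Z'Z)^{-1/2}Z')\Lambda(\tilde b\otimes Z(Z'Z)^{-1/2})\right]^{-1}((Z'Z)^{-1/2}Z'Y\tilde b) \\
&= (Y\tilde b)'Z\left[(\tilde b'\otimes Z')\Lambda(\tilde b\otimes Z)\right]^{-1}Z'(Y\tilde b) \\
&= (Y\tilde b)'Z\left[\mathrm{Var}(Z'Y\tilde b)\right]^{-1}Z'(Y\tilde b) \\
&= (Y\tilde b)'Z\Lambda_S^{-1}Z'(Y\tilde b)
\end{aligned}$$

Consequently,

$$\min_{\beta}\ (Yb)'Z\Lambda_S^{-1}Z'(Yb) = \min_{\beta}\ \tilde R(\beta) \tag{13}$$

The matrix $\Lambda_S$ is the optimal weighting matrix for the moment condition based on $E[Z'Y\tilde b] = 0$. Thus, maximizing (5) is equivalent to minimizing the GMM function given by (13).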

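Proof of Lemma 2.2:

1. This result is almost straightforward. Notice that:

$$\begin{aligned}
\mathrm{Var}(Z'Yb_0) &= (b_0'\otimes Z')\Lambda(b_0\otimes Z) \\
&= Z'(b_0'\otimes I_n)\Lambda(b_0\otimes I_n)Z \\
&= Z'\Lambda_{\epsilon\epsilon}Z
\end{aligned}$$

Now, rewriting the statistic $S$:

$$\begin{aligned}
S &= \left[(b_0'\otimes(Z'Z)^{-1/2}Z')\Lambda(b_0\otimes Z(Z'Z)^{-1/2})\right]^{-1/2}(Z'Z)^{-1/2}Z'Yb_0 \\
&= \left[(Z'Z)^{-1/2}Z'(b_0'\otimes I_n)\Lambda(b_0\otimes I_n)Z(Z'Z)^{-1/2}\right]^{-1/2}(Z'Z)^{-1/2}Z'Yb_0 \\
&= (Z'Z)^{1/4}\left[Z'(b_0'\otimes I_n)\Lambda(b_0\otimes I_n)Z\right]^{-1/2}(Z'Z)^{1/4}(Z'Z)^{-1/2}Z'Yb_0 \\
&= (Z'Z)^{1/4}\,\mathrm{Var}(Z'Yb_0)^{-1/2}(Z'Z)^{-1/4}(Z'Yb_0)
\end{aligned}$$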

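2. For this part, the pseudo-inverse procedure is applied to rewrite $[(a_0'\otimes I_k)\Sigma^{-1}(a_0\otimes I_k)]^{-1}$:

$$\begin{aligned}
\left[(a_0'\otimes I_k)\Sigma^{-1}(a_0\otimes I_k)\right]^{-1} &= (e_2'\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(b_0\otimes I_k)}\Sigma^{1/2}\,(e_2\otimes I_k) \\
&= (Z'Z)^{-1/2}Z'\Lambda_{vv}Z(Z'Z)^{-1/2} - (Z'Z)^{-1/2}Z'\Lambda_{v\epsilon}Z(Z'\Lambda_{\epsilon\epsilon}Z)^{-1}Z'\Lambda_{\epsilon v}Z(Z'Z)^{-1/2} \\
&= (Z'Z)^{-1/2}Z'\left(\Lambda_{vv} - \Lambda_{v\epsilon}Z(Z'\Lambda_{\epsilon\epsilon}Z)^{-1}Z'\Lambda_{\epsilon v}\right)Z(Z'Z)^{-1/2} \\
&= (Z'Z)^{-1/2}\Lambda_D(Z'Z)^{-1/2}
\end{aligned}$$

Using this to rewrite $T$:

$$\begin{aligned}
T &= (Z'Z)^{-1/4}\Lambda_D^{1/2}(Z'Z)^{-1/4}(a_0'\otimes I_k)\Sigma^{-1}\mathrm{vec}((Z'Z)^{-1/2}Z'Y) \\
&= (Z'Z)^{-1/4}\Lambda_D^{1/2}(Z'Z)^{-1/4}(a_0'\otimes I_k)\Sigma^{-1}(I_2\otimes(Z'Z)^{-1/2}Z')\mathrm{vec}(Y)
\end{aligned}$$

Developing $(Z'Z)^{-1/2}\mathrm{vec}(D(\beta_0))$:

$$\begin{aligned}
(Z'Z)^{-1/2}D(\beta_0) &= \left((e_2'\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(b_0\otimes I_k)}\Sigma^{-1/2}\,(I_2\otimes(Z'Z)^{-1/2}Z')\right)\mathrm{vec}(Y) \\
&= \left((e_2'\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(b_0\otimes I_k)}\Sigma^{1/2}\,(I_2\otimes I_k)\Sigma^{-1}(I_2\otimes(Z'Z)^{-1/2}Z')\right)\mathrm{vec}(Y) \\
&= \left((e_2'\otimes I_k)\,\Sigma^{1/2}M_{\Sigma^{1/2}(b_0\otimes I_k)}\Sigma^{1/2}\,([b_0+\beta e_2,\ e_2]\otimes I_k)\right)\Sigma^{-1}\mathrm{vec}((Z'Z)^{-1/2}Z'Y) \\
&= \left((a_0'\otimes I_k)\Sigma^{-1}(a_0\otimes I_k)\right)^{-1}(a_0'\otimes I_k)\Sigma^{-1}\mathrm{vec}((Z'Z)^{-1/2}Z'Y) \\
&= \left((a_0'\otimes I_k)\Sigma^{-1}(a_0\otimes I_k)\right)^{-1/2}T \\
&= (Z'Z)^{-1/4}\Lambda_D^{1/2}(Z'Z)^{-1/4}T
\end{aligned}$$

Isolating the term $T$:

$$T = (Z'Z)^{1/4}\Lambda_D^{-1/2}(Z'Z)^{-1/4}\mathrm{vec}(D(\beta_0))$$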

Proof of Proposition 3.4: The first step is to build the score statistic based on Equation (3). In the desired setting, the parameter is $\theta = \beta$, because $\pi$ is written as a function of $\beta$, and the proposed test is:

$$H_0: g(\theta) = \beta - \beta_0 = 0$$

$$H_1: g(\theta) = \beta - \beta_0 \neq 0$$

Now it is possible to define the components of the score statistic:

$$G = \frac{\partial}{\partial\beta'}g(\beta) = \frac{\partial}{\partial\beta'}(\beta - \beta_0) = 1$$

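To construct $Sc_n(\tilde\beta)$, notice that the value that minimizes (4) subject to $H_0$ is exactly $\beta_0$. So,

$$\begin{aligned}
Sc_n(\tilde\beta) = \left.\frac{\partial}{\partial\beta'}Q_n(\beta)\right|_{\beta=\beta_0} &= \frac{1}{n^2}(Z'Yb_0)'(Z'\Lambda_{\epsilon\epsilon}(\beta_0)Z)^{-1}D(\beta_0) \\
&= \frac{1}{n}S_n'C_{\beta_0,n}^{-1}(Z'Z)^{1/2}(Z'\Lambda_{\epsilon\epsilon}(\beta_0)Z)^{-1}(Z'Z)^{1/2}D_{\beta_0,n}^{-1}T_n \\
&= \frac{1}{n}S_n'C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n \\
&= \frac{1}{n}T_n'D_{\beta_0,n}^{-1}C_{\beta_0,n}S_n
\end{aligned}$$

As the weighting matrix converges to the optimal one, it is possible to use $B(W) = \Upsilon(W)$; therefore, only one of them needs to be computed. To construct the asymptotic distribution of $\sqrt{n}\,Sc_n(\beta_0)$, local alternatives with strong instruments are assumed, and the statistics $S_n$ and $T_n$ are considered independent.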

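Therefore, the following convergences are established, assuming 3.1, 3.2, and 3.3:

$$\begin{aligned}
S_n &= C_{\beta_0,n}\left(\frac{Z'Z}{n}\right)^{-1/2}\frac{Z'Yb_0}{\sqrt{n}} \\
&= C_{\beta_0,n}\left(\frac{Z'Z}{n}\right)^{-1/2}\left[\frac{(Z'Z)\pi a'b_0}{\sqrt{n}} + \frac{Z'Vb_0}{\sqrt{n}}\right] \\
&= C_{\beta_0,n}\left(\frac{Z'Z}{n}\right)^{-1/2}\left[\frac{(Z'Z)}{n}\pi c + \frac{Z'Vb_0}{\sqrt{n}}\right] \\
&\xrightarrow{d} C_{\beta_0}D_Z^{1/2}\pi c + N(0, I_k) \\
&\equiv N(C_{\beta_0}D_Z^{1/2}\pi c;\ I_k)
\end{aligned}$$

$$\begin{aligned}
\frac{T_n}{\sqrt{n}} &= D_{\beta_0,n}^{-1}(a_0'\otimes I_k)\hat\Sigma^{-1}\mathrm{vec}\left(\left(\frac{Z'Z}{n}\right)^{-1/2}\frac{Z'Y}{n}\right) \\
&\xrightarrow{p} D_{\beta_0}^{-1}(a_0'\otimes I_k)\Sigma^{-1}\mathrm{vec}(D_Z^{1/2}\pi a_0') \\
&\equiv D_{\beta_0}D_Z^{1/2}\pi
\end{aligned}$$

$$\sqrt{n}\,Sc_n(\beta_0) = \frac{1}{\sqrt{n}}S_n'C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n \xrightarrow{d} N(0, B(W)) = N\left(0,\ \pi'D_Z^{1/2}C_{\beta_0}^{2}D_Z^{1/2}\pi\right)$$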

As the value of $\pi$ is not known, an alternative is to consider a consistent estimator for $B(W)$:

$$\widehat{B(W)} = \frac{T_n'D_{\beta_0,n}^{-1}C_{\beta_0,n}^{2}D_{\beta_0,n}^{-1}T_n}{n}$$

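Now, what is left is to assemble the parts:

$$\begin{aligned}
Score &= n\,Sc_n(\tilde\beta)'\times B(W)^{-1}\times 1\times\left(1\times B(W)^{-1}\times 1\right)^{-1}\times 1\times B(W)^{-1}\times Sc_n(\tilde\beta) \\
&= n\,Sc_n(\tilde\beta)'\,\widehat{B(W)}^{-1}\,Sc_n(\tilde\beta) \\
&= \frac{n^2}{n^2}\,S_n'C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n\left(T_n'D_{\beta_0,n}^{-1}C_{\beta_0,n}^{2}D_{\beta_0,n}^{-1}T_n\right)^{-1}T_n'D_{\beta_0,n}^{-1}C_{\beta_0,n}S_n \\
&= S_n'\,N_{C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n}\,S_n
\end{aligned}$$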

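Proof of Proposition 3.5: Consider $S_n'C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n\left(T_n'D_{\beta_0,n}^{-1}C_{\beta_0,n}^{2}D_{\beta_0,n}^{-1}T_n\right)^{-1/2}$ and, still assuming 3.1, 3.2, and 3.3:

$$\begin{aligned}
S_n'C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n\left(T_n'D_{\beta_0,n}^{-1}C_{\beta_0,n}^{2}D_{\beta_0,n}^{-1}T_n\right)^{-1/2} &\xrightarrow{d} N(c\pi'D_Z^{1/2}C_{\beta_0};\ I_k)\,C_{\beta_0}D_Z^{1/2}\pi\left(\pi'D_Z^{1/2}C_{\beta_0}^{2}D_Z^{1/2}\pi\right)^{-1/2} \\
&\equiv N(c\pi'D_Z^{1/2}C_{\beta_0}^{2}D_Z^{1/2}\pi;\ 1)
\end{aligned}$$

Therefore, it is concluded that:

$$S_n'\,N_{C_{\beta_0,n}D_{\beta_0,n}^{-1}T_n}\,S_n \xrightarrow[H_1]{d} \chi^2\left\{\left(c\pi'D_Z^{1/2}C_{\beta_0}N_{C_{\beta_0}D_Z^{1/2}\pi}C_{\beta_0}D_Z^{1/2}\pi c\right);\ 1\right\} \equiv \chi^2\left\{\left(c\pi'D_Z^{1/2}C_{\beta_0}^{2}D_Z^{1/2}\pi c\right);\ 1\right\}$$

Moreover, as the GMM objective function is based on the optimal weighting matrix, it is possible to apply Proposition 1.2, and the conclusion is that $c\pi'D_Z^{1/2}C_{\beta_0}^{2}D_Z^{1/2}\pi c$ maximizes the non-centrality parameter of the non-central chi-square distribution.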

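Proof of Lemma 3.6: The difference between the non-centrality parameter of the known optimal test and that of a score test based on $A_T$ is:

$$\begin{aligned}
&c^2\pi'D_Z^{1/2}C_{\beta_0}^{2}D_Z^{1/2}\pi - c^2\pi'D_Z^{1/2}C_{\beta_0}N_{A_TD_{\beta_0}D_Z^{1/2}\pi}C_{\beta_0}D_Z^{1/2}\pi \\
&= c^2\pi'D_Z^{1/2}C_{\beta_0}\left(I_k - N_{A_TD_{\beta_0}D_Z^{1/2}\pi}\right)C_{\beta_0}D_Z^{1/2}\pi \\
&= c^2\pi'D_Z^{1/2}C_{\beta_0}M_{A_TD_{\beta_0}D_Z^{1/2}\pi}C_{\beta_0}D_Z^{1/2}\pi
\end{aligned}$$

The class of optimal weights $A_T$ is obtained when the difference equals zero, which means that, asymptotically, $A_TD_{\beta_0}D_Z^{1/2}\pi$ and $C_{\beta_0}D_Z^{1/2}\pi$ must be collinear vectors, i.e.:

$$A_TD_{\beta_0}D_Z^{1/2}\pi = d\,C_{\beta_0}D_Z^{1/2}\pi, \quad \text{for } d \neq 0$$

Therefore,

$$A_T = d\,C_{\beta_0}D_{\beta_0}^{-1} + o_p(1), \quad \text{for } d \neq 0$$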
