To cointegrate or not to cointegrate? That's a topological question


FUNDAÇÃO GETULIO VARGAS

EPGE

Escola de Pós-Graduação em Economia

"TO COINTEGRATE OR NOT TO COINTEGRATE? THAT'S A TOPOLOGICAL QUESTION."

RENATO GALVÃO FLÔRES JUNIOR

(EPGE-FGV / UFRJ)

VENUE

Fundação Getulio Vargas

Praia de Botafogo, 190 - 10th floor - Auditorium

DATE

16/10/97 (Thursday)

TIME

16:00h


How often do you cointegrate?

or

To cointegrate or not to cointegrate? That's a topological question.*

Renato G. Flôres Jr.

EPGE/FGV, Rio de Janeiro and Ecole de Commerce Solvay/ULB, Bruxelles

(Very) Preliminary version; please do not circulate.

September, 1997.

Abstract

We show that for any multivariate I(1) process which does not cointegrate, it is possible to find another process sufficiently close to it where cointegration applies. Closeness is defined in terms of the spectral density matrices of the respective processes in differences, i.e., a metric which takes into account only the information in the (centred) second moments. The result may explain why in practice cointegration is found a bit "too often". Examples developing this point and simulations giving an insight into the metric used are also presented.

* I am indebted to Clive Granger for discussions and for providing the model in Example 1.


1. Introduction.

This paper presents a topological result for the phenomenon of cointegration. It is shown that for any multivariate I(1) process which does not cointegrate, it is possible to find another process sufficiently close to it where cointegration applies. Closeness is defined in terms of the spectral density matrices of the respective processes in differences, i.e., a metric which takes into account only the information in the second moments. In spite of this, the result may explain why in practice cointegration is found a bit "too often".

The structure of the paper is as follows. In the next section we motivate the result, with a series of non-trivial examples in the time domain. The main theorem and its interpretations are dealt with in section 3. Section 4 discusses further consequences of the result and presents a few simulations that shed some light on the behaviour implicit in the metric used. The final section concludes.

2. A cointegrating machine.

Flôres and Szafarz (1994) presented a general condition for two I(1) processes whose differences are moving averages to cointegrate. Considering the MA's $\{\Delta x_t\}$ and $\{\Delta y_t\}$, with representations

$$\Delta x_t = p(L)\,e_t , \qquad \Delta y_t = q(L)\,u_t ,$$

if the original processes $\{x_t\}$ and $\{y_t\}$ cointegrate, with vector $(1, c)$, it follows that

$$\Delta x_t + c\,\Delta y_t = p(L)\,e_t + c\,q(L)\,u_t = r^*(L)\,v^*_t , \qquad (1)$$

where $r^*(L)\,v^*_t$ is a Wold representation of $p(L)\,e_t + c\,q(L)\,u_t$. From (1), one has

$$x_t + c\,y_t = \frac{r^*(L)}{1-L}\,v^*_t , \qquad (2)$$

so that the polynomial $r^*(L)$ must have a unit root, i.e., the combined process must be a non-invertible moving average.

Stated in this generality it is perhaps difficult to fully grasp the implications of (2). However, a few special cases may yield interesting insights.

Example 1: The random-walk case.

Suppose $p(L) = q(L) = 1$; then $r^*(L)\,v^*_t = e_t + c\,u_t$ and it might seem that cointegration is impossible. It is, indeed, if the innovation processes $\{e_t\}$ and $\{u_t\}$ are completely independent (including all leads and lags), but not necessarily if, in spite of having zero contemporaneous covariance, they are cross-correlated at some lag. It is not very easy to produce examples of two random walks whose innovations, though contemporaneously uncorrelated, have some cross-links. The model below is one of them:

Let

$$x_t = w_t + e_{1t} , \qquad y_t = w_t + e_{2t} , \qquad (3)$$

with $\Delta w_t = u_t + a\,u_{t-1}$, $-1 < a < 1$, and $\{u_t\}$, $\{e_{1t}\}$, $\{e_{2t}\}$ white noises, all completely independent, with variances $V_u$, $V_{e1}$, $V_{e2}$, respectively, satisfying

$$V_{e1} = V_{e2} = V_e = a\,V_u .$$


It follows that both original processes are random walks. Taking $\{x_t\}$ for instance, it is obviously I(1) and one has

$$\Delta x_t = u_t + a\,u_{t-1} + e_{1t} - e_{1,t-1} ,$$

so that

$$\mathrm{cov}(\Delta x_t, \Delta x_{t-1}) = \mathrm{cov}(u_{t-1}, a\,u_{t-1}) - \mathrm{cov}(e_{1,t-1}, e_{1,t-1}) = a\,V_u - V_e = 0 ,$$

the covariances at higher lags being obviously zero.

But, also, both processes are cointegrated, as

$$x_t - y_t = e_{1t} - e_{2t}$$

is a stationary process.

To acquire a deeper insight into the example, linking it to the result mentioned at the beginning of this section, it is interesting to compute the lag polynomial $r^*(L)$. As both differences are represented by combinations of independent white noises up to one lag, the MA representation of any linear combination of them will be of order one. Given that $c = -1$, putting $r^*(L) = 1 + bL$, (1) becomes:

$$\Delta(x_t - y_t) = e_{1t} - e_{1,t-1} - e_{2t} + e_{2,t-1} = (1 + bL)\,v^*_t .$$

Well-known properties of an MA(1) give the system:

$$(1 + b^2)\,\mathrm{var}\,v^*_t = 4V_e , \qquad b\,\mathrm{var}\,v^*_t = -2V_e , \qquad (4)$$

so that, as $b \neq 0$, it implies that $b = -1$ and $r^*(L)$ is non-invertible, as expected.
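The properties claimed in Example 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: Gaussian white noises, $a = 0.5$ and $V_u = 1$ (so $V_e = 0.5$), and hypothetical helper names (`simulate_example1`, `autocorr`, `lag1_autocov_dx`, `ma1_root_b`) that do not come from the paper. It simulates the pair $x_t = w_t + e_{1t}$, $y_t = w_t + e_{2t}$, estimates the lag-1 autocorrelation of $\Delta x_t$ and the variance of $x_t - y_t$, and solves the quadratic obtained by dividing the two MA(1) moment equations.

```python
import numpy as np

def simulate_example1(T, a=0.5, Vu=1.0, seed=0):
    """Simulate x_t = w_t + e1_t and y_t = w_t + e2_t with V_e = a * V_u (needs 0 < a < 1)."""
    rng = np.random.default_rng(seed)
    Ve = a * Vu                                  # the special arrangement of the example
    u = rng.normal(0.0, np.sqrt(Vu), T + 1)
    e1 = rng.normal(0.0, np.sqrt(Ve), T)
    e2 = rng.normal(0.0, np.sqrt(Ve), T)
    w = np.cumsum(u[1:] + a * u[:-1])            # Delta w_t = u_t + a u_{t-1}
    return w + e1, w + e2

def autocorr(z, lag):
    """Sample autocorrelation of z at the given lag."""
    z = z - z.mean()
    return float(np.mean(z[lag:] * z[:len(z) - lag]) / np.mean(z * z))

def lag1_autocov_dx(a, Vu, Ve):
    """Theoretical cov(Delta x_t, Delta x_{t-1}) = a V_u - V_e."""
    return a * Vu - Ve

def ma1_root_b():
    """Roots of b^2 + 2b + 1 = 0, i.e. b / (1 + b^2) = -1/2."""
    return np.roots([1.0, 2.0, 1.0])

x, y = simulate_example1(20000)
rho1 = autocorr(np.diff(x), 1)        # should be close to zero
spread_var = float(np.var(x - y))     # should be close to 2 * V_e = 1
```

Under these assumptions the differences of each series behave like white noise at lag 1, while the spread $x_t - y_t$ keeps a bounded variance, which is the combination of properties the example relies on.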


Example 2: The cointegrating machine.

The gist of the previous example lies in the special arrangement $V_e = aV_u$, which makes for the two key properties of the processes in (3): they are random walks but their innovations have a non-zero correlation at lag 1. This is not evident at a first look at (3), and it may seem that the example is a bit too artificial. Indeed, processes $\{x_t\}$ and $\{y_t\}$ are very similar: they share the same common part and their innovations bear the same characteristics. However, based on this simple idea, a whole "machine" for generating cointegrated random walks may be created.

The key point is to focus on the Wold representation of a bivariate

stationary stochastic process with no deterministic component

(5)

To simplify matters, we shall suppose that all lag polynomials have degree one and that $\{e_{1t}\}$ and $\{e_{2t}\}$ are (completely) independent white noises with identical variances. A little algebra translates the conditions for each component of the integrated bivariate processes $\{(x_t, y_t)'\}$ (i.e., all those processes for which $(\Delta x_t, \Delta y_t)' = (x^*_t, y^*_t)'$) to be a (weak) random walk into:

(6)

A condition for the two innovations to be contemporaneously uncorrelated can also be obtained:

(7)

As known from the previous discussion, cointegration needs the existence of a non-invertible linear combination of the innovations. As such a combination, with general form

will be an MA(1), applying to it the same properties that led to (4), with $b = -1$ already, one arrives at the restriction (8):

The system (6)+(7)+(8) provides a mechanism for generating several representations that will satisfy the three requirements. It is easy to verify that

is a possible solution. This implies that, given the family of bivariate processes

(9)

where $\{e_{1t}\}$ and $\{e_{2t}\}$ are independent white noises with the same variance, any integral of $\{(x^*_t, y^*_t)'\}$ will be a pair of random walks whose innovations are contemporaneously uncorrelated and which nevertheless cointegrate. In this sense, (9) is even more startling than (3), as in that case the innovations were contemporaneously correlated.

It is not very difficult to accept that families like the one above are quite numerous. By allowing polynomials of higher degrees in (5), or by working with different variances for the white noises, a variety of examples can be constructed, satisfying even more stringent conditions, but always yielding cointegrated processes.

3. A general result.

In the previous section we produced a class of processes, in a bivariate setting, which allow for cointegration. This may suggest that processes of this sort might be somehow dense, which seems to contradict the usual intuition that cointegration is a "rare" or "zero measure" event, which only takes place under strong behavioural linkages, explained by economic theory. Of course, a density property will depend on which set, and under which topology, it is being verified. In dealing with multivariate stochastic processes, a natural space in which to work would be the space of trajectories. However, it suffices to look at the trajectories of cointegrated and non-cointegrated processes to have the feeling that this is perhaps not the adequate place in which to search for such a result.

We shall instead work with the spectral density matrix associated with the differenced process. Use of this object in the context of cointegration was previously made by Phillips and Ouliaris (1988). A proximity result in this space calls for two qualifications. First, as the spectral density is the variance of the spectral measure of the process,[1] there is not a one-to-one correspondence between spectral densities and stationary processes. Secondly, to recover the original process, this stationary process has to be integrated, again a not one-to-one operation. This means that two "disaggregations" are performed to arrive at the space of ultimate interest. This also explains why a (topological) density result in terms of cointegration can be obtained in a certain space, without contradicting its rareness in the space of all I(1) n-variate processes.

That some aggregation is needed is more evident if one thinks of the previously mentioned space of trajectories. Indeed, there, the space is too "fine" to give what is desired, unless a coarse and uninteresting topology is used. That the particular sequence of aggregations in our case is nevertheless meaningful is what we try to convince the reader of in the next section. We pass now to the formal setting.

Let $S$ be the space of $(n+1)$-dimensional stationary processes $\{X^*_t\}$ such that each component admits an invertible MA representation. To each $\{X^*_t\}$ one can associate the set $\{\Gamma_\tau;\ \tau = 0, 1, 2, \ldots\}$ of autocovariance matrices. We shall use as the norm of a matrix $A$, denoted $\|A\|$, the highest absolute value of its cell entries. With this, a distance is defined for each pair $\{X^*_t\}$, $\{Y^*_t\}$ in $S$ as:

[1] Or spectral representation of the process; see Anderson (1971), chapter 7.

With the aid of the spectral representation theorem one can prove that $d$ is really a distance. The metric space $(S, d)$ is not a vector space, however, as it is not closed under addition. The following is true:

Proposition. Let $S' \subset S$ be the (metric) subspace of $(S, d)$ formed by those processes whose components admit at least one linear combination which is not invertible; then $S'$ is dense in $S$.
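To give a feeling for the kind of distance involved in the Proposition, the sketch below computes one candidate for $d$. It is an assumption, not the paper's stated formula: the distance is taken to be the largest, over all lags, of the max-norm of the difference between the two autocovariance matrices, a reading that is consistent with the way $\varepsilon$ is used in the Appendix. The function names `max_norm` and `distance` are hypothetical, and autocovariances are passed as dictionaries from lag to matrix, with missing lags treated as zero.

```python
import numpy as np

def max_norm(A):
    """Norm used in the text: highest absolute value of the cell entries."""
    return float(np.max(np.abs(A)))

def distance(gammas_x, gammas_y):
    """Assumed distance: sup over lags of the max-norm of Gamma_x(tau) - Gamma_y(tau)."""
    lags = set(gammas_x) | set(gammas_y)
    zero = np.zeros(next(iter(gammas_x.values())).shape)
    return max(max_norm(gammas_x.get(k, zero) - gammas_y.get(k, zero)) for k in lags)

# Two independent unit-variance white noises ...
base = {0: np.eye(2)}
# ... and a process with the same variances plus a cross-covariance of 1/2 at lag 1.
perturbed = {0: np.eye(2), 1: np.array([[0.0, 0.5], [0.5, 0.0]])}
d = distance(base, perturbed)
```

With this reading, the two processes differ only through the lag-1 cross-covariance, and their distance equals the size of that single perturbation.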

To prove the above Proposition one needs the

Lemma. Let $\{X^*_t\}$ be a process in $S$ whose components admit $k$, $1 \le k \le n$, linear combinations which are not invertible; then its spectral density matrix at frequency $\omega = 0$, $\Sigma(0)$, has $k$ zero eigenvalues and its entries satisfy the following system of $k$ equations:

$$\varphi_i(\Sigma(0)) = 0 , \qquad 1 \le i \le k ,$$

where, for each $i$, $\varphi_i$ is the sum of the principal minors of order $n+2-i$ of $\Sigma(0)$, multiplied by $(-1)^{i-1}$.
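The content of the Lemma can be illustrated on small matrices. The sketch below is not taken from the paper: it uses a hypothetical helper `sum_principal_minors` and two hand-built $3 \times 3$ positive semi-definite matrices standing in for $\Sigma(0)$, one with a single zero eigenvalue and one with two. It relies on the standard fact that, up to sign, the coefficients of the characteristic polynomial are sums of principal minors, so that $k$ zero eigenvalues force the $k$ sums of highest order (the $k$ lowest-degree coefficients) to vanish.

```python
from itertools import combinations
import numpy as np

def sum_principal_minors(A, order):
    """Sum of all principal minors of the given order of the square matrix A."""
    n = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(idx, idx)])
               for idx in combinations(range(n), order))

# One non-invertible combination (k = 1): Sigma = B B' with B of rank 2.
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sigma_k1 = B @ B.T

# Two non-invertible combinations (k = 2): Sigma = v v' has rank 1.
v = np.ones((3, 1))
sigma_k2 = v @ v.T

minors_k1 = [sum_principal_minors(sigma_k1, r) for r in (3, 2, 1)]
minors_k2 = [sum_principal_minors(sigma_k2, r) for r in (3, 2, 1)]
```

For `sigma_k1` only the order-3 sum (the determinant) vanishes, while for `sigma_k2` both the order-3 and the order-2 sums vanish and the order-1 sum (the trace) does not.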

Theorem: Let $\{X_t\}$ be an $(n+1)$-dimensional I(1) process; then one, and only one, of the following cases holds:

i) the components of $\{\Delta X_t\}$ admit $k$, $1 \le k \le n$, linear combinations which are not invertible;

or

ii) for every $\varepsilon > 0$, and $1 \le k \le n$, there is a process $\{X^{*k}_t\}$ in $S'$ such that

$$d(\{\Delta X_t\}, \{X^{*k}_t\}) < \varepsilon .$$

Moreover, for a given $k$, this process can be chosen in such a way that all its marginal long-run variances, and all the covariances between its components (for all leads and lags) but $k$, are identical to those of $\{\Delta X_t\}$.

Proof: See the Appendix.

The Theorem says then that, given an I(1) process which is not cointegrated, i.e., such that $\{\Delta X_t\}$ is not in $S'$, it is possible to find another I(1) process, sufficiently "close" to the first one, that is cointegrated. Closeness is understood in the sense that both differenced processes have the same marginal spectra and, with one exception, the same co-spectra; for the unequal co-spectrum, the corresponding covariances can be made as close as desired. This means that, for this co-spectrum, one may have to "add" non-zero covariances between the two components at stake, at ever longer lags and leads for decreasing $\varepsilon$. This is indeed the rationale behind the examples in the previous section in which the contemporaneously uncorrelated random walks turn out to be cointegrated. By introducing these "infinitesimal" covariances, in longer time spans for each smaller $\varepsilon$, one forces the appearance of the non-invertible combined moving average and, consequently, the occurrence of cointegration. Any integral of this sequence of "close" stationary processes will be an I(1) process "close", now with the due qualifications already made, to $\{X_t\}$.

4. A deeper insight into the result.

What are the practical consequences of the Theorem in section 3? We try to put it into an applied perspective by discussing in Example 3 one of the most popular tests for cointegration, the Johansen procedure. The other example tries to develop the intuition on the "close" processes in I(1) space.

Example 3: Johansen's test.

Putting it rather briefly, the test of Johansen (1988, 1991) assumes that $\{X_t\}$ has a VAR(p) representation in levels, so that an error correction model (ECM) of order p can be derived:

(10)

The basic idea of the procedure is to "clean" $\Delta X_t$ and $X_{t-1}$ of the $p-1$ lags in the differences and then try to determine the rank of $A_0$ by testing the significance of the canonical correlations (or rather, the multivariate regression) between the residuals from the two regressions. As known, the absence of correlation, i.e., a zero rank for $A_0$, means that there is no cointegration.

Suppose then that $\{X_t\}$ is not cointegrated, that the hypotheses of Johansen's test are verified and that the researcher underfits model (10), stopping at a lag $p' < p$. This means that after regressing $\Delta X_t$ and $X_{t-1}$ on the $p'$ lags, a "common part" due to the $p - p'$ missing (last) lags will be in the residuals. The Theorem points out that the introduction of cross correlations leads to cointegration and, in this case, the probability of falsely accepting cointegration will be higher. Now suppose the same setting with the researcher overfitting model (10); according to the same reasoning this seems less serious in terms of distorting the test.
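The residual-cleaning step discussed in this example can be sketched numerically. What follows is a simplified illustration, not Johansen's full procedure: it has no deterministic terms and no critical values, the function name `cleaned_canonical_correlations` and the simulated series are hypothetical, and Gaussian innovations are assumed. It regresses $\Delta X_t$ and $X_{t-1}$ on a chosen number of lagged differences and returns the squared canonical correlations between the two sets of residuals, on which the rank decision is based.

```python
import numpy as np

def cleaned_canonical_correlations(X, n_lagged_diffs):
    """Squared canonical correlations between Delta X_t and X_{t-1},
    after regressing both on n_lagged_diffs lagged differences."""
    dX = np.diff(X, axis=0)
    q = n_lagged_diffs
    Y0 = dX[q:]                 # Delta X_t
    Y1 = X[q:-1]                # X_{t-1}, aligned with Y0
    if q > 0:
        # Columns: Delta X_{t-1}, ..., Delta X_{t-q}
        Z = np.hstack([dX[q - i:len(dX) - i] for i in range(1, q + 1)])
        R0 = Y0 - Z @ np.linalg.lstsq(Z, Y0, rcond=None)[0]
        R1 = Y1 - Z @ np.linalg.lstsq(Z, Y1, rcond=None)[0]
    else:
        R0, R1 = Y0, Y1
    n = len(R0)
    S00, S11, S01 = R0.T @ R0 / n, R1.T @ R1 / n, R0.T @ R1 / n
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01
    M = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    return np.sort(np.linalg.eigvals(M).real)[::-1]

rng = np.random.default_rng(0)
T = 2000
independent = np.cumsum(rng.normal(size=(T, 2)), axis=0)       # two unrelated random walks
w = np.cumsum(rng.normal(size=T))                               # a shared stochastic trend
cointegrated = np.column_stack([w + rng.normal(size=T), w + rng.normal(size=T)])

lam_indep = cleaned_canonical_correlations(independent, 1)
lam_coint = cleaned_canonical_correlations(cointegrated, 1)
```

In this setting the largest squared canonical correlation is close to zero for the unrelated random walks and clearly away from zero for the pair sharing a trend. Changing `n_lagged_diffs` is one way to experiment with the underfitting and overfitting situations described above.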

The more interesting situation is perhaps when model (10) is correctly fitted but, due either to measurement errors or to the use of proxies, the cross-covariances between two components of $\{X_t\}$ become artificially higher at longer lags. They will then remain in both residuals and false cointegrations will be more likely. The same applies if, due to some policy mechanism, a longer (cross) memory is created between two series during a certain period. As the proof of the Proposition shows, even if the memory remains short but the lagged covariances increase sufficiently, cointegration may take place. Of course, in this case, deciding whether the cointegration found is economically meaningful or not is a matter for the researcher's personal judgement. Further developments of this last point are well posed in Harvey (1997).

Example 4: How close are the I(1) processes?

Consider at first the pair of random walks

$$x_t = x_{t-1} + e_{1t} , \qquad y_t = y_{t-1} + e_{2t} , \qquad x_0 = y_0 = 0 ,$$

$$\mathrm{var}\,e_{1t} = \mathrm{var}\,e_{2t} = 1 , \qquad \mathrm{cov}(e_{1t}, e_{2,t+k}) = 0 \ \text{ for all integer } k ,$$

which is naturally not cointegrated. Now, for $\varepsilon = 1/2$, $1/4$ and $1/6$, we are going to generate new pairs of random walks, cointegrated, whose differences are not further than $\varepsilon$, in the metric proposed in section 3, from $\{(\Delta x_t, \Delta y_t)'\}$. Following the reasoning in the previous section, the way to do this is to introduce "extra" covariances of order $1/(2m)$ between $\{e_{1t}\}$ and $\{e_{2t}\}$ at all leads and lags from 1 to $m$. The respective values of $m$ are 1, 2 and 3. The integrals in each case are generated as follows:

i) the new innovations $(e^*_{1t}, e^*_{2t})'$, with the due cross-correlations, are generated with the help of their Wold representation. For the case $\varepsilon = 1/2$ and $m = 1$, the representation is:

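$$\begin{pmatrix} e^*_{1t} \\ e^*_{2t} \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}L \\ \frac{\sqrt{2}}{2}L & \frac{\sqrt{2}}{2} \end{pmatrix} \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}$$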

These representations are found by solving the system which links the coefficients of a (bivariate) MA of order $m$ to its $m+1$ covariance matrices (see, for instance, Hamilton (1994), chapter 10). The matrices of coefficients, at any lag, are symmetric as we impose that covariances are equal for leads and lags at the same distance from the origin, i.e., $\Gamma_\tau = \Gamma_{-\tau}$.

Even so, the solution of this


ii) with the assumption that $x^*_{10} = x^*_{20} = 0$, the integrated process $\{(x^*_{1t}, x^*_{2t})'\}$ is generated by:

$$x^*_{kt} = \sum_{i=1}^{t} e^*_{ki} , \qquad k = 1, 2 .$$

Exhibits 1, 2 and 3 show, respectively, each integrated process and the

original random walks.

[Exhibits 1, 2 and 3 to be inserted about here]
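The case $m = 1$ of this example can be reproduced with a few lines of code. The sketch below is an illustration under assumptions rather than the author's own program: it uses one MA(1) representation with symmetric coefficient matrices, whose entries are $\sqrt{2}/2$, that delivers unit variances, zero contemporaneous covariance and a cross-covariance of $1/2$ at lag and lead 1; the driving noises are Gaussian; and the names `A0`, `A1`, `ma1_autocovariances` and `simulate_pair` are hypothetical.

```python
import numpy as np

# e*_t = A0 v_t + A1 v_{t-1}, with v_t a bivariate white noise with identity covariance
A0 = (np.sqrt(2) / 2) * np.eye(2)
A1 = (np.sqrt(2) / 2) * np.array([[0.0, 1.0], [1.0, 0.0]])

def ma1_autocovariances(A0, A1):
    """Gamma_0 and Gamma_1 implied by the MA(1) coefficient matrices."""
    return A0 @ A0.T + A1 @ A1.T, A1 @ A0.T

def simulate_pair(T, seed=0):
    """Integrate the new innovations, starting both series at zero."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(T + 1, 2))
    e_star = v[1:] @ A0.T + v[:-1] @ A1.T
    return np.cumsum(e_star, axis=0)

G0, G1 = ma1_autocovariances(A0, A1)
long_run = G0 + G1 + G1.T            # 2*pi times the spectral density at frequency zero

x_star = simulate_pair(5000)
spread = x_star[:, 0] - x_star[:, 1]  # candidate cointegrating combination (1, -1)
```

Here `long_run` is singular, which is the zero-frequency condition behind cointegration, and the simulated `spread` has a variance close to one while each integrated series wanders like a random walk.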

Appendix: Proofs

Lemma: Denoting by $\Sigma^*(\omega)$ the spectral density of process $\{X^*_t\}$, the density of a linear combination with coefficients $v$ will be $v'\,\Sigma^*(\omega)\,v$. By hypothesis, there are $k$ linearly independent $v$ such that process $\{v'X^*_t\}$ has one real unit root. As known, this means that the equation

$$x'\,\Sigma^*(0)\,x = 0 , \qquad x \in \mathbb{R}^{n+1},\ x \neq 0 ,$$

admits $k$ non-trivial, linearly independent solutions. This implies that 0 is an eigenvalue of multiplicity $k$ of $\Sigma^*(0)$ and that, consequently, the $k$ terms of lowest degree of the characteristic polynomial of $\Sigma^*(0)$ are zero. The $k$ equations shown are simply the coefficients of these terms.

Proposition: Let $\{X^*_t = (X^*_{1t}, X^*_{2t}, \ldots, X^*_{n+1,t})'\}$ in $S - S'$, with cross-covariance matrices $\Gamma^*_\tau$, be given. For each $\varepsilon > 0$ we shall produce a rather special element of $S'$, whose spectral density at the zero frequency has rank $n$, and which will be closer than $\varepsilon$ to $\{X^*_t\}$. Define matrix $\Sigma^*(0; x_{12})$, which has all its entries equal to those of $\Sigma^*(0)$ but for co-spectrum $c_{12}$, which is left as an unknown $x_{12}$, and consider the algebraic equation in $x_{12}$:

$$\det \Sigma^*(0; x_{12}) = 0 . \qquad (A.1)$$

Let $c'_{12}$ be a real root of (A.1). Now set $\delta = c'_{12} - c_{12}$ and let $m$ be the smallest positive integer such that $|\delta|\,\pi/m \le \varepsilon$. Consider a sequence of cross-covariance matrices $\{\Gamma'_\tau\}$ such that, for every $\tau$, they are equal to $\Gamma^*_\tau$ but for the following exceptions:

for $0 < |\tau| \le m$, the (1,2) and (2,1) entries of matrix $\Gamma'_\tau$ equal, respectively,

$$\mathrm{cov}(X^*_{1t}, X^*_{2,t-\tau}) + \delta\pi/m , \qquad \mathrm{cov}(X^*_{2t}, X^*_{1,t-\tau}) + \delta\pi/m .$$

By the spectral representation theorem there exists a stationary process which admits this sequence of covariance matrices. By construction, the distance between this process and $\{X^*_t\}$ is smaller than $\varepsilon$ and its spectral matrix at the zero frequency is equal to $\Sigma^*(0)$ but for the (1,2) and (2,1) entries, whose value is now:

$$c_{12} + \frac{1}{2\pi} \times \frac{\delta\pi}{m} \times 2m = c_{12} + \delta = c'_{12} .$$

This matrix has then at least one zero eigenvalue and the result is proved.
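The construction used in this proof can be traced step by step on a small example. The sketch below is only an illustration: the function names `singular_cospectrum` and `perturbation` and the $2 \times 2$ example matrix are hypothetical, and the root of (A.1) closest to $c_{12}$ is selected, a choice the proof does not impose. Since $x_{12}$ enters only the (1,2) and (2,1) entries, the determinant is a polynomial of degree at most two in $x_{12}$, so it is recovered exactly from three evaluations.

```python
import math
import numpy as np

def singular_cospectrum(sigma0, i=0, j=1):
    """Real values x for which sigma0, with entries (i,j) and (j,i) set to x, is singular."""
    def det_at(x):
        S = sigma0.copy()
        S[i, j] = S[j, i] = x
        return np.linalg.det(S)
    xs = np.array([-1.0, 0.0, 1.0])
    coeffs = np.polyfit(xs, [det_at(x) for x in xs], 2)   # exact: det is quadratic in x
    roots = np.roots(coeffs)
    return np.sort(roots[np.abs(roots.imag) < 1e-10].real)

def perturbation(sigma0, eps, i=0, j=1):
    """Return c'_12, delta, m and the amount added to each cross-covariance at 0 < |tau| <= m."""
    roots = singular_cospectrum(sigma0, i, j)
    c = sigma0[i, j]
    c_new = roots[np.argmin(np.abs(roots - c))]            # closest real root (an assumption)
    delta = c_new - c
    m = max(1, math.ceil(abs(delta) * math.pi / eps))      # smallest m with |delta| pi / m <= eps
    return c_new, delta, m, delta * math.pi / m

# Example: long-run covariance [[1, 0.2], [0.2, 1]], i.e. Sigma(0) = that matrix / (2 pi)
sigma0 = np.array([[1.0, 0.2], [0.2, 1.0]]) / (2 * math.pi)
c_new, delta, m, bump = perturbation(sigma0, eps=0.25)

# Zero-frequency matrix after adding `bump` at the 2m leads and lags
sigma_new = sigma0.copy()
sigma_new[0, 1] = sigma_new[1, 0] = sigma0[0, 1] + bump * 2 * m / (2 * math.pi)
smallest_eigenvalue = np.linalg.eigvalsh(sigma_new)[0]
```

In this example two lags on each side are enough: each added cross-covariance is 0.2, which stays below $\varepsilon = 0.25$, and the perturbed zero-frequency matrix is singular.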

Theorem: Suppose that i) does not apply. Then $\{\Delta X_t\}$ is a member of $S$. Given $\varepsilon > 0$ and $k = 1$, by the Proposition one can find a process $\{X^{*1}_t\}$ satisfying ii). If $k > 1$ it is easy to adapt the technique in the proof of the Proposition to change exactly $k$ co-spectra.


References

Anderson, T. W. 1971. The Statistical Analysis of Time Series. J. Wiley and Sons Inc., New York.

Flôres, R. G. Jr and A. Szafarz. 1994. Efficient markets do not cointegrate. Cahiers du CERO (Numéro en hommage à Simone Huyberechts), 36, 143-51.

Hamilton, J. D. 1994. Time Series Analysis. Princeton University Press, Princeton.

Harvey, A. 1997. Trends, cycles and autoregressions. The Economic Journal, 107, 192-201.

Johansen, S. 1988. Statistical analysis of cointegration vectors. J. of Economic Dynamics and Control, 12, 231-54.

Johansen, S. 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica, 59, 1551-80.

Phillips, P. C. B. and S. Ouliaris. 1988. Testing for cointegration using principal components methods. J. of Economic Dynamics and Control, 12, 205-30.

Shilov, G. E. 1961. An Introduction to the Theory of Linear Spaces. Prentice-Hall Inc., New York.
