
Recurrence and Transience

From the document Inference in Hidden Markov Models (pages 161-170)

7.2 Chains on General State Spaces

7.2.2 Recurrence and Transience

In view of the discussion above, it is not sensible to define recurrence and transience in terms of the expectation of the occupation measure of a state, but for phi-irreducible chains it makes sense to consider the occupation measure of accessible sets.

Definition 150 (Uniform Transience and Recurrence). A set $A \in \mathcal{X}$ is called uniformly transient if $\sup_{x \in A} \mathrm{E}_x[\eta_A] < \infty$. A set $A \in \mathcal{X}$ is called recurrent if $\mathrm{E}_x[\eta_A] = +\infty$ for all $x \in A$.

Obviously, if $\sup_{x \in \mathsf{X}} \mathrm{E}_x[\eta_A] < \infty$, then $A$ is uniformly transient. In fact the reverse implication holds true too, because if the chain is started outside $A$ it cannot hit $A$ more times, on average, than if it is started at “the most favorable location” in $A$. Thus an alternative definition of a uniformly transient set is $\sup_{x \in \mathsf{X}} \mathrm{E}_x[\eta_A] < \infty$.

The main result on phi-irreducible transition kernels is the following recurrence/transience dichotomy, which parallels Theorem 139 for countable state-space Markov chains.

Theorem 151. Let $Q$ be a phi-irreducible transition kernel (or Markov chain). Then either of the following two statements holds true.

(i) Every accessible set is recurrent, in which case we call $Q$ recurrent.

(ii) There is a countable cover of $\mathsf{X}$ with uniformly transient sets, in which case we call $Q$ transient.

In the next section, we will prove Theorem 151 in the particular case where the chain possesses an accessible atom (see Definition 152); the proof is then very similar to that for countable state space. In the general case, the proof is more involved. It is necessary to introduce small sets and the so-called splitting construction, which relates the chain to one that does possess an accessible atom.

Transience and Recurrence for Chains Possessing an Accessible Atom

Definition 152 (Atom). A set $\alpha \in \mathcal{X}$ is called an atom if there exists a probability measure $\nu$ on $(\mathsf{X}, \mathcal{X})$ such that $Q(x, A) = \nu(A)$ for all $x \in \alpha$ and $A \in \mathcal{X}$.

Atoms behave the same way as do individual states in the countable state space case. Although any singleton $\{x\}$ is an atom, it is not necessarily accessible, so that Markov chain theory on general state spaces differs from the theory of countable state space chains.

If $\alpha$ is an atom for $Q$, then for any $m \ge 1$ it is an atom for $Q^m$. Therefore we denote by $Q^m(\alpha, \cdot)$ the common value of $Q^m(x, \cdot)$ for all $x \in \alpha$. This implies that if the chain starts from within the atom, the distribution of the whole chain does not depend on the precise starting point. Therefore we will also use the notation $\mathrm{P}_\alpha$ instead of $\mathrm{P}_x$ for any $x \in \alpha$.

Example 153 (Random Walk on the Half-Line). The random walk on the half-line (RWHL) is defined by an initial condition $X_0 \ge 0$ and the recursion

$$X_{k+1} = (X_k + W_{k+1})^+, \qquad k \ge 0, \tag{7.22}$$

where $\{W_k\}_{k \ge 1}$ is an i.i.d. sequence of random variables, independent of $X_0$, with distribution function $\Gamma$ on $\mathbb{R}$. This process is a Markov chain with transition kernel $Q$ defined by

$$Q(x, A) = \Gamma(A - x) + \Gamma((-\infty, -x])\,\mathbb{1}_A(0), \qquad x \in \mathbb{R}_+,\ A \in \mathcal{B}(\mathbb{R}_+),$$

where $A - x = \{y - x : y \in A\}$. The set $\{0\}$ is an atom, and it is accessible if and only if $\Gamma((-\infty, 0]) > 0$.
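To make the recursion of Example 153 concrete, here is a minimal simulation sketch. It assumes Gaussian increments with negative mean, which is a choice made only for this illustration (the example allows any increment distribution); the function name `simulate_rwhl` and its parameters are hypothetical.

```python
import random

def simulate_rwhl(x0, n_steps, mean=-0.5, std=1.0, seed=0):
    """Simulate X_{k+1} = max(X_k + W_{k+1}, 0) with i.i.d. Gaussian W_k.

    The Gaussian increment law is an assumption for illustration only.
    """
    rng = random.Random(seed)
    path = [float(x0)]
    for _ in range(n_steps):
        w = rng.gauss(mean, std)
        path.append(max(path[-1] + w, 0.0))  # positive part (.)^+
    return path

path = simulate_rwhl(x0=3.0, n_steps=1000)
# Number of time indices at which the chain sits in the atom {0}.
visits_to_atom = sum(1 for x in path if x == 0.0)
print(visits_to_atom)
```

Because the increments put positive mass on the negative half-line, the atom {0} is accessible, and with a negative drift the simulated path returns to it repeatedly.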

We now prove Theorem 151 when there exists an accessible atom.

Proposition 154. Let $\{X_k\}_{k \ge 0}$ be a Markov chain that possesses an accessible atom $\alpha$, with associated probability measure $\nu$. Then the chain is phi-irreducible, $\nu$ is an irreducibility measure, and a set $A \in \mathcal{X}$ is accessible if and only if $\mathrm{P}_\alpha(\tau_A < \infty) > 0$.

Moreover, $\alpha$ is recurrent if and only if $\mathrm{P}_\alpha(\tau_\alpha < \infty) = 1$ and (uniformly) transient otherwise, and the chain is recurrent if $\alpha$ is recurrent and transient otherwise.

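Proof. For all $A \in \mathcal{X}$ and $x \in \mathsf{X}$, the strong Markov property yields

$$
\begin{aligned}
\mathrm{P}_x(\tau_A < \infty) &\ge \mathrm{P}_x(\tau_\alpha + \tau_A \circ \theta^{\tau_\alpha} < \infty,\ \tau_\alpha < \infty) \\
&= \mathrm{E}_x\big[\mathrm{P}_{X_{\tau_\alpha}}(\tau_A < \infty)\,\mathbb{1}\{\tau_\alpha < \infty\}\big] \\
&= \mathrm{P}_\alpha(\tau_A < \infty)\,\mathrm{P}_x(\tau_\alpha < \infty) \\
&\ge \nu(A)\,\mathrm{P}_x(\tau_\alpha < \infty).
\end{aligned}
$$

Because $\alpha$ is accessible, $\mathrm{P}_x(\tau_\alpha < \infty) > 0$ for all $x \in \mathsf{X}$. Thus for any $A \in \mathcal{X}$ satisfying $\nu(A) > 0$, it holds that $\mathrm{P}_x(\tau_A < \infty) > 0$ for all $x \in \mathsf{X}$, showing that $\nu$ is an irreducibility measure. The above display also shows that $A$ is accessible if and only if $\mathrm{P}_\alpha(\tau_A < \infty) > 0$.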

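Now let $\sigma_\alpha^{(n)}$ be the successive hitting times of $\alpha$ (see (7.13)). The strong Markov property implies that for any $n > 1$,

$$\mathrm{P}_\alpha(\sigma_\alpha^{(n)} < \infty) = \mathrm{P}_\alpha(\tau_\alpha < \infty)\,\mathrm{P}_\alpha(\sigma_\alpha^{(n-1)} < \infty).$$

Hence, as for discrete state spaces, $\mathrm{P}_\alpha(\sigma_\alpha^{(n)} < \infty) = [\mathrm{P}_\alpha(\tau_\alpha < \infty)]^{n-1}$ and $\mathrm{E}_\alpha[\eta_\alpha] = 1/[1 - \mathrm{P}_\alpha(\tau_\alpha < \infty)]$. This proves that $\alpha$ is recurrent if and only if $\mathrm{P}_\alpha(\tau_\alpha < \infty) = 1$.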
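The relation between the expected number of visits to an atom and its return probability can be checked exactly on a toy chain. The sketch below assumes a hypothetical two-state sub-stochastic kernel (the missing row mass goes to an absorbing cemetery state), takes state 0 as the atom, counts the visit at time 0, and uses exact rational arithmetic; none of the numbers come from the text.

```python
from fractions import Fraction as F

# Hypothetical sub-stochastic kernel on {0, 1}; the missing mass in each row
# is the probability of jumping to an absorbing cemetery state.
a, b = F(1, 4), F(1, 2)   # Q(0,0), Q(0,1)
c, d = F(1, 3), F(1, 3)   # Q(1,0), Q(1,1)

# Return probability to state 0: either return at once, or move to state 1,
# stay there for a geometric number of steps, then jump back to 0.
p_return = a + b * c / (1 - d)

# Expected number of visits to 0 (time 0 included): entry (0,0) of (I - Q)^{-1},
# written out with the 2x2 inverse formula.
det = (1 - a) * (1 - d) - b * c
expected_visits = (1 - d) / det

print(p_return, expected_visits, 1 / (1 - p_return))
```

Here the return probability is 1/2, so the atom is transient and the expected number of visits equals 1/(1 - 1/2) = 2, in agreement with the geometric-series argument.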

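Assume that $\alpha$ is recurrent. Because the atom $\alpha$ is accessible, for any $x \in \mathsf{X}$, there exists $r$ such that $Q^r(x, \alpha) > 0$. If $A \in \mathcal{X}^+$ there exists $s$ such that $Q^s(\alpha, A) > 0$. By the Chapman-Kolmogorov equations,

$$\sum_{n \ge 1} Q^{r+s+n}(x, A) \ge Q^r(x, \alpha) \left[\sum_{n \ge 1} Q^n(\alpha, \alpha)\right] Q^s(\alpha, A) = \infty.$$

Hence $\mathrm{E}_x[\eta_A] = \infty$ for all $x \in \mathsf{X}$ and $A$ is recurrent. Because $A$ was an arbitrary accessible set, the chain is recurrent.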

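Assume now that $\alpha$ is transient, in which case $\mathrm{E}_\alpha[\eta_\alpha] < \infty$. Then, following the same line of reasoning as in the discrete state space case (proof of Proposition 136), we obtain that for all $x \in \mathsf{X}$,

$$\mathrm{E}_x[\eta_\alpha] = \mathrm{P}_x(\sigma_\alpha < \infty)\,\mathrm{E}_\alpha[\eta_\alpha] \le \mathrm{E}_\alpha[\eta_\alpha]. \tag{7.23}$$

Define $B_j = \{x : \sum_{n=1}^{j} Q^n(x, \alpha) \ge 1/j\}$. Then $\bigcup_{j=1}^{\infty} B_j = \mathsf{X}$ because $\alpha$ is accessible. Applying the definition of the sets $B_j$ and the Chapman-Kolmogorov equations, we find that

$$
\begin{aligned}
\sum_{k=1}^{\infty} Q^k(x, B_j) &\le \sum_{k=1}^{\infty} Q^k(x, B_j) \inf_{y \in B_j} j \sum_{\ell=1}^{j} Q^\ell(y, \alpha) \\
&\le j \sum_{k=1}^{\infty} \sum_{\ell=1}^{j} \int_{B_j} Q^k(x, dy)\, Q^\ell(y, \alpha) \\
&\le j^2 \sum_{k=1}^{\infty} Q^k(x, \alpha) \le j^2\, \mathrm{E}_x[\eta_\alpha] < \infty.
\end{aligned}
$$

By (7.23) the right-hand side is bounded by $j^2\, \mathrm{E}_\alpha[\eta_\alpha]$ uniformly in $x$. The sets $B_j$ are thus uniformly transient. The proof is complete.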

Small Sets and the Splitting Construction

We now return to the general phi-irreducible case. In order to prove Theorem 151, we need to introduce the splitting technique. To do so, we need to define a class of sets (containing accessible sets) that behave the same way in many respects as do atoms. We shall see this in many of the results below, which exactly mimic the atomic case results they generalize. These sets are called small sets.

Definition 155 (Small Set). Let $Q$ and $\nu$ be a transition kernel and a probability measure, respectively, on $(\mathsf{X}, \mathcal{X})$, let $m$ be a positive integer and $\varepsilon \in (0, 1]$. A set $C \in \mathcal{X}$ is called a $(m, \varepsilon, \nu)$-small set for $Q$, or simply a small set, if $\nu(C) > 0$ and for all $x \in C$ and $A \in \mathcal{X}$,

$$Q^m(x, A) \ge \varepsilon\,\nu(A).$$

If $\varepsilon = 1$ then $C$ is an atom for the kernel $Q^m$.
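On a finite state space the minorization condition in the definition of a small set can be computed directly. The following sketch, for m = 1, takes the pointwise minimum of the rows of Q over C and normalizes it: the total mass is the constant (called `eps` here) and the normalized vector is the minorizing measure (called `nu`). The helper `minorization` and the 3-state matrix are hypothetical illustrations, not taken from the text.

```python
def minorization(Q, C):
    """Return (eps, nu) such that Q[x][y] >= eps * nu[y] for all x in C (m = 1)."""
    n = len(Q)
    # Largest common lower bound of the rows Q[x], x in C.
    lower = [min(Q[x][y] for x in C) for y in range(n)]
    eps = sum(lower)
    if eps == 0:
        raise ValueError("the rows of Q over C share no common mass")
    nu = [v / eps for v in lower]
    return eps, nu

# Hypothetical 3-state transition matrix.
Q = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.0, 0.5, 0.5]]
C = [0, 1]
eps, nu = minorization(Q, C)
nu_of_C = sum(nu[y] for y in C)   # the definition also requires nu(C) > 0
print(eps, nu, nu_of_C)
```

This choice of `nu` gives the largest possible constant for the given set C; any smaller constant with the same measure would also satisfy the definition.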

Trivially, any individual point is a small set, but small sets that are not accessible are of limited interest. If the state space is countable and Q is irreducible, then every finite set is small. The minorization measure associated to an accessible small set provides an irreducibility measure.

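Proposition 156. Let $C$ be an accessible $(m, \varepsilon, \nu)$-small set for the transition kernel $Q$ on $(\mathsf{X}, \mathcal{X})$. Then $\nu$ is an irreducibility measure.

Proof. Let $A \in \mathcal{X}$ be such that $\nu(A) > 0$. The strong Markov property yields

$$\mathrm{P}_x(\tau_A < \infty) \ge \mathrm{P}_x(\tau_C < \infty,\ \tau_A \circ \theta^{\tau_C} < \infty) = \mathrm{E}_x\big[\mathbb{1}\{\tau_C < \infty\}\,\mathrm{P}_{X_{\tau_C}}(\tau_A < \infty)\big].$$

Because $C$ is a small set, for all $y \in C$ it holds that

$$\mathrm{P}_y(\tau_A < \infty) \ge \mathrm{P}_y(X_m \in A) = Q^m(y, A) \ge \varepsilon\,\nu(A).$$

Because $C$ is accessible and $\nu(A) > 0$, for all $x \in \mathsf{X}$ it holds that

$$\mathrm{P}_x(\tau_A < \infty) \ge \varepsilon\,\nu(A)\,\mathrm{P}_x(\tau_C < \infty) > 0.$$

Thus $A$ is accessible, whence $\nu$ is an irreducibility measure.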

An important result due to Jain and Jamison (1967) states that if the transition kernel is phi-irreducible, then small sets do exist. For a proof see Nummelin (1984, p. 16) or Meyn and Tweedie (1993, Theorem 5.2.2).

Proposition 157. If the transition kernel $Q$ on $(\mathsf{X}, \mathcal{X})$ is phi-irreducible, then every accessible set contains an accessible small set.

Given the existence of just one small set from Proposition 157, we may show that it is possible to cover $\mathsf{X}$ with a countable number of small sets in the phi-irreducible case.

Proposition 158. Let $Q$ be a phi-irreducible transition kernel on $(\mathsf{X}, \mathcal{X})$.

(i) If $C \in \mathcal{X}$ is an $(m, \varepsilon, \nu)$-small set and for any $x \in D$ we have $Q^n(x, C) \ge \delta$, then $D$ is an $(m + n, \delta\varepsilon, \nu)$-small set.

(ii) If $Q$ is phi-irreducible then there exists a countable collection of small sets $C_i$ such that $\mathsf{X} = \bigcup_i C_i$.

Proof. Using the Chapman-Kolmogorov equations, we find that for any $x \in D$,

$$Q^{n+m}(x, A) \ge \int_C Q^n(x, dy)\, Q^m(y, A) \ge \varepsilon\, Q^n(x, C)\,\nu(A) \ge \varepsilon\delta\,\nu(A),$$

showing part (i). Because $Q$ is phi-irreducible, by Proposition 157 there exists an accessible $(m, \varepsilon, \nu)$-small set $C$. Moreover, by the definition of phi-irreducibility, the sets $C(n, m) = \{x : Q^n(x, C) \ge 1/m\}$ cover $\mathsf{X}$ and, by part (i), each $C(n, m)$ is small.

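Proposition 159. If $Q$ is phi-irreducible and transient, then every accessible small set is uniformly transient.

Proof. Let $C$ be an accessible $(m, \varepsilon, \nu)$-small set. If $Q$ is transient, there exists at least one $A \in \mathcal{X}^+$ that is uniformly transient. For $\delta \in (0, 1)$, by the Chapman-Kolmogorov equations,

$$
\begin{aligned}
\mathrm{E}_x[\eta_A] = \sum_{k=0}^{\infty} Q^k(x, A) &\ge (1 - \delta) \sum_{p=0}^{\infty} \delta^p \sum_{k=0}^{\infty} Q^{k+m+p}(x, A) \\
&\ge (1 - \delta) \sum_{p=0}^{\infty} \delta^p \sum_{k=0}^{\infty} \int_C Q^k(x, dx') \int Q^m(x', dx'')\, Q^p(x'', A) \\
&\ge \varepsilon \sum_{k=0}^{\infty} Q^k(x, C) \times (1 - \delta) \sum_{p=0}^{\infty} \delta^p\, \nu Q^p(A) = \varepsilon\, \mathrm{E}_x[\eta_C]\, \nu K_\delta(A),
\end{aligned}
$$

where $K_\delta$ is the resolvent kernel (7.17). Because $C$ is an accessible small set, Proposition 156 shows that $\nu$ is an irreducibility measure. By Theorem 147, $\nu K_\delta$ is a maximal irreducibility measure, so that $\nu K_\delta(A) > 0$. Thus $\sup_{x \in \mathsf{X}} \mathrm{E}_x[\eta_C] < \infty$ and we conclude that $C$ is uniformly transient (see the remark following Definition 150).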

Example 160 (Autoregressive Process, Continued). Suppose that the noise distribution in Example 148 has an everywhere positive continuous density $\gamma$ with respect to Lebesgue measure $\lambda^{\mathrm{Leb}}$. If $C = [-M, M]$ and $\varepsilon = \inf_{|u| \le (1 + \phi)M} \gamma(u)$, then for $x \in C$ and $A \subseteq C$,

$$Q(x, A) = \int_A \gamma(x' - \phi x)\, dx' \ge \varepsilon\, \lambda^{\mathrm{Leb}}(A).$$

Hence the compact set $C$ is small. Obviously $\mathbb{R}$ is covered by a countable collection of small sets and every accessible set (here sets with non-zero Lebesgue measure) contains a small set.
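The bound in Example 160 can be checked numerically. The sketch below assumes standard Gaussian noise and the values phi = 0.5 and M = 2, all of which are hypothetical choices for illustration. For this symmetric unimodal density the infimum over |u| <= (1 + phi)M is attained at the endpoint, so the constant (called `eps` in the code) is the density evaluated at (1 + phi)M.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

phi, M = 0.5, 2.0                     # hypothetical AR coefficient and half-width of C
eps = normal_pdf((1.0 + phi) * M)     # infimum of the density over |u| <= (1 + phi) M

def kernel(x, a, b):
    """Q(x, [a, b]) for X_{k+1} = phi * X_k + U_{k+1} with standard normal U."""
    return normal_cdf(b - phi * x) - normal_cdf(a - phi * x)

# Check Q(x, A) >= eps * Leb(A) for x on a grid of C and intervals A = [a, b] inside C.
grid = [-M + i * (2 * M / 20) for i in range(21)]
ok = all(kernel(x, a, b) >= eps * (b - a)
         for x in grid for a in grid for b in grid if a < b)
print(eps, ok)
```

The check only covers intervals with endpoints on a grid, so it illustrates the inequality rather than proving it; the proof is the one-line bound on the density given in the example.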

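Example 161 (Metropolis-Hastings Algorithm, Continued). Similar results hold for the Metropolis-Hastings algorithm of Example 149 if $\pi(x)$ and $q(x, x')$ are positive and continuous for all $(x, x') \in \mathsf{X} \times \mathsf{X}$. Suppose that $C$ is compact with $\lambda^{\mathrm{Leb}}(C) > 0$. By positivity and continuity, we then have $d = \sup_{x \in C} \pi(x) < \infty$ and $\varepsilon = \inf_{(x, x') \in C \times C} q(x, x') > 0$. For any $A \subseteq C$, define

$$R_x(A) \stackrel{\mathrm{def}}{=} \left\{ x' \in A : \frac{\pi(x')\, q(x', x)}{\pi(x)\, q(x, x')} < 1 \right\},$$

the region of possible rejection. Then for any $x \in C$,

$$
\begin{aligned}
Q(x, A) &\ge \int_A q(x, x')\, \alpha(x, x')\, dx' \\
&\ge \int_{R_x(A)} q(x', x)\, \frac{\pi(x')}{\pi(x)}\, dx' + \int_{A \setminus R_x(A)} q(x, x')\, dx' \\
&\ge \frac{\varepsilon}{d} \int_{R_x(A)} \pi(x')\, dx' + \frac{\varepsilon}{d} \int_{A \setminus R_x(A)} \pi(x')\, dx' \\
&= \frac{\varepsilon}{d} \int_A \pi(x')\, dx'.
\end{aligned}
$$

Thus $C$ is small and, again, $\mathsf{X}$ can be covered by a countable collection of small sets.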

We now show that it is possible to define a Markov chain with an atom, the so-called split chain, whose properties are directly related to those of the original chain. This technique was introduced by Nummelin (1978) (Athreya and Ney, 1978, introduced, independently, a virtually identical concept) and allows extending results valid for Markov chains possessing an accessible atom to irreducible Markov chains that only possess small sets. The basic idea is as follows. Suppose the chain admits a $(1, \varepsilon, \nu)$-small set $C$. Then as long as the chain does not enter $C$, the transition kernel $Q$ is used to generate the trajectory. However, as soon as the chain hits $C$, say $X_n \in C$, a zero-one random variable $d_n$ is drawn, independent of everything else. The probability that $d_n = 1$ is $\varepsilon$, and hence $d_n = 0$ with probability $1 - \varepsilon$. Then if $d_n = 1$, the next value $X_{n+1}$ is drawn from $\nu$; otherwise $X_{n+1}$ is drawn from the kernel

$$R(x, A) = [1 - \varepsilon \mathbb{1}_C(x)]^{-1}\, [Q(x, A) - \varepsilon \mathbb{1}_C(x)\, \nu(A)],$$

with $x = X_n$. It is immediate that $\varepsilon \nu(A) + (1 - \varepsilon) R(x, A) = Q(x, A)$ for all $x \in C$, so $X_{n+1}$ is indeed drawn from the correct (conditional) distribution. Note also that $R(x, \cdot) = Q(x, \cdot)$ for $x \notin C$. So, what is gained by this approach? What is gained is that whenever $X_n \in C$ and $d_n = 1$, the next value of the chain will be independent of $X_n$ (because it is drawn from $\nu$). This is often called a regeneration time, as the joint chain $\{(X_k, d_k)\}$ in a sense “restarts” and forgets its history. In technical terms, the state $C \times \{1\}$ in the extended state space is an atom, and it will be accessible provided $C$ is.
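For a finite state space the residual kernel of the splitting construction is a simple row operation. The sketch below assumes a hypothetical 3-state matrix with a minorization constant `eps` strictly less than 1 and a minorizing measure `nu` on the set C (both chosen for illustration). It subtracts the mass `eps * nu` from each row indexed by C, renormalizes, and leaves the other rows unchanged.

```python
def residual_kernel(Q, C, eps, nu):
    """Rows of R: (Q[x] - eps * nu) / (1 - eps) for x in C, and Q[x] otherwise.

    Assumes eps < 1 and Q[x][y] >= eps * nu[y] for all x in C.
    """
    R = []
    for x, row in enumerate(Q):
        e = eps if x in C else 0.0
        R.append([(q - e * v) / (1.0 - e) for q, v in zip(row, nu)])
    return R

# Hypothetical 3-state example with a (1, eps, nu) minorization on C = {0, 1}.
Q = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.0, 0.5, 0.5]]
C = {0, 1}
eps, nu = 0.5, [0.5, 0.5, 0.0]
R = residual_kernel(Q, C, eps, nu)

# On C the mixture eps * nu + (1 - eps) * R reproduces Q exactly.
mixture = [[eps * v + (1.0 - eps) * r for v, r in zip(nu, R[x])] for x in sorted(C)]
print(R, mixture)
```

The mixture identity is what guarantees that the first component of the split chain still moves according to the original kernel.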

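We now make this formal. Thus we define the so-called extended state space as $\check{\mathsf{X}} = \mathsf{X} \times \{0, 1\}$ and let $\check{\mathcal{X}}$ be the associated product σ-field. We associate to every measure $\mu$ on $(\mathsf{X}, \mathcal{X})$ the split measure $\mu^\star$ on $(\check{\mathsf{X}}, \check{\mathcal{X}})$ as the unique measure satisfying, for $A \in \mathcal{X}$,

$$\mu^\star(A \times \{0\}) = (1 - \varepsilon)\, \mu(A \cap C) + \mu(A \cap C^c), \qquad \mu^\star(A \times \{1\}) = \varepsilon\, \mu(A \cap C).$$

If $Q$ is a transition kernel on $(\mathsf{X}, \mathcal{X})$, we define the kernel $Q^\star$ on $\mathsf{X} \times \check{\mathcal{X}}$ by $Q^\star(x, \check{A}) = [Q(x, \cdot)]^\star(\check{A})$ for $x \in \mathsf{X}$ and $\check{A} \in \check{\mathcal{X}}$.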

Assume now that $Q$ is a phi-irreducible transition kernel and let $C$ be a $(1, \varepsilon, \nu)$-small set. We define the split transition kernel $\check{Q}$ on $\check{\mathsf{X}} \times \check{\mathcal{X}}$ as follows. For any $x \in \mathsf{X}$ and $\check{A} \in \check{\mathcal{X}}$,

$$\check{Q}((x, 0), \check{A}) = R^\star(x, \check{A}), \tag{7.24}$$

$$\check{Q}((x, 1), \check{A}) = \nu^\star(\check{A}). \tag{7.25}$$

Examining the above technicalities, we find that transitions into $C^c \times \{1\}$ have zero probability from everywhere, so that $d_n = 1$ can only occur if $X_n \in C$. Because $d_n = 1$ indicates a regeneration time, from within $C$, this is logical. Likewise we find that given a transition to some $y \in C$, the conditional probability that $d_n = 1$ is $\varepsilon$, wherever the transition took place from. Thus the above split transition kernel corresponds to the following simulation scheme for $\{(X_k, d_k)\}$. Assume $(X_k, d_k)$ are given. If $X_k \notin C$, then draw $X_{k+1}$ from $Q(X_k, \cdot)$. If $X_k \in C$ and $d_k = 1$, then draw $X_{k+1}$ from $\nu$, otherwise from $R(X_k, \cdot)$. If the realized $X_{k+1}$ is not in $C$, then set $d_{k+1} = 0$; if $X_{k+1}$ is in $C$, then set $d_{k+1} = 1$ with probability $\varepsilon$, and otherwise set $d_{k+1} = 0$.
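The simulation scheme just described can be sketched for a finite state space as follows. The function names, the 3-state matrix, the minorization constant `eps` (assumed strictly less than 1) and the measure `nu` are all hypothetical; the initial indicator is drawn with probability `eps` when the starting point lies in C, which corresponds to starting from a split point mass.

```python
import random

def draw(rng, probs):
    """Draw an index from a finite probability vector."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def simulate_split_chain(Q, C, eps, nu, x0, n_steps, seed=0):
    """Simulate the pairs (X_k, d_k) of the split chain (requires eps < 1)."""
    rng = random.Random(seed)
    x = x0
    d = 1 if (x in C and rng.random() < eps) else 0
    path = [(x, d)]
    for _ in range(n_steps):
        if x not in C:
            probs = Q[x]                     # ordinary transition outside C
        elif d == 1:
            probs = nu                       # regeneration: X_{k+1} ignores X_k
        else:
            # residual kernel; max(...) guards against rounding below zero
            probs = [max(q - eps * v, 0.0) / (1.0 - eps) for q, v in zip(Q[x], nu)]
        x = draw(rng, probs)
        d = 1 if (x in C and rng.random() < eps) else 0
        path.append((x, d))
    return path

# Hypothetical 3-state example with a (1, eps, nu) minorization on C = {0, 1}.
Q = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.0, 0.5, 0.5]]
C = {0, 1}
eps, nu = 0.5, [0.5, 0.5, 0.0]
path = simulate_split_chain(Q, C, eps, nu, x0=0, n_steps=2000)
regenerations = sum(d for _, d in path)
print(regenerations)
```

Each index k with d_k = 1 is a regeneration time: the next state is drawn from `nu` whatever the current state in C is, which is what makes the set of such pairs behave like an atom.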

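Split measures operate on the split kernel in the following way. For any measure $\mu$ on $(\mathsf{X}, \mathcal{X})$,

$$\mu^\star \check{Q} = (\mu Q)^\star. \tag{7.26}$$

For any probability measure $\check{\mu}$ on $\check{\mathcal{X}}$, we denote by $\check{\mathrm{P}}_{\check{\mu}}$ and $\check{\mathrm{E}}_{\check{\mu}}$, respectively, the probability distribution and the expectation on the canonical space $(\check{\mathsf{X}}^{\mathbb{N}}, \check{\mathcal{X}}^{\mathbb{N}})$ such that the coordinate process, denoted $\{(X_k, d_k)\}_{k \ge 0}$, is a Markov chain with initial probability measure $\check{\mu}$ and transition kernel $\check{Q}$. We also denote by $\{\check{\mathcal{F}}_k\}_{k \ge 0}$ the natural filtration of this chain and, as usual, by $\{\mathcal{F}_k^X\}_{k \ge 0}$ the natural filtration of $\{X_k\}_{k \ge 0}$.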

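Proposition 162. Let $Q$ be a phi-irreducible transition kernel on $(\mathsf{X}, \mathcal{X})$, let $C$ be an accessible $(1, \varepsilon, \nu)$-small set for $Q$ and let $\mu$ be a probability measure on $(\mathsf{X}, \mathcal{X})$. Then for any bounded $\mathcal{X}$-measurable function $f$ and any $k \ge 1$,

$$\check{\mathrm{E}}_{\mu^\star}[f(X_k) \mid \mathcal{F}_{k-1}^X] = Qf(X_{k-1}) \qquad \check{\mathrm{P}}_{\mu^\star}\text{-a.s.} \tag{7.27}$$

Before giving the proof, we discuss the implications of this result. It implies that under $\check{\mathrm{P}}_{\mu^\star}$, $\{X_k\}_{k \ge 0}$ is a Markov chain (with respect to its natural filtration) with transition kernel $Q$ and initial distribution $\mu$. By abuse of notation, we can identify $\{X_k\}$ with the coordinate process associated to the canonical space $\mathsf{X}^{\mathbb{N}}$. Denote by $\mathrm{P}_\mu$ the probability measure on $(\mathsf{X}^{\mathbb{N}}, \mathcal{X}^{\mathbb{N}})$ such that $\{X_k\}_{k \ge 0}$ is a Markov chain with transition kernel $Q$ and initial distribution $\mu$ (see Section 1.1.2) and denote by $\mathrm{E}_\mu$ the associated expectation operator. Then Proposition 162 yields the following identity. For any bounded $\mathcal{F}^X$-measurable random variable $Y$,

$$\check{\mathrm{E}}_{\mu^\star}[Y] = \mathrm{E}_\mu[Y]. \tag{7.28}$$

Proof of Proposition 162. We have, $\check{\mathrm{P}}_{\mu^\star}$-a.s.,

$$\check{\mathrm{E}}_{\mu^\star}[f(X_k) \mid \check{\mathcal{F}}_{k-1}] = \mathbb{1}\{d_{k-1} = 1\}\, \nu(f) + \mathbb{1}\{d_{k-1} = 0\}\, Rf(X_{k-1}).$$

Because $\check{\mathrm{P}}_{\mu^\star}(d_{k-1} = 1 \mid \mathcal{F}_{k-1}^X) = \varepsilon \mathbb{1}_C(X_{k-1})$ $\check{\mathrm{P}}_{\mu^\star}$-a.s., it holds that

$$
\begin{aligned}
\check{\mathrm{E}}_{\mu^\star}[f(X_k) \mid \mathcal{F}_{k-1}^X] &= \check{\mathrm{E}}_{\mu^\star}\big\{\check{\mathrm{E}}_{\mu^\star}[f(X_k) \mid \check{\mathcal{F}}_{k-1}] \,\big|\, \mathcal{F}_{k-1}^X\big\} \\
&= \varepsilon \mathbb{1}_C(X_{k-1})\, \nu(f) + [1 - \varepsilon \mathbb{1}_C(X_{k-1})]\, Rf(X_{k-1}) \\
&= Qf(X_{k-1}).
\end{aligned}
$$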

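Corollary 163. Under the assumptions of Proposition 162, $\mathsf{X} \times \{1\}$ is an accessible atom and $\nu^\star$ is an irreducibility measure for the split kernel $\check{Q}$. More generally, if $B \in \mathcal{X}$ is accessible for $Q$, then $B \times \{0, 1\}$ is accessible for the split kernel.

Proof. Because $\check{\alpha} = \mathsf{X} \times \{1\}$ is an atom for the split kernel $\check{Q}$, Proposition 154 shows that $\nu^\star$ is an irreducibility measure if $\check{\alpha}$ is accessible. Applying (7.28) we obtain for $x \in \mathsf{X}$,

$$
\begin{aligned}
\check{\mathrm{P}}_{(x,1)}(\tau_{\check{\alpha}} < \infty) &= \check{\mathrm{P}}_{(x,1)}(d_n = 1 \text{ for some } n \ge 1) \\
&\ge \check{\mathrm{P}}_{(x,1)}(d_1 = 1) = \varepsilon\, \nu(C) > 0, \\
\check{\mathrm{P}}_{(x,0)}(\tau_{\check{\alpha}} < \infty) &= \check{\mathrm{P}}_{(x,0)}((X_n, d_n) \in C \times \{1\} \text{ for some } n \ge 1) \\
&\ge \check{\mathrm{P}}_{(x,0)}(\tau_{C \times \{0,1\}} < \infty,\ d_{\tau_{C \times \{0,1\}}} = 1) = \varepsilon\, \mathrm{P}_x(\tau_C < \infty) > 0.
\end{aligned}
$$

Thus $\check{\alpha}$ is accessible and $\nu^\star$ is an irreducibility measure for $\check{Q}$. This implies, by Theorem 147, that for all $\eta \in (0, 1)$, $\nu^\star \check{K}_\eta$ is a maximal irreducibility measure for the split kernel $\check{Q}$; here $\check{K}_\eta$ is the resolvent kernel (7.17) associated to $\check{Q}$. By straightforward applications of the definitions, it is easy to check that $\nu^\star \check{K}_\eta = (\nu K_\eta)^\star$. Moreover, $\nu$ is an irreducibility measure for $Q$, and $\nu K_\eta$ is a maximal irreducibility measure for $Q$ (still by Proposition 156 and Theorem 147). If $B$ is accessible, then $\nu K_\eta(B) > 0$ and

$$\nu^\star \check{K}_\eta(B \times \{0, 1\}) = (\nu K_\eta)^\star(B \times \{0, 1\}) = \nu K_\eta(B) > 0.$$

Thus $B \times \{0, 1\}$ is accessible for $\check{Q}$.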

Transience/Recurrence Dichotomy for General Phi-irreducible Chains

Using the splitting construction, we are now able to prove Theorem 151 for chains not possessing accessible atoms. We first consider the simple case in which the chain possesses a 1-small set.

Proposition 164. Let $Q$ be a phi-irreducible transition kernel that admits an accessible $(1, \varepsilon, \nu)$-small set $C$. Then $Q$ is either recurrent or transient. It is recurrent if and only if the small set $C$ is recurrent.

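Proof. Because the split chain possesses an accessible atom, by Proposition 154 the split chain is phi-irreducible and either recurrent or transient. Applying (7.28) we can write

$$\check{\mathrm{E}}_{\delta_x^\star}[\eta_{B \times \{0,1\}}] = \mathrm{E}_x[\eta_B]. \tag{7.29}$$

Assume first that the split chain is recurrent. Let $B$ be an accessible set for $Q$. By Corollary 163, $B \times \{0, 1\}$ is accessible for the split chain. Hence $\check{\mathrm{E}}_{\delta_x^\star}[\eta_{B \times \{0,1\}}] = \infty$ for all $x \in B$, so that, by (7.29), $\mathrm{E}_x[\eta_B] = \infty$ for all $x \in B$.

Conversely, if the split chain is transient, then by Proposition 154 the atom $\check{\alpha}$ is transient. For $j \ge 1$, define $B_j = \{x : \sum_{l=1}^{j} \check{Q}^l((x, 0), \check{\alpha}) \ge 1/j\}$. Because $\check{\alpha}$ is accessible, $\bigcup_{j=1}^{\infty} B_j = \mathsf{X}$. By the same argument as in the proof of Proposition 154, the sets $B_j \times \{0, 1\}$ are uniformly transient for the split chain. Hence, by (7.29), the sets $B_j$ are uniformly transient for $Q$.

It remains to prove that if the small set $C$ is recurrent, then the chain is recurrent. We have just proved that $Q$ is recurrent if and only if $\check{Q}$ is recurrent and, by Proposition 154, this is true if and only if the atom $\check{\alpha}$ is recurrent. Thus we only need to prove that if $C$ is recurrent then $\check{\alpha}$ is recurrent. If $C$ is recurrent, then (7.29) yields for all $x \in C$,

$$\check{\mathrm{E}}_{\delta_x^\star}[\eta_{\check{\alpha}}] \ge \varepsilon\, \check{\mathrm{E}}_{\delta_x^\star}[\eta_{C \times \{0,1\}}] = \varepsilon\, \mathrm{E}_x[\eta_C] = \infty.$$

Using the definition of $\delta_x^\star$, this implies that there exists $\check{x} \in \check{\mathsf{X}}$ such that $\check{\mathrm{E}}_{\check{x}}[\eta_{\check{\alpha}}] = \infty$. This observation and (7.23) imply that $\check{\mathrm{E}}_{\check{\alpha}}[\eta_{\check{\alpha}}] = \infty$, that is, the atom is recurrent.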

Using the resolvent kernel, the previous results can be extended to the general case where an accessible small set exists, but not necessarily a 1-small one.

Proposition 165. Let $Q$ be a transition kernel.

(i) If $Q$ is phi-irreducible and admits an accessible $(m, \varepsilon, \nu)$-small set $C$, then for any $\eta \in (0, 1)$, $C$ is an accessible $(1, \varepsilon', \nu)$-small set for the resolvent kernel $K_\eta = (1 - \eta) \sum_{k=0}^{\infty} \eta^k Q^k$ with $\varepsilon' = (1 - \eta)\, \eta^m \varepsilon$.

(ii) A set is recurrent (resp. uniformly transient) for $Q$ if and only if it is recurrent (resp. uniformly transient) for $K_\eta$ for some (hence for all) $\eta \in (0, 1)$.

(iii) $Q$ is recurrent (resp. transient) if and only if $K_\eta$ is recurrent (resp. transient) for some (hence for all) $\eta \in (0, 1)$.

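Proof. For any $\eta \in (0, 1)$, $x \in C$, and $A \in \mathcal{X}$,

$$K_\eta(x, A) \ge (1 - \eta)\, \eta^m Q^m(x, A) \ge (1 - \eta)\, \eta^m \varepsilon\, \nu(A) = \varepsilon' \nu(A).$$

Thus $C$ is a $(1, \varepsilon', \nu)$-small set for $K_\eta$, showing part (i). The remaining claims follow from the identity

$$\sum_{n \ge 1} K_\eta^n = \frac{1 - \eta}{\eta} \sum_{n \ge 0} Q^n.$$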
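The identity linking the powers of the resolvent kernel to the powers of Q can be verified exactly on a small example. The sketch below assumes a hypothetical 2x2 sub-stochastic matrix (so that both series converge and equal matrix inverses) and the value 1/3 for the resolvent parameter; the helper functions are written out for the 2x2 case only and use exact rational arithmetic.

```python
from fractions import Fraction as F

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sub2(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def scale2(s, A):
    return [[s * v for v in row] for row in A]

I = [[F(1), F(0)], [F(0), F(1)]]
Q = [[F(1, 4), F(1, 2)], [F(1, 3), F(1, 3)]]   # hypothetical sub-stochastic kernel
eta = F(1, 3)

# Resolvent kernel: (1 - eta) * sum_k eta^k Q^k = (1 - eta) * (I - eta Q)^{-1}.
K = scale2(1 - eta, inv2(sub2(I, scale2(eta, Q))))

lhs = mul2(K, inv2(sub2(I, K)))                    # sum over n >= 1 of K^n
rhs = scale2((1 - eta) / eta, inv2(sub2(I, Q)))    # (1 - eta)/eta * sum over n >= 0 of Q^n
print(lhs == rhs)
```

Since the two sides differ only by the positive factor (1 - eta)/eta, one side is finite (or bounded over a set) exactly when the other is, which is how the identity transfers recurrence and uniform transience between Q and its resolvent.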

Harris Recurrence

As for countable state spaces, it is sometimes useful to consider stronger recurrence properties, expressed in terms of return probabilities rather than mean occupation times.

Definition 166 (Harris Recurrence). A set $A \in \mathcal{X}$ is said to be Harris recurrent if $\mathrm{P}_x(\tau_A < \infty) = 1$ for any $x \in \mathsf{X}$. A phi-irreducible Markov chain is said to be Harris (recurrent) if any accessible set is Harris recurrent.

It is intuitively obvious that, as for countable state spaces, Harris recurrence implies recurrence.

Proposition 167. A Harris recurrent set is recurrent.

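Proof. Let $A$ be a Harris recurrent set. Because for $j \ge 1$, $\sigma_A^{(j+1)} = \sigma_A^{(j)} + \tau_A \circ \theta^{\sigma_A^{(j)}}$ on the set $\{\sigma_A^{(j)} < \infty\}$, the strong Markov property implies that for any $x \in A$,

$$\mathrm{P}_x(\sigma_A^{(j+1)} < \infty) = \mathrm{E}_x\Big[\mathrm{P}_{X_{\sigma_A^{(j)}}}(\tau_A < \infty)\, \mathbb{1}\{\sigma_A^{(j)} < \infty\}\Big] = \mathrm{P}_x(\sigma_A^{(j)} < \infty).$$

Because $\mathrm{P}_x(\sigma_A^{(1)} < \infty) = 1$ for $x \in A$, we obtain that for all $x \in A$ and all $j \ge 1$, $\mathrm{P}_x(\sigma_A^{(j)} < \infty) = 1$ and $\mathrm{E}_x[\eta_A] = \sum_{j=1}^{\infty} \mathrm{P}_x(\sigma_A^{(j)} < \infty) = \infty$.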

Even though not all transition kernels are Harris recurrent, the following theorem provides a very useful decomposition of the state space of a recurrent phi-irreducible transition kernel. For a proof of this result, see Meyn and Tweedie (1993, Theorem 9.1.5).

Theorem 168. Let $Q$ be a phi-irreducible recurrent transition kernel on a state space $\mathsf{X}$ and let $\psi$ be a maximal irreducibility measure. Then $\mathsf{X} = N \cup H$, where $N$ is covered by a countable family of uniformly transient sets, $\psi(N) = 0$ and every accessible subset of $H$ is Harris recurrent.

As a consequence, if $A$ is an accessible set of a recurrent phi-irreducible chain, then there exists a set $A' \subseteq A$ such that $\psi(A \setminus A') = 0$ for any maximal irreducibility measure $\psi$, and $\mathrm{P}_x(\tau_{A'} < \infty) = 1$ for all $x \in A'$.

Example 169. To understand why a recurrent Markov chain can fail to be Harris, consider the following elementary example of a chain on $\mathsf{X} = \mathbb{N}$. Let the transition kernel $Q$ be given by $Q(0, 0) = 1$ and for $x \ge 1$, $Q(x, x + 1) = 1 - 1/x^2$ and $Q(x, 0) = 1/x^2$. Thus the state 0 is absorbing. Because $Q(x, 0) > 0$ for any $x \in \mathsf{X}$, $\delta_0$ is an irreducibility measure. In fact, by application of Theorem 147, this measure is maximal. The set $\{0\}$ is an atom and because $\mathrm{P}_0(\tau_{\{0\}} < \infty) = 1$, the chain is recurrent by Proposition 154.

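The chain is not Harris recurrent, however. Indeed, for any $x \ge 1$ and $k \ge 2$ we have

$$\mathrm{P}_x(\tau_0 \ge k) = \mathrm{P}_x(X_1 \ne 0, \ldots, X_{k-1} \ne 0) = \prod_{j=x}^{x+k-2} (1 - 1/j^2),$$

since the chain must take the $k - 1$ upward steps from the states $x, x + 1, \ldots, x + k - 2$. Because $\prod_{j=2}^{\infty} (1 - 1/j^2) > 0$, we obtain that $\mathrm{P}_x(\tau_0 = \infty) = \lim_{k \to \infty} \mathrm{P}_x(\tau_0 \ge k) > 0$ for any $x \ge 2$, so that the accessible state 0 is not certainly reached from such an initial state. Comparing to Theorem 168, we see that the decomposition of the state space is given by $H = \{0\}$ and $N = \{1, 2, \ldots\}$.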
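The escape probability in Example 169 has a closed form, because each factor 1 - 1/j^2 equals (j - 1)(j + 1)/j^2 and the product telescopes. The sketch below, with hypothetical function names, computes the probability of n consecutive upward steps from state x exactly and compares it with the telescoped value; letting n grow shows that the probability of never reaching 0 from x is (x - 1)/x.

```python
from fractions import Fraction as F

def escape_product(x, n):
    """Probability of n consecutive upward steps from x: product over j = x, ..., x+n-1 of (1 - 1/j^2)."""
    prob = F(1)
    for j in range(x, x + n):
        prob *= 1 - F(1, j * j)
    return prob

def telescoped(x, n):
    """Closed form ((x-1)/x) * ((x+n)/(x+n-1)) of the same product, for n >= 1."""
    return F(x - 1, x) * F(x + n, x + n - 1)

for x in (1, 2, 5):
    print(x, escape_product(x, 50), F(x - 1, x))
```

For x = 1 the product is 0 because the chain jumps to 0 with probability one, while for every x at least 2 the limit (x - 1)/x is strictly positive, which is the failure of Harris recurrence described in the example.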
