7.3 Applications to Hidden Markov Models
7.3.3 Recurrence and Positive Recurrence
As the following result shows, recurrence and transience of the joint chain follow directly from the corresponding properties of the hidden chain.
Proposition 208. Assume that the hidden chain is phi-irreducible. Then the following statements hold true.
(i) The joint chain is transient (recurrent) if and only if the hidden chain is transient (recurrent).
(ii) The joint chain is positive if and only if the hidden chain is positive. In addition, if the hidden chain is positive with stationary distribution $\pi$, then $\pi \otimes G$ is the stationary distribution of the joint chain.
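Proof. First assume that the transition kernel $Q$ is transient, that is, that there is a countable cover $\mathsf{X} = \bigcup_i A_i$ of $\mathsf{X}$ with uniformly transient sets,
\[
\sup_{x \in A_i} \mathrm{E}_x\left[ \sum_{n=1}^{\infty} \mathbf{1}_{A_i}(X_n) \right] < \infty .
\]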
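Then the sets $\{A_i \times \mathsf{Y}\}_{i \ge 1}$ form a countable cover of $\mathsf{X} \times \mathsf{Y}$, and these sets are uniformly transient because
\[
\mathrm{E}_x\left[ \sum_{n=1}^{\infty} \mathbf{1}_{A_i \times \mathsf{Y}}(X_n, Y_n) \right]
= \mathrm{E}_x\left[ \sum_{n=1}^{\infty} \mathbf{1}_{A_i}(X_n) \right] . \tag{7.70}
\]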
Thus the joint chain is transient.
Conversely, assume that the joint chain is transient. Because the hidden chain is phi-irreducible, Proposition 158 shows that there is a countable cover $\mathsf{X} = \bigcup_i A_i$ of $\mathsf{X}$ with sets that are small for $Q$. At least one of these, say $A_1$, is accessible for $Q$. By Lemma 205, the sets $A_i \times \mathsf{Y}$ are small. By Proposition 202, $A_1 \times \mathsf{Y}$ is accessible and, because $T$ is transient, Proposition 159 shows that $A_1 \times \mathsf{Y}$ is uniformly transient. Equation (7.70) then shows that $A_1$ is uniformly transient, and because $A_1$ is accessible, we conclude that $Q$ is transient.
Thus the hidden chain is transient if and only if the joint chain is so. The transience/recurrence dichotomy (Theorem 151) then implies that the hidden chain is recurrent if and only if the joint chain is so, which completes the proof of (i).
We now turn to (ii). First assume that the hidden chain is positive recurrent, that is, that there exists a unique stationary probability measure $\pi$ satisfying $\pi Q = \pi$. Then the probability measure $\pi \otimes G$ is stationary for the transition kernel $T$ of the joint chain, because
\[
\begin{aligned}
(\pi \otimes G) T(A)
&= \int \cdots \int \pi(dx)\, G(x, dy)\, Q(x, dx')\, G(x', dy')\, \mathbf{1}_A(x', y') \\
&= \iiint \pi(dx)\, Q(x, dx')\, G(x', dy')\, \mathbf{1}_A(x', y') \\
&= \iint \pi(dx')\, G(x', dy')\, \mathbf{1}_A(x', y') = \pi \otimes G(A) .
\end{aligned}
\]
Because the joint chain admits a stationary distribution it is positive, and by Proposition 179 it is recurrent.
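The stationarity computation above can be checked numerically on a finite state space. The following is a minimal sketch, not part of the original argument: the matrices `Q` and `G` are illustrative assumptions, and the helper names `stationary` and `joint_kernel` are hypothetical. It builds the joint kernel $T[(x, y), (x', y')] = Q(x, x')\,G(x', y')$ and verifies that $\pi \otimes G$ is left invariant by it.

```python
# Illustrative two-state hidden chain with three observation symbols.
# All numerical values are assumptions chosen for the example.
Q = [[0.9, 0.1],
     [0.3, 0.7]]           # hidden transition matrix Q(x, x')
G = [[0.8, 0.2, 0.0],
     [0.1, 0.3, 0.6]]      # emission matrix G(x, y)


def stationary(P, n_iter=2000):
    """Approximate the stationary row vector of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def joint_kernel(Q, G):
    """Joint kernel T[(x, y), (x', y')] = Q(x, x') * G(x', y').

    Joint states are flattened as index = x * n_y + y. The rows do not
    depend on y, which mirrors the structure used in the proof.
    """
    n_x, n_y = len(G), len(G[0])
    return [[Q[x][x2] * G[x2][y2] for x2 in range(n_x) for y2 in range(n_y)]
            for x in range(n_x) for _y in range(n_y)]


pi = stationary(Q)
# Product measure (pi ⊗ G)(x, y) = pi(x) G(x, y), flattened like the joint states.
pi_G = [pi[x] * G[x][y] for x in range(len(G)) for y in range(len(G[0]))]
T = joint_kernel(Q, G)
pi_G_T = [sum(pi_G[i] * T[i][j] for i in range(len(T))) for j in range(len(T))]
max_gap = max(abs(a - b) for a, b in zip(pi_G, pi_G_T))
```

In this sketch `max_gap` measures how far $(\pi \otimes G)T$ is from $\pi \otimes G$; it should vanish up to floating-point error.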
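Conversely, assume that the joint chain is positive. Denote by $\bar\pi$ the (unique) stationary probability measure of $T$. Thus for any $\bar A \in \mathcal{X} \otimes \mathcal{Y}$, we have
\[
\begin{aligned}
\iint \bar\pi(dx, dy)\, Q(x, dx')\, G(x', dy')\, \mathbf{1}_{\bar A}(x', y')
&= \iint \bar\pi(dx, \mathsf{Y})\, Q(x, dx')\, G(x', dy')\, \mathbf{1}_{\bar A}(x', y') \\
&= \bar\pi(\bar A) .
\end{aligned}
\]
Setting $\bar A = A \times \mathsf{Y}$ for $A \in \mathcal{X}$, this display implies that
\[
\int \bar\pi(dx, \mathsf{Y})\, Q(x, A) = \bar\pi(A \times \mathsf{Y}) .
\]
This shows that $\pi(A) = \bar\pi(A \times \mathsf{Y})$ is a stationary distribution for the hidden chain. Hence the hidden chain is positive and recurrent.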
When the joint (or hidden) chain is positive, it is natural to study the rate at which it converges to stationarity.
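Proposition 209. Assume that the hidden chain satisfies a uniform Doeblin condition, that is, there exist a positive integer $m$, $\epsilon > 0$ and a family $\{\eta_{x,x'}, (x, x') \in \mathsf{X} \times \mathsf{X}\}$ of probability measures such that
\[
Q^m(x, A) \wedge Q^m(x', A) \ge \epsilon\, \eta_{x,x'}(A), \qquad A \in \mathcal{X},\ (x, x') \in \mathsf{X} \times \mathsf{X} .
\]
Then the joint chain also satisfies a uniform Doeblin condition. Indeed, for all $(x, y)$ and $(x', y')$ in $\mathsf{X} \times \mathsf{Y}$ and all $\bar A \in \mathcal{X} \otimes \mathcal{Y}$,
\[
T^m[(x, y), \bar A] \wedge T^m[(x', y'), \bar A] \ge \epsilon\, \bar\eta_{x,x'}(\bar A),
\]
where
\[
\bar\eta_{x,x'}(\bar A) = \int \eta_{x,x'}(dx)\, G(x, dy)\, \mathbf{1}_{\bar A}(x, y) .
\]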
The proof is along the same lines as the proof of Lemma 205 and is omitted.
This proposition in particular implies that the ergodicity coefficients for the kernels $T^m$ and $Q^m$ coincide: $\delta(T^m) = \delta(Q^m)$. A straightforward but useful application of this result is when the hidden Markov chain is defined on a finite state space. If the transition matrix $Q$ of this chain is primitive, that is, there exists a positive integer $m$ such that $Q^m(x, x') > 0$ for all $(x, x') \in \mathsf{X} \times \mathsf{X}$ (or, equivalently, if the chain $Q$ is irreducible and aperiodic), then the joint Markov chain satisfies a uniform Doeblin condition and the ergodicity coefficient of the joint chain is bounded as $\delta(T^m) \le 1 - \epsilon$ with
\[
\epsilon = \inf_{(x, x') \in \mathsf{X} \times \mathsf{X}}\ \sup_{x'' \in \mathsf{X}}\ [Q^m(x, x'') \wedge Q^m(x', x'')] .
\]
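For a finite state space, the constant given by the inf-sup expression above (written `eps` in the code) can be evaluated directly from the $m$-step transition matrix. The sketch below is only an illustration under assumed inputs: the $3 \times 3$ matrix `Q` is made up for the example, and the helper names `mat_mul`, `mat_pow` and `doeblin_epsilon` are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(Q, m):
    """m-step transition matrix Q^m, for an integer m >= 0."""
    n = len(Q)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        R = mat_mul(R, Q)
    return R


def doeblin_epsilon(Q, m):
    """inf over pairs (x, x') of sup over x'' of min(Q^m(x, x''), Q^m(x', x''))."""
    Qm = mat_pow(Q, m)
    n = len(Q)
    return min(max(min(Qm[x][z], Qm[xp][z]) for z in range(n))
               for x in range(n) for xp in range(n))


# Assumed example: an irreducible, aperiodic chain whose one-step matrix has zeros.
Q = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]

eps = doeblin_epsilon(Q, 2)   # all entries of Q^2 are positive here
delta_bound = 1.0 - eps       # upper bound on the ergodicity coefficient of T^2
```

Because the joint kernel inherits the same constant, `delta_bound` applies to the joint chain without ever forming the (possibly much larger) joint transition kernel.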
A similar result holds when the hidden chain satisfies a Foster-Lyapunov drift condition instead of a uniform Doeblin condition. This result is of particular interest when dealing with hidden Markov models on state spaces that are not finite or bounded.
Proposition 210. Assume that $Q$ is phi-irreducible, aperiodic, and satisfies a Foster-Lyapunov drift condition (Definition 191) with drift function $V$ outside a set $C$. Then the transition kernel $T$ also satisfies a Foster-Lyapunov drift condition with drift function $V$ outside the set $C \times \mathsf{Y}$,
\[
T[(x, y), V] \le \lambda V(x) + b\, \mathbf{1}_{C \times \mathsf{Y}}(x, y) .
\]
Here on the left-hand side, we wrote $V$ also for a function on $\mathsf{X} \times \mathsf{Y}$ defined by $V(x, y) = V(x)$.
The proof is straightforward. Proposition 195 yields an explicit bound on the rate of convergence of the iterates of the Markov chain to the stationary distribution.
This result has several useful consequences.
Proposition 211. Suppose that $Q$ is phi-irreducible, aperiodic, and satisfies a Foster-Lyapunov drift condition with drift function $V$ outside a small set $C$. Then the transition kernel $T$ is positive and aperiodic with invariant distribution $\pi \otimes G$, where $\pi$ is the invariant distribution of $Q$. In addition, for any measurable function $f : \mathsf{X} \times \mathsf{Y} \to \mathbb{R}$, the following statements hold true.
(i) If $\sup_{x \in \mathsf{X}} [V(x)]^{-1} \int G(x, dy)\, |f(x, y)| < \infty$, then there exist $\rho \in (0, 1)$ and $K < \infty$ (not depending on $f$) such that for any $n \ge 0$ and $(x, y) \in \mathsf{X} \times \mathsf{Y}$,
\[
|T^n f(x, y) - \pi \otimes G(f)| \le K \rho^n V(x) \sup_{x' \in \mathsf{X}} [V(x')]^{-1} \int G(x', dy)\, |f(x', y)| .
\]
(ii) If $\sup_{x \in \mathsf{X}} [V(x)]^{-1} \int G(x, dy)\, f^2(x, y) < \infty$, then $\mathrm{E}_{\pi \otimes G}[f^2(X_0, Y_0)] < \infty$ and there exist $\rho \in (0, 1)$ and $K < \infty$ (not depending on $f$) such that for any $n \ge 0$,
\[
|\mathrm{Cov}_\pi[f(X_n, Y_n), f(X_0, Y_0)]| \le K \rho^n \pi(V) \left\{ \sup_{x \in \mathsf{X}} [V(x)]^{-1/2} \int G(x, dy)\, |f(x, y)| \right\}^2 .
\]
Proof. First note that
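\[
\begin{aligned}
|T^n f(x, y) - \pi \otimes G(f)|
&= \left| \iint [Q^n(x, dx') - \pi(dx')]\, G(x', dy')\, f(x', y') \right| \\
&\le \| Q^n(x, \cdot) - \pi \|_V \sup_{x' \in \mathsf{X}} [V(x')]^{-1} \int G(x', dy)\, |f(x', y)| .
\end{aligned}
\]
Now part (i) follows from the geometric ergodicity of $Q$ (Theorem 194). Next, because $\pi(V) < \infty$,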
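\[
\begin{aligned}
\mathrm{E}_{\pi \otimes G}[f^2(X_0, Y_0)]
&= \iint \pi(dx)\, G(x, dy)\, f^2(x, y) \\
&\le \pi(V) \sup_{x \in \mathsf{X}} [V(x)]^{-1} \int G(x, dy)\, f^2(x, y) < \infty ,
\end{aligned}
\]
implying that $|\mathrm{Cov}_\pi[|f(X_n, Y_n)|, |f(X_0, Y_0)|]| \le \mathrm{Var}_\pi[f(X_0, Y_0)] < \infty$. In addition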
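\[
\begin{aligned}
\mathrm{Cov}_\pi[f(X_n, Y_n), f(X_0, Y_0)]
&= \mathrm{E}_\pi\{ \mathrm{E}[f(X_n, Y_n) - \pi \otimes G(f) \mid \mathcal{F}_0]\, f(X_0, Y_0) \} \\
&= \iint \pi \otimes G(dx, dy)\, f(x, y) \iint [Q^n(x, dx') - \pi(dx')]\, G(x', dy')\, f(x', y') .
\end{aligned} \tag{7.71}
\]
By Jensen's inequality, $\int G(x, dy)\, |f(x, y)| \le [\int G(x, dy)\, f^2(x, y)]^{1/2}$ and
\[
Q V^{1/2}(x) \le [Q V(x)]^{1/2} \le [\lambda V(x) + b\, \mathbf{1}_C(x)]^{1/2} \le \lambda^{1/2} V^{1/2}(x) + b^{1/2} \mathbf{1}_C(x) ,
\]
showing that $Q$ also satisfies a Foster-Lyapunov condition outside $C$ with drift function $V^{1/2}$. By Theorem 194, there exist $\rho \in (0, 1)$ and a constant $K$ such that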
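\[
\begin{aligned}
\left| \iint [Q^n(x, dx') - \pi(dx')]\, G(x', dy')\, f(x', y') \right|
&\le \| Q^n(x, \cdot) - \pi \|_{V^{1/2}} \sup_{x' \in \mathsf{X}} V^{-1/2}(x') \int G(x', dy)\, |f(x', y)| \\
&\le K \rho^n V^{1/2}(x) \sup_{x' \in \mathsf{X}} V^{-1/2}(x') \int G(x', dy)\, |f(x', y)| .
\end{aligned}
\]
Part (ii) follows by plugging this bound into (7.71).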
Example 212 (Stochastic Volatility Model, Continued). In the model of Example 203, we set $V(x) = e^{x^2 / 2\delta^2}$ for $\delta > \sigma_U$. It is easily shown that
\[
Q V(x) = \frac{\rho}{\sigma_U} \exp\left[ \frac{x^2}{2\delta^2}\, \frac{\phi^2 (\rho^2 + \delta^2)}{\delta^2} \right] ,
\]
where $\rho^2 = \sigma_U^2 \delta^2 / (\delta^2 - \sigma_U^2)$. We may choose $\delta$ large enough that $\phi^2 (\rho^2 + \delta^2) / \delta^2 < 1$. Then $\limsup_{|x| \to \infty} Q V(x) / V(x) = 0$, so that $Q$ satisfies a Foster-Lyapunov condition with drift function $V(x) = e^{x^2 / 2\delta^2}$ outside a compact set $[-M, +M]$. Because every compact set is small, the assumptions of Proposition 211 are satisfied, showing that the joint chain is positive. Set $f(x, y) = |y|$. Then $\int G(x, dy)\, |y| = \beta e^{x/2} \sqrt{2/\pi}$. Proposition 211(ii) shows that $\mathrm{Var}_\pi(Y_0) < \infty$ and that the autocovariance function $\mathrm{Cov}(|Y_n|, |Y_0|)$ decreases to zero exponentially fast.
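The closed-form expression for $QV$ in this example can be compared with a direct numerical integration. The sketch below rests on an assumption that is not restated in this section: the hidden state of Example 203 is taken to follow a Gaussian autoregression $X_{k+1} = \phi X_k + \sigma_U U_k$ with $U_k$ standard normal, so that $QV(x) = \mathrm{E}[V(\phi x + \sigma_U U)]$. The parameter values and function names are illustrative choices only.

```python
import math


def V(x, delta):
    """Drift function V(x) = exp(x^2 / (2 delta^2))."""
    return math.exp(x * x / (2.0 * delta * delta))


def qv_closed_form(x, phi, sigma_u, delta):
    """Closed-form QV(x) as stated in the example (requires delta > sigma_u)."""
    rho2 = sigma_u ** 2 * delta ** 2 / (delta ** 2 - sigma_u ** 2)
    factor = phi ** 2 * (rho2 + delta ** 2) / delta ** 2
    return (math.sqrt(rho2) / sigma_u) * math.exp(x * x / (2.0 * delta ** 2) * factor)


def qv_quadrature(x, phi, sigma_u, delta, half_width=12.0, n=20000):
    """Trapezoidal approximation of E[V(phi*x + sigma_u*U)], U ~ N(0, 1).

    Assumes the AR(1) dynamics described in the lead-in paragraph.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        u = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * V(phi * x + sigma_u * u, delta) * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)


def drift_ratio(x, phi, sigma_u, delta):
    """QV(x) / V(x); tends to zero as |x| grows when the factor in the exponent is < 1."""
    return qv_closed_form(x, phi, sigma_u, delta) / V(x, delta)


# Illustrative parameters (assumptions): |phi| < 1 and delta > sigma_u.
phi, sigma_u, delta = 0.9, 0.5, 2.0
relative_error = abs(qv_quadrature(1.5, phi, sigma_u, delta)
                     / qv_closed_form(1.5, phi, sigma_u, delta) - 1.0)
ratios = [drift_ratio(x, phi, sigma_u, delta) for x in (0.0, 5.0, 10.0)]
```

With these values the exponent factor $\phi^2(\rho^2 + \delta^2)/\delta^2$ is below one, so the entries of `ratios` decrease as $|x|$ increases, which is the behaviour the drift condition relies on.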