Derivation of Kalman Filtering and Smoothing Equations

Byron M. Yu
Department of Electrical Engineering, Stanford University
Stanford, CA 94305, USA
byronyu@stanford.edu

Krishna V. Shenoy
Department of Electrical Engineering and Neurosciences Program
Stanford University, Stanford, CA 94305, USA
shenoy@stanford.edu

Maneesh Sahani
Gatsby Computational Neuroscience Unit, University College London
17 Queen Square, London WC1N 3AR, UK
maneesh@gatsby.ucl.ac.uk

October 19, 2004
Abstract
The Kalman filtering and smoothing problems can be solved by a series of forward and backward recursions, as presented in [1]–[3]. Here, we show how to derive these relationships from first principles.
1 Introduction
We consider linear time-invariant dynamical systems (LDS) of the following form:

$$x_{t+1} = A x_t + w_t \tag{1}$$

$$y_t = C x_t + v_t \tag{2}$$

where $x_t$ and $y_t$ are the state and output, respectively, at time $t$. The noise terms, $w_t$ and $v_t$, are zero-mean normally-distributed random variables with covariance matrices $Q$ and $R$, respectively. The initial state, $x_1$, is normally-distributed with mean $\pi_1$ and variance $V_1$.

In this work, we assume that the parameters of the linear dynamical system, namely $A$, $C$, $Q$, $R$, $\pi_1$, and $V_1$, are known. Whereas the outputs are observed, the state and noise variables are hidden.

The goal is to determine $P(x_t \mid \{y\}_1^t)$ and $P(x_t \mid \{y\}_1^T)$ for $t = 1, \ldots, T$. These are the solutions to the filtering and smoothing problems, respectively. Both distributions are normally-distributed for the system described by (1) and (2), so it suffices to find the mean and variance of each distribution.

We will use the same notation as in [1]: $E(x_t \mid \{y\}_1^\tau)$ is denoted by $x_t^\tau$ and $\mathrm{Var}(x_t \mid \{y\}_1^\tau)$ is denoted by $V_t^\tau$. The sequence of $T$ outputs $(y_1, y_2, \ldots, y_T)$ is denoted by $\{y\}$. A subsequence of outputs $(y_{t_0}, y_{t_0+1}, \ldots, y_{t_1})$ is denoted by $\{y\}_{t_0}^{t_1}$.
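As a concrete reference point, the generative model (1)–(2) can be simulated directly. The sketch below uses arbitrary illustrative dimensions and parameter values (my own choices, not from the text):

```python
import numpy as np

# Minimal simulation of the generative model (1)-(2).
# All parameter values are arbitrary illustrative choices.
rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
C = np.array([[1.0, 0.0]])               # observation matrix
Q = 0.01 * np.eye(2)                     # state noise covariance
R = np.array([[0.1]])                    # output noise covariance
pi1, V1 = np.zeros(2), np.eye(2)         # initial state mean and variance

T = 50
x = np.zeros((T, 2))
y = np.zeros((T, 1))
x[0] = rng.multivariate_normal(pi1, V1)
for t in range(T):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(1), R)          # (2)
    if t + 1 < T:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(2), Q)  # (1)
```

The outputs `y` are what the filter and smoother below observe; the states `x` remain hidden.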
2 Forward Recursions: Filtering
By the assumptions of the LDS described by (1) and (2), $P(x_t \mid \{y\}_1^t)$ is a normal distribution. We seek its mean $x_t^t$ and variance $V_t^t$.

$$\begin{aligned}
\log P(x_t \mid \{y\}_1^t) &= \log P(x_t \mid \{y\}_1^{t-1}, y_t) \\
&= \log P(y_t \mid x_t, \{y\}_1^{t-1}) + \log P(x_t \mid \{y\}_1^{t-1}) + \ldots \\
&= \log P(y_t \mid x_t) + \log P(x_t \mid \{y\}_1^{t-1}) + \ldots \\
&= -\tfrac{1}{2} (y_t - C x_t)' R^{-1} (y_t - C x_t) - \tfrac{1}{2} (x_t - x_t^{t-1})' (V_t^{t-1})^{-1} (x_t - x_t^{t-1}) + \ldots \\
&= -\tfrac{1}{2} x_t' \left[ C' R^{-1} C + (V_t^{t-1})^{-1} \right] x_t + x_t' \left[ C' R^{-1} y_t + (V_t^{t-1})^{-1} x_t^{t-1} \right] + \ldots
\end{aligned} \tag{3}$$

Note that, in general, if $z$ is normally-distributed with mean $\mu$ and variance $\Sigma$,

$$\begin{aligned}
\log P(z) &= -\tfrac{1}{2} (z - \mu)' \Sigma^{-1} (z - \mu) + \ldots \\
&= -\tfrac{1}{2} z' \Sigma^{-1} z + z' \Sigma^{-1} \mu + \ldots
\end{aligned} \tag{4}$$

Comparing the first terms in (3) and (4) and using the Matrix Inversion Lemma,

$$V_t^t = \left[ C' R^{-1} C + (V_t^{t-1})^{-1} \right]^{-1} = V_t^{t-1} - K_t C V_t^{t-1}, \tag{5}$$

where

$$K_t = V_t^{t-1} C' \left( R + C V_t^{t-1} C' \right)^{-1}. \tag{6}$$
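The equivalence of the two forms of $V_t^t$ in (5), via the Matrix Inversion Lemma, is easy to verify numerically. The check below is my own construction, using random symmetric positive-definite matrices in place of $V_t^{t-1}$ and $R$:

```python
import numpy as np

# Check that the information-form covariance update and the gain-form
# update in (5) agree, on random symmetric positive-definite test matrices.
rng = np.random.default_rng(1)
n, p = 3, 2
C = rng.standard_normal((p, n))
M = rng.standard_normal((n, n))
V_pred = M @ M.T + n * np.eye(n)          # stands in for V_t^{t-1}
N = rng.standard_normal((p, p))
R = N @ N.T + p * np.eye(p)               # output noise covariance

# Left-hand form of (5): information form.
V_info = np.linalg.inv(C.T @ np.linalg.inv(R) @ C + np.linalg.inv(V_pred))

# Right-hand form of (5), using the gain (6).
K = V_pred @ C.T @ np.linalg.inv(R + C @ V_pred @ C.T)
V_gain = V_pred - K @ C @ V_pred
```

`V_info` and `V_gain` agree to numerical precision, confirming the lemma for this instance.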
To find the time update for the variance, we use the fact that $A x_{t-1}$ and $w_{t-1}$ are independent:

$$V_t^{t-1} = \mathrm{Var}(A x_{t-1} \mid \{y\}_1^{t-1}) + \mathrm{Var}(w_{t-1} \mid \{y\}_1^{t-1}) = A V_{t-1}^{t-1} A' + Q. \tag{7}$$
Before finding the mean of the normal distribution, we derive the following matrix identity:

$$\begin{aligned}
(A + B)^{-1} (A + B) &= I \\
I - (A + B)^{-1} A &= (A + B)^{-1} B \\
\left( I - (A + B)^{-1} A \right) B^{-1} &= (A + B)^{-1}.
\end{aligned} \tag{8}$$

Comparing the second terms in (3) and (4) and applying the matrix identity (8),

$$\begin{aligned}
x_t^t &= V_t^t \left[ C' R^{-1} y_t + (V_t^{t-1})^{-1} x_t^{t-1} \right] \\
&= V_t^{t-1} C' \left[ I - \left( R + C V_t^{t-1} C' \right)^{-1} C V_t^{t-1} C' \right] R^{-1} y_t + (I - K_t C) x_t^{t-1} \\
&= V_t^{t-1} C' \left( R + C V_t^{t-1} C' \right)^{-1} y_t + (I - K_t C) x_t^{t-1} \\
&= K_t y_t + (I - K_t C) x_t^{t-1} \\
&= x_t^{t-1} + K_t \left( y_t - C x_t^{t-1} \right).
\end{aligned} \tag{9}$$
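The chain of equalities in (9) can likewise be verified numerically. The check below is my own construction with random test matrices; it compares the information-form mean on the first line of (9) against the innovation form on the last line:

```python
import numpy as np

# Check that the first and last lines of (9) give the same filtered mean,
# on random test quantities.
rng = np.random.default_rng(3)
n, p = 3, 2
C = rng.standard_normal((p, n))
M = rng.standard_normal((n, n))
V_pred = M @ M.T + n * np.eye(n)   # stands in for V_t^{t-1}
N = rng.standard_normal((p, p))
R = N @ N.T + p * np.eye(p)
x_pred = rng.standard_normal(n)    # stands in for x_t^{t-1}
y = rng.standard_normal(p)         # stands in for y_t

K = V_pred @ C.T @ np.linalg.inv(R + C @ V_pred @ C.T)                       # (6)
V_filt = np.linalg.inv(C.T @ np.linalg.inv(R) @ C + np.linalg.inv(V_pred))  # (5)

# First line of (9): information form.
x_info = V_filt @ (C.T @ np.linalg.inv(R) @ y + np.linalg.inv(V_pred) @ x_pred)
# Last line of (9): innovation form.
x_gain = x_pred + K @ (y - C @ x_pred)
```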
The time update for the mean can be found by conditioning on $x_{t-1}$:

$$\begin{aligned}
x_t^{t-1} &= E_{x_{t-1}} \left[ E\left( x_t \mid x_{t-1}, \{y\}_1^{t-1} \right) \,\middle|\, \{y\}_1^{t-1} \right] \\
&= E_{x_{t-1}} \left[ A x_{t-1} \,\middle|\, \{y\}_1^{t-1} \right] \\
&= A x_{t-1}^{t-1}.
\end{aligned} \tag{10}$$

The recursions start with $x_1^0 = \pi_1$ and $V_1^0 = V_1$. Equations (5), (6), (7), (9), and (10) together form the Kalman filter forward recursions, as shown in [1].
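Collecting these recursions, the forward pass can be sketched as follows. The function and variable names are my own; `ys` is assumed to be a $T \times p$ array of outputs:

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, pi1, V1):
    """Forward recursions (5)-(7), (9), (10). Returns the filtered moments
    x_t^t, V_t^t and the one-step predictions x_t^{t-1}, V_t^{t-1}."""
    T, n = len(ys), len(pi1)
    x_filt = np.zeros((T, n)); V_filt = np.zeros((T, n, n))
    x_pred = np.zeros((T, n)); V_pred = np.zeros((T, n, n))
    for t in range(T):
        if t == 0:
            x_pred[0], V_pred[0] = pi1, V1             # x_1^0 = pi_1, V_1^0 = V_1
        else:
            x_pred[t] = A @ x_filt[t - 1]              # (10)
            V_pred[t] = A @ V_filt[t - 1] @ A.T + Q    # (7)
        K = V_pred[t] @ C.T @ np.linalg.inv(R + C @ V_pred[t] @ C.T)  # (6)
        x_filt[t] = x_pred[t] + K @ (ys[t] - C @ x_pred[t])           # (9)
        V_filt[t] = V_pred[t] - K @ C @ V_pred[t]                     # (5)
    return x_filt, V_filt, x_pred, V_pred

# Example run with arbitrary scalar parameters.
A = np.array([[0.9]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.1]])
ys = np.array([[1.0], [0.8], [1.1]])
xf, Vf, xp, Vp = kalman_filter(ys, A, C, Q, R, np.zeros(1), np.eye(1))
```

Conditioning on each new output can only reduce uncertainty, so the filtered variances never exceed the predicted ones.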
3 Backward Recursions: Smoothing
Like the filtered posterior distribution $P(x_t \mid \{y\}_1^t)$, the smoothed posterior distribution $P(x_t \mid \{y\}_1^T)$ is also normal. We seek its mean $x_t^T$ and variance $V_t^T$. We are also interested in the covariance of the joint posterior distribution $P(x_{t+1}, x_t \mid \{y\}_1^T)$, denoted $V_{t+1,t}^T$.

$$\begin{aligned}
\log P(x_{t+1}, x_t \mid \{y\}_1^T) &= \log P(x_t \mid x_{t+1}, \{y\}_1^T) + \log P(x_{t+1} \mid \{y\}_1^T) \\
&= \log P(x_t \mid x_{t+1}, \{y\}_1^t) + \log P(x_{t+1} \mid \{y\}_1^T) \\
&= \log P(x_{t+1} \mid x_t) + \log P(x_t \mid \{y\}_1^t) - \log P(x_{t+1} \mid \{y\}_1^t) + \log P(x_{t+1} \mid \{y\}_1^T) \\
&= -\tfrac{1}{2} (x_{t+1} - A x_t)' Q^{-1} (x_{t+1} - A x_t) - \tfrac{1}{2} (x_t - x_t^t)' (V_t^t)^{-1} (x_t - x_t^t) \\
&\quad + \tfrac{1}{2} (x_{t+1} - x_{t+1}^t)' (V_{t+1}^t)^{-1} (x_{t+1} - x_{t+1}^t) \\
&\quad - \tfrac{1}{2} (x_{t+1} - x_{t+1}^T)' (V_{t+1}^T)^{-1} (x_{t+1} - x_{t+1}^T) + \ldots \\
&= -\tfrac{1}{2} x_{t+1}' \left[ Q^{-1} - (V_{t+1}^t)^{-1} + (V_{t+1}^T)^{-1} \right] x_{t+1} \\
&\quad - \tfrac{1}{2} x_{t+1}' \left( -Q^{-1} A \right) x_t - \tfrac{1}{2} x_t' \left( -A' Q^{-1} \right) x_{t+1} \\
&\quad - \tfrac{1}{2} x_t' \left[ A' Q^{-1} A + (V_t^t)^{-1} \right] x_t + x_t' (V_t^t)^{-1} x_t^t + \ldots
\end{aligned} \tag{11}$$
Note that, in general, if $[z_1' \; z_2']'$ is normally-distributed with mean $[\mu_1' \; \mu_2']'$, then the log density can be expressed in the form

$$\begin{aligned}
\log P(z_1, z_2) &= -\tfrac{1}{2} \begin{bmatrix} z_1 - \mu_1 \\ z_2 - \mu_2 \end{bmatrix}' \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} \begin{bmatrix} z_1 - \mu_1 \\ z_2 - \mu_2 \end{bmatrix} + \ldots \\
&= -\tfrac{1}{2} z_1' S_{11} z_1 - \tfrac{1}{2} z_1' S_{12} z_2 - \tfrac{1}{2} z_2' S_{21} z_1 - \tfrac{1}{2} z_2' S_{22} z_2 + z_2' \left( S_{21} \mu_1 + S_{22} \mu_2 \right) + \ldots
\end{aligned} \tag{12}$$

The covariance of $[z_1' \; z_2']'$ is

$$\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}
= \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}^{-1}
= \begin{bmatrix} F_{11}^{-1} & -F_{11}^{-1} S_{12} S_{22}^{-1} \\ -S_{22}^{-1} S_{21} F_{11}^{-1} & F_{22}^{-1} \end{bmatrix} \tag{13}$$

$$= \begin{bmatrix} S_{11}^{-1} + S_{11}^{-1} S_{12} F_{22}^{-1} S_{21} S_{11}^{-1} & -F_{11}^{-1} S_{12} S_{22}^{-1} \\ -S_{22}^{-1} S_{21} F_{11}^{-1} & S_{22}^{-1} + S_{22}^{-1} S_{21} F_{11}^{-1} S_{12} S_{22}^{-1} \end{bmatrix}, \tag{14}$$

where

$$F_{11} = S_{11} - S_{12} S_{22}^{-1} S_{21}, \qquad F_{22} = S_{22} - S_{21} S_{11}^{-1} S_{12}.$$

Comparing the first four terms in (11) and (12), we can write

$$\begin{bmatrix} V_{t+1}^T & V_{t+1,t}^T \\ V_{t,t+1}^T & V_t^T \end{bmatrix} = \begin{bmatrix} Q^{-1} - (V_{t+1}^t)^{-1} + (V_{t+1}^T)^{-1} & -Q^{-1} A \\ -A' Q^{-1} & A' Q^{-1} A + (V_t^t)^{-1} \end{bmatrix}^{-1}. \tag{15}$$

We first simplify two expressions that will appear when inverting the block matrix in (15). First, using the Matrix Inversion Lemma,
$$S_{22}^{-1} = \left[ A' Q^{-1} A + (V_t^t)^{-1} \right]^{-1} = V_t^t - V_t^t A' (V_{t+1}^t)^{-1} A V_t^t = V_t^t - J_t V_{t+1}^t J_t', \tag{16}$$

where we define

$$J_t = V_t^t A' (V_{t+1}^t)^{-1}. \tag{17}$$

Second, applying the matrix identity (8),

$$\begin{aligned}
S_{22}^{-1} S_{21} &= -\left( V_t^t - J_t V_{t+1}^t J_t' \right) A' Q^{-1} \\
&= -V_t^t A' \left[ I - \left( Q + A V_t^t A' \right)^{-1} A V_t^t A' \right] Q^{-1} \\
&= -V_t^t A' \left( Q + A V_t^t A' \right)^{-1} \\
&= -J_t.
\end{aligned} \tag{18}$$
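The identities (16)–(18) can be checked numerically. The construction below is my own, using random test matrices for $V_t^t$, $Q$, and $A$, with $V_{t+1}^t = A V_t^t A' + Q$ taken from (7):

```python
import numpy as np

# Check (16)-(18): S22^{-1} = V_t^t - J_t V_{t+1}^t J_t'  and
# S22^{-1} S21 = -J_t, with V_{t+1}^t = A V_t^t A' + Q from (7).
rng = np.random.default_rng(4)
n = 3
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                   # stands in for V_t^t
N = rng.standard_normal((n, n))
Q = N @ N.T + n * np.eye(n)
A = rng.standard_normal((n, n))

V_next = A @ V @ A.T + Q                      # V_{t+1}^t
J = V @ A.T @ np.linalg.inv(V_next)           # (17)

S22 = A.T @ np.linalg.inv(Q) @ A + np.linalg.inv(V)
S21 = -A.T @ np.linalg.inv(Q)

lhs16 = np.linalg.inv(S22)
rhs16 = V - J @ V_next @ J.T                  # right-hand side of (16)
lhs18 = lhs16 @ S21                           # should equal -J, per (18)
```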
Now, we invert the block matrix in (15). Using (16), (18), and the fact that $F_{11}^{-1} = V_{t+1}^T$ from (13),

$$\begin{aligned}
V_t^T &= S_{22}^{-1} + S_{22}^{-1} S_{21} F_{11}^{-1} S_{12} S_{22}^{-1} \\
&= \left( V_t^t - J_t V_{t+1}^t J_t' \right) + (-J_t) V_{t+1}^T (-J_t') \\
&= V_t^t + J_t \left( V_{t+1}^T - V_{t+1}^t \right) J_t'
\end{aligned} \tag{19}$$

and

$$V_{t+1,t}^T = -F_{11}^{-1} S_{12} S_{22}^{-1} = V_{t+1}^T J_t'. \tag{20}$$
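The block-inverse formulas (13) and (14), on which (19) and (20) rest, can also be confirmed numerically. The check below is my own construction on a random symmetric positive-definite block matrix $S$:

```python
import numpy as np

# Check the block-matrix inverse formulas (13) and (14) on a random
# symmetric positive-definite S partitioned into 2x2 blocks.
rng = np.random.default_rng(2)
n = 2
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T + 2 * n * np.eye(2 * n)
S11, S12 = S[:n, :n], S[:n, n:]
S21, S22 = S[n:, :n], S[n:, n:]

F11 = S11 - S12 @ np.linalg.inv(S22) @ S21   # Schur complement of S22
F22 = S22 - S21 @ np.linalg.inv(S11) @ S12   # Schur complement of S11

Sigma = np.linalg.inv(S)
Sigma11 = np.linalg.inv(F11)                                # (13), upper left
Sigma12 = -np.linalg.inv(F11) @ S12 @ np.linalg.inv(S22)    # (13), upper right
Sigma21 = -np.linalg.inv(S22) @ S21 @ np.linalg.inv(F11)    # (13), lower left
Sigma22 = (np.linalg.inv(S22)
           + np.linalg.inv(S22) @ S21 @ np.linalg.inv(F11)
           @ S12 @ np.linalg.inv(S22))                      # (14), lower right
```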
Using (17), (19), and (20), we can also derive a recursive formulation for the covariance:

$$\begin{aligned}
V_{t,t-1}^T &= V_t^T J_{t-1}' \\
&= \left[ V_t^t + J_t \left( V_{t+1}^T - V_{t+1}^t \right) J_t' \right] J_{t-1}' \\
&= \left[ V_t^t + J_t \left( V_{t+1,t}^T - A V_t^t \right) \right] J_{t-1}' \\
&= V_t^t J_{t-1}' + J_t \left( V_{t+1,t}^T - A V_t^t \right) J_{t-1}'.
\end{aligned} \tag{21}$$

Using (5), (17), and (20), this recursion is initialized with

$$\begin{aligned}
V_{T,T-1}^T &= V_T^T J_{T-1}' \\
&= (I - K_T C) V_T^{T-1} J_{T-1}' \\
&= (I - K_T C) A V_{T-1}^{T-1}.
\end{aligned} \tag{22}$$
To find the mean, we compare the last terms in (11) and (12). Using (16), (17), and (18),

$$S_{21} x_{t+1}^T + S_{22} x_t^T = (V_t^t)^{-1} x_t^t$$

$$\begin{aligned}
x_t^T &= -S_{22}^{-1} S_{21} x_{t+1}^T + S_{22}^{-1} (V_t^t)^{-1} x_t^t \\
&= J_t x_{t+1}^T + (I - J_t A) x_t^t \\
&= x_t^t + J_t \left( x_{t+1}^T - A x_t^t \right).
\end{aligned} \tag{23}$$

Equations (17), (19), (21), (22), and (23) together form the Kalman smoother backward recursions, as shown in [1]. Equivalently, (20) can be used in place of (21) and (22) to reduce the computation required to find the covariance.
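The backward recursions can be sketched as follows, consuming the moments produced by the forward pass. The function and variable names are my own, and the sketch uses (20) for the cross-covariance rather than (21)–(22):

```python
import numpy as np

def kalman_smoother(x_filt, V_filt, x_pred, V_pred, A):
    """Backward recursions (17), (19), (20), (23), consuming the filtered
    moments x_t^t, V_t^t and one-step predictions x_t^{t-1}, V_t^{t-1}."""
    T, n = x_filt.shape
    x_smooth = np.zeros((T, n))
    V_smooth = np.zeros((T, n, n))
    V_cross = np.zeros((T - 1, n, n))             # V_{t+1,t}^T
    x_smooth[-1], V_smooth[-1] = x_filt[-1], V_filt[-1]
    for t in range(T - 2, -1, -1):
        J = V_filt[t] @ A.T @ np.linalg.inv(V_pred[t + 1])               # (17)
        x_smooth[t] = x_filt[t] + J @ (x_smooth[t + 1] - A @ x_filt[t])  # (23)
        V_smooth[t] = (V_filt[t]
                       + J @ (V_smooth[t + 1] - V_pred[t + 1]) @ J.T)    # (19)
        V_cross[t] = V_smooth[t + 1] @ J.T                               # (20)
    return x_smooth, V_smooth, V_cross

# Tiny scalar example: a minimal two-step forward pass, then the backward pass.
A = np.array([[0.9]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.1]])
ys = np.array([[1.0], [0.8]])
xp = np.zeros((2, 1)); Vp = np.zeros((2, 1, 1))
xf = np.zeros((2, 1)); Vf = np.zeros((2, 1, 1))
xp[0], Vp[0] = np.zeros(1), np.eye(1)            # x_1^0 = pi_1, V_1^0 = V_1
for t in range(2):
    if t > 0:
        xp[t] = A @ xf[t - 1]                    # (10)
        Vp[t] = A @ Vf[t - 1] @ A.T + Q          # (7)
    K = Vp[t] @ C.T @ np.linalg.inv(R + C @ Vp[t] @ C.T)   # (6)
    xf[t] = xp[t] + K @ (ys[t] - C @ xp[t])      # (9)
    Vf[t] = Vp[t] - K @ C @ Vp[t]                # (5)
xs, Vs, Vc = kalman_smoother(xf, Vf, xp, Vp, A)
```

At $t = T$ the smoothed and filtered moments coincide, and conditioning on the full output sequence can only reduce the variance relative to filtering.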
References
[1] Z. Ghahramani and G.E. Hinton. Parameter estimation for linear dynamical systems. Technical Report CRG-TR-96-2, University of Toronto, 1996.
[2] R.H. Shumway and D.S. Stoffer. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253–264, 1982.
[3] B.D.O. Anderson and J.B. Moore. Optimal Filtering. Prentice-Hall, Englewood Cliffs, NJ, 1979.