Derivation of Kalman Filtering and Smoothing Equations

Byron M. Yu
Department of Electrical Engineering, Stanford University
Stanford, CA 94305, USA
byronyu@stanford.edu

Krishna V. Shenoy
Department of Electrical Engineering and Neurosciences Program
Stanford University, Stanford, CA 94305, USA
shenoy@stanford.edu

Maneesh Sahani
Gatsby Computational Neuroscience Unit, University College London
17 Queen Square, London WC1N 3AR, UK
maneesh@gatsby.ucl.ac.uk

October 19, 2004
Abstract
The Kalman filtering and smoothing problems can be solved by a series of forward and backward recursions, as presented in [1]–[3]. Here, we show how to derive these relationships from first principles.
1 Introduction
We consider linear time-invariant dynamical systems (LDS) of the following form:

$$x_{t+1} = A x_t + w_t \tag{1}$$

$$y_t = C x_t + v_t \tag{2}$$

where $x_t$ and $y_t$ are the state and output, respectively, at time $t$. The noise terms, $w_t$ and $v_t$, are zero-mean normally-distributed random variables with covariance matrices $Q$ and $R$, respectively. The initial state, $x_1$, is normally-distributed with mean $\pi_1$ and variance $V_1$.

In this work, we assume that the parameters of the linear dynamical system, namely $A$, $C$, $Q$, $R$, $\pi_1$, and $V_1$, are known. Whereas the outputs are observed, the state and noise variables are hidden.

The goal is to determine $P(x_t \mid \{y\}_1^t)$ and $P(x_t \mid \{y\}_1^T)$ for $t = 1, \ldots, T$. These are the solutions to the filtering and smoothing problems, respectively. Both distributions are normally-distributed for the system described by (1) and (2), so it suffices to find the mean and variance of each distribution.

We will use the same notation as in [1]: $E(x_t \mid \{y\}_1^\tau)$ is denoted by $x_t^\tau$ and $\mathrm{Var}(x_t \mid \{y\}_1^\tau)$ is denoted by $V_t^\tau$. The sequence of $T$ outputs $(y_1, y_2, \ldots, y_T)$ is denoted by $\{y\}$. A subsequence of outputs $(y_{t_0}, y_{t_0+1}, \ldots, y_{t_1})$ is denoted by $\{y\}_{t_0}^{t_1}$.
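As a concrete reference point, the generative model (1)–(2) can be simulated directly. The sketch below uses arbitrary illustrative dimensions and parameter values (my own choices, not from the text):

```python
import numpy as np

# Minimal simulation of the generative model (1)-(2).
# All parameter values are arbitrary illustrative choices.
rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
C = np.array([[1.0, 0.0]])               # observation matrix
Q = 0.01 * np.eye(2)                     # state noise covariance
R = np.array([[0.1]])                    # output noise covariance
pi1, V1 = np.zeros(2), np.eye(2)         # initial state mean and variance

T = 50
x = np.zeros((T, 2))
y = np.zeros((T, 1))
x[0] = rng.multivariate_normal(pi1, V1)
for t in range(T):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(1), R)          # (2)
    if t + 1 < T:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(2), Q)  # (1)
```

The outputs `y` are what the filter and smoother below observe; the states `x` remain hidden.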
2 Forward Recursions: Filtering
By the assumptions of the LDS described by (1) and (2), $P(x_t \mid \{y\}_1^t)$ is a normal distribution. We seek its mean $x_t^t$ and variance $V_t^t$.

$$\begin{aligned}
\log P(x_t \mid \{y\}_1^t) &= \log P(x_t \mid \{y\}_1^{t-1}, y_t) \\
&= \log P(y_t \mid x_t, \{y\}_1^{t-1}) + \log P(x_t \mid \{y\}_1^{t-1}) + \ldots \\
&= \log P(y_t \mid x_t) + \log P(x_t \mid \{y\}_1^{t-1}) + \ldots \\
&= -\tfrac{1}{2} (y_t - C x_t)' R^{-1} (y_t - C x_t) - \tfrac{1}{2} (x_t - x_t^{t-1})' (V_t^{t-1})^{-1} (x_t - x_t^{t-1}) + \ldots \\
&= -\tfrac{1}{2} x_t' \left[ C' R^{-1} C + (V_t^{t-1})^{-1} \right] x_t + x_t' \left[ C' R^{-1} y_t + (V_t^{t-1})^{-1} x_t^{t-1} \right] + \ldots
\end{aligned} \tag{3}$$

Note that, in general, if $z$ is normally-distributed with mean $\mu$ and variance $\Sigma$,

$$\begin{aligned}
\log P(z) &= -\tfrac{1}{2} (z - \mu)' \Sigma^{-1} (z - \mu) + \ldots \\
&= -\tfrac{1}{2} z' \Sigma^{-1} z + z' \Sigma^{-1} \mu + \ldots
\end{aligned} \tag{4}$$

Comparing the first terms in (3) and (4) and using the Matrix Inversion Lemma,

$$V_t^t = \left[ C' R^{-1} C + (V_t^{t-1})^{-1} \right]^{-1} = V_t^{t-1} - K_t C V_t^{t-1}, \tag{5}$$

where

$$K_t = V_t^{t-1} C' \left( R + C V_t^{t-1} C' \right)^{-1}. \tag{6}$$
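The equivalence of the two forms of $V_t^t$ in (5), via the Matrix Inversion Lemma, is easy to verify numerically. The check below is my own construction, using random symmetric positive-definite matrices in place of $V_t^{t-1}$ and $R$:

```python
import numpy as np

# Check that the information-form covariance update and the gain-form
# update in (5) agree, on random symmetric positive-definite test matrices.
rng = np.random.default_rng(1)
n, p = 3, 2
C = rng.standard_normal((p, n))
M = rng.standard_normal((n, n))
V_pred = M @ M.T + n * np.eye(n)          # stands in for V_t^{t-1}
N = rng.standard_normal((p, p))
R = N @ N.T + p * np.eye(p)               # output noise covariance

# Left-hand form of (5): information form.
V_info = np.linalg.inv(C.T @ np.linalg.inv(R) @ C + np.linalg.inv(V_pred))

# Right-hand form of (5), using the gain (6).
K = V_pred @ C.T @ np.linalg.inv(R + C @ V_pred @ C.T)
V_gain = V_pred - K @ C @ V_pred
```

`V_info` and `V_gain` agree to numerical precision, confirming the lemma for this instance.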
To find the time update for the variance, we use the fact that $A x_{t-1}$ and $w_{t-1}$ are independent:

$$V_t^{t-1} = \mathrm{Var}(A x_{t-1} \mid \{y\}_1^{t-1}) + \mathrm{Var}(w_{t-1} \mid \{y\}_1^{t-1}) = A V_{t-1}^{t-1} A' + Q. \tag{7}$$
Before finding the mean of the normal distribution, we derive the following matrix identity:

$$\begin{aligned}
(A + B)^{-1} (A + B) &= I \\
I - (A + B)^{-1} A &= (A + B)^{-1} B \\
\left( I - (A + B)^{-1} A \right) B^{-1} &= (A + B)^{-1}.
\end{aligned} \tag{8}$$

Comparing the second terms in (3) and (4) and applying the matrix identity (8),

$$\begin{aligned}
x_t^t &= V_t^t \left[ C' R^{-1} y_t + (V_t^{t-1})^{-1} x_t^{t-1} \right] \\
&= V_t^{t-1} C' \left[ I - \left( R + C V_t^{t-1} C' \right)^{-1} C V_t^{t-1} C' \right] R^{-1} y_t + (I - K_t C) x_t^{t-1} \\
&= V_t^{t-1} C' \left( R + C V_t^{t-1} C' \right)^{-1} y_t + (I - K_t C) x_t^{t-1} \\
&= K_t y_t + (I - K_t C) x_t^{t-1} \\
&= x_t^{t-1} + K_t \left( y_t - C x_t^{t-1} \right).
\end{aligned} \tag{9}$$
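The chain of equalities in (9) can likewise be verified numerically. The check below is my own construction with random test matrices; it compares the information-form mean on the first line of (9) against the innovation form on the last line:

```python
import numpy as np

# Check that the first and last lines of (9) give the same filtered mean,
# on random test quantities.
rng = np.random.default_rng(3)
n, p = 3, 2
C = rng.standard_normal((p, n))
M = rng.standard_normal((n, n))
V_pred = M @ M.T + n * np.eye(n)   # stands in for V_t^{t-1}
N = rng.standard_normal((p, p))
R = N @ N.T + p * np.eye(p)
x_pred = rng.standard_normal(n)    # stands in for x_t^{t-1}
y = rng.standard_normal(p)         # stands in for y_t

K = V_pred @ C.T @ np.linalg.inv(R + C @ V_pred @ C.T)                       # (6)
V_filt = np.linalg.inv(C.T @ np.linalg.inv(R) @ C + np.linalg.inv(V_pred))  # (5)

# First line of (9): information form.
x_info = V_filt @ (C.T @ np.linalg.inv(R) @ y + np.linalg.inv(V_pred) @ x_pred)
# Last line of (9): innovation form.
x_gain = x_pred + K @ (y - C @ x_pred)
```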
The time update for the mean can be found by conditioning on $x_{t-1}$:

$$\begin{aligned}
x_t^{t-1} &= E_{x_{t-1}} \left[ E\left( x_t \mid x_{t-1}, \{y\}_1^{t-1} \right) \,\middle|\, \{y\}_1^{t-1} \right] \\
&= E_{x_{t-1}} \left[ A x_{t-1} \,\middle|\, \{y\}_1^{t-1} \right] \\
&= A x_{t-1}^{t-1}.
\end{aligned} \tag{10}$$

The recursions start with $x_1^0 = \pi_1$ and $V_1^0 = V_1$. Equations (5), (6), (7), (9), and (10) together form the Kalman filter forward recursions, as shown in [1].
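Collecting these recursions, the forward pass can be sketched as follows. The function and variable names are my own; `ys` is assumed to be a $T \times p$ array of outputs:

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, pi1, V1):
    """Forward recursions (5)-(7), (9), (10). Returns the filtered moments
    x_t^t, V_t^t and the one-step predictions x_t^{t-1}, V_t^{t-1}."""
    T, n = len(ys), len(pi1)
    x_filt = np.zeros((T, n)); V_filt = np.zeros((T, n, n))
    x_pred = np.zeros((T, n)); V_pred = np.zeros((T, n, n))
    for t in range(T):
        if t == 0:
            x_pred[0], V_pred[0] = pi1, V1             # x_1^0 = pi_1, V_1^0 = V_1
        else:
            x_pred[t] = A @ x_filt[t - 1]              # (10)
            V_pred[t] = A @ V_filt[t - 1] @ A.T + Q    # (7)
        K = V_pred[t] @ C.T @ np.linalg.inv(R + C @ V_pred[t] @ C.T)  # (6)
        x_filt[t] = x_pred[t] + K @ (ys[t] - C @ x_pred[t])           # (9)
        V_filt[t] = V_pred[t] - K @ C @ V_pred[t]                     # (5)
    return x_filt, V_filt, x_pred, V_pred

# Example run with arbitrary scalar parameters.
A = np.array([[0.9]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.1]])
ys = np.array([[1.0], [0.8], [1.1]])
xf, Vf, xp, Vp = kalman_filter(ys, A, C, Q, R, np.zeros(1), np.eye(1))
```

Conditioning on each new output can only reduce uncertainty, so the filtered variances never exceed the predicted ones.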
3 Backward Recursions: Smoothing
Like the filtered posterior distribution $P(x_t \mid \{y\}_1^t)$, the smoothed posterior distribution $P(x_t \mid \{y\}_1^T)$ is also normal. We seek its mean $x_t^T$ and variance $V_t^T$. We are also interested in the covariance of the joint posterior distribution $P(x_{t+1}, x_t \mid \{y\}_1^T)$, denoted $V_{t+1,t}^T$.

$$\begin{aligned}
\log P(x_{t+1}, x_t \mid \{y\}_1^T) &= \log P(x_t \mid x_{t+1}, \{y\}_1^T) + \log P(x_{t+1} \mid \{y\}_1^T) \\
&= \log P(x_t \mid x_{t+1}, \{y\}_1^t) + \log P(x_{t+1} \mid \{y\}_1^T) \\
&= \log P(x_{t+1} \mid x_t) + \log P(x_t \mid \{y\}_1^t) - \log P(x_{t+1} \mid \{y\}_1^t) + \log P(x_{t+1} \mid \{y\}_1^T) \\
&= -\tfrac{1}{2} (x_{t+1} - A x_t)' Q^{-1} (x_{t+1} - A x_t) - \tfrac{1}{2} (x_t - x_t^t)' (V_t^t)^{-1} (x_t - x_t^t) \\
&\quad + \tfrac{1}{2} (x_{t+1} - x_{t+1}^t)' (V_{t+1}^t)^{-1} (x_{t+1} - x_{t+1}^t) \\
&\quad - \tfrac{1}{2} (x_{t+1} - x_{t+1}^T)' (V_{t+1}^T)^{-1} (x_{t+1} - x_{t+1}^T) + \ldots \\
&= -\tfrac{1}{2} x_{t+1}' \left[ Q^{-1} - (V_{t+1}^t)^{-1} + (V_{t+1}^T)^{-1} \right] x_{t+1} \\
&\quad - \tfrac{1}{2} x_{t+1}' \left( -Q^{-1} A \right) x_t - \tfrac{1}{2} x_t' \left( -A' Q^{-1} \right) x_{t+1} \\
&\quad - \tfrac{1}{2} x_t' \left[ A' Q^{-1} A + (V_t^t)^{-1} \right] x_t + x_t' (V_t^t)^{-1} x_t^t + \ldots
\end{aligned} \tag{11}$$
Note that, in general, if $[z_1' \; z_2']'$ is normally-distributed with mean $[\mu_1' \; \mu_2']'$, then the log density can be expressed in the form

$$\begin{aligned}
\log P(z_1, z_2) &= -\tfrac{1}{2} \begin{bmatrix} z_1 - \mu_1 \\ z_2 - \mu_2 \end{bmatrix}' \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} \begin{bmatrix} z_1 - \mu_1 \\ z_2 - \mu_2 \end{bmatrix} + \ldots \\
&= -\tfrac{1}{2} z_1' S_{11} z_1 - \tfrac{1}{2} z_1' S_{12} z_2 - \tfrac{1}{2} z_2' S_{21} z_1 - \tfrac{1}{2} z_2' S_{22} z_2 + z_2' \left( S_{21} \mu_1 + S_{22} \mu_2 \right) + \ldots
\end{aligned} \tag{12}$$

The covariance of $[z_1' \; z_2']'$ is

$$\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}
= \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}^{-1}
= \begin{bmatrix} F_{11}^{-1} & -F_{11}^{-1} S_{12} S_{22}^{-1} \\ -S_{22}^{-1} S_{21} F_{11}^{-1} & F_{22}^{-1} \end{bmatrix} \tag{13}$$

$$= \begin{bmatrix} S_{11}^{-1} + S_{11}^{-1} S_{12} F_{22}^{-1} S_{21} S_{11}^{-1} & -F_{11}^{-1} S_{12} S_{22}^{-1} \\ -S_{22}^{-1} S_{21} F_{11}^{-1} & S_{22}^{-1} + S_{22}^{-1} S_{21} F_{11}^{-1} S_{12} S_{22}^{-1} \end{bmatrix}, \tag{14}$$

where

$$F_{11} = S_{11} - S_{12} S_{22}^{-1} S_{21}, \qquad F_{22} = S_{22} - S_{21} S_{11}^{-1} S_{12}.$$

Comparing the first four terms in (11) and (12), we can write

$$\begin{bmatrix} V_{t+1}^T & V_{t+1,t}^T \\ V_{t,t+1}^T & V_t^T \end{bmatrix} = \begin{bmatrix} Q^{-1} - (V_{t+1}^t)^{-1} + (V_{t+1}^T)^{-1} & -Q^{-1} A \\ -A' Q^{-1} & A' Q^{-1} A + (V_t^t)^{-1} \end{bmatrix}^{-1}. \tag{15}$$

We first simplify two expressions that will appear when inverting the block matrix in (15). First, using the Matrix Inversion Lemma,
$$S_{22}^{-1} = \left[ A' Q^{-1} A + (V_t^t)^{-1} \right]^{-1} = V_t^t - V_t^t A' (V_{t+1}^t)^{-1} A V_t^t = V_t^t - J_t V_{t+1}^t J_t', \tag{16}$$

where we define

$$J_t = V_t^t A' (V_{t+1}^t)^{-1}. \tag{17}$$

Second, applying the matrix identity (8),

$$\begin{aligned}
S_{22}^{-1} S_{21} &= -\left( V_t^t - J_t V_{t+1}^t J_t' \right) A' Q^{-1} \\
&= -V_t^t A' \left[ I - \left( Q + A V_t^t A' \right)^{-1} A V_t^t A' \right] Q^{-1} \\
&= -V_t^t A' \left( Q + A V_t^t A' \right)^{-1} \\
&= -J_t.
\end{aligned} \tag{18}$$
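The identities (16)–(18) can be checked numerically. The construction below is my own, using random test matrices for $V_t^t$, $Q$, and $A$, with $V_{t+1}^t = A V_t^t A' + Q$ taken from (7):

```python
import numpy as np

# Check (16)-(18): S22^{-1} = V_t^t - J_t V_{t+1}^t J_t'  and
# S22^{-1} S21 = -J_t, with V_{t+1}^t = A V_t^t A' + Q from (7).
rng = np.random.default_rng(4)
n = 3
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                   # stands in for V_t^t
N = rng.standard_normal((n, n))
Q = N @ N.T + n * np.eye(n)
A = rng.standard_normal((n, n))

V_next = A @ V @ A.T + Q                      # V_{t+1}^t
J = V @ A.T @ np.linalg.inv(V_next)           # (17)

S22 = A.T @ np.linalg.inv(Q) @ A + np.linalg.inv(V)
S21 = -A.T @ np.linalg.inv(Q)

lhs16 = np.linalg.inv(S22)
rhs16 = V - J @ V_next @ J.T                  # right-hand side of (16)
lhs18 = lhs16 @ S21                           # should equal -J, per (18)
```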
Now, we invert the block matrix in (15). Using (16), (18), and the fact that $F_{11}^{-1} = V_{t+1}^T$ from (13),

$$\begin{aligned}
V_t^T &= S_{22}^{-1} + S_{22}^{-1} S_{21} F_{11}^{-1} S_{12} S_{22}^{-1} \\
&= \left( V_t^t - J_t V_{t+1}^t J_t' \right) + (-J_t) V_{t+1}^T (-J_t') \\
&= V_t^t + J_t \left( V_{t+1}^T - V_{t+1}^t \right) J_t'
\end{aligned} \tag{19}$$

and

$$V_{t+1,t}^T = -F_{11}^{-1} S_{12} S_{22}^{-1} = V_{t+1}^T J_t'. \tag{20}$$
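The block-inverse formulas (13) and (14), on which (19) and (20) rest, can also be confirmed numerically. The check below is my own construction on a random symmetric positive-definite block matrix $S$:

```python
import numpy as np

# Check the block-matrix inverse formulas (13) and (14) on a random
# symmetric positive-definite S partitioned into 2x2 blocks.
rng = np.random.default_rng(2)
n = 2
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T + 2 * n * np.eye(2 * n)
S11, S12 = S[:n, :n], S[:n, n:]
S21, S22 = S[n:, :n], S[n:, n:]

F11 = S11 - S12 @ np.linalg.inv(S22) @ S21   # Schur complement of S22
F22 = S22 - S21 @ np.linalg.inv(S11) @ S12   # Schur complement of S11

Sigma = np.linalg.inv(S)
Sigma11 = np.linalg.inv(F11)                                # (13), upper left
Sigma12 = -np.linalg.inv(F11) @ S12 @ np.linalg.inv(S22)    # (13), upper right
Sigma21 = -np.linalg.inv(S22) @ S21 @ np.linalg.inv(F11)    # (13), lower left
Sigma22 = (np.linalg.inv(S22)
           + np.linalg.inv(S22) @ S21 @ np.linalg.inv(F11)
           @ S12 @ np.linalg.inv(S22))                      # (14), lower right
```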
Using (17), (19), and (20), we can also derive a recursive formulation for the covariance:

$$\begin{aligned}
V_{t,t-1}^T &= V_t^T J_{t-1}' \\
&= \left[ V_t^t + J_t \left( V_{t+1}^T - V_{t+1}^t \right) J_t' \right] J_{t-1}' \\
&= \left[ V_t^t + J_t \left( V_{t+1,t}^T - A V_t^t \right) \right] J_{t-1}' \\
&= V_t^t J_{t-1}' + J_t \left( V_{t+1,t}^T - A V_t^t \right) J_{t-1}'.
\end{aligned} \tag{21}$$

Using (5), (17), and (20), this recursion is initialized with

$$\begin{aligned}
V_{T,T-1}^T &= V_T^T J_{T-1}' \\
&= (I - K_T C) V_T^{T-1} J_{T-1}' \\
&= (I - K_T C) A V_{T-1}^{T-1}.
\end{aligned} \tag{22}$$
To find the mean, we compare the last terms in (11) and (12). Using (16), (17), and (18),

$$S_{21} x_{t+1}^T + S_{22} x_t^T = (V_t^t)^{-1} x_t^t$$

$$\begin{aligned}
x_t^T &= -S_{22}^{-1} S_{21} x_{t+1}^T + S_{22}^{-1} (V_t^t)^{-1} x_t^t \\
&= J_t x_{t+1}^T + (I - J_t A) x_t^t \\
&= x_t^t + J_t \left( x_{t+1}^T - A x_t^t \right).
\end{aligned} \tag{23}$$

Equations (17), (19), (21), (22), and (23) together form the Kalman smoother backward recursions, as shown in [1]. Equivalently, (20) can be used in place of (21) and (22) to reduce the computation required to find the covariance.
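The backward recursions can be sketched as follows, consuming the moments produced by the forward pass. The function and variable names are my own, and the sketch uses (20) for the cross-covariance rather than (21)–(22):

```python
import numpy as np

def kalman_smoother(x_filt, V_filt, x_pred, V_pred, A):
    """Backward recursions (17), (19), (20), (23), consuming the filtered
    moments x_t^t, V_t^t and one-step predictions x_t^{t-1}, V_t^{t-1}."""
    T, n = x_filt.shape
    x_smooth = np.zeros((T, n))
    V_smooth = np.zeros((T, n, n))
    V_cross = np.zeros((T - 1, n, n))             # V_{t+1,t}^T
    x_smooth[-1], V_smooth[-1] = x_filt[-1], V_filt[-1]
    for t in range(T - 2, -1, -1):
        J = V_filt[t] @ A.T @ np.linalg.inv(V_pred[t + 1])               # (17)
        x_smooth[t] = x_filt[t] + J @ (x_smooth[t + 1] - A @ x_filt[t])  # (23)
        V_smooth[t] = (V_filt[t]
                       + J @ (V_smooth[t + 1] - V_pred[t + 1]) @ J.T)    # (19)
        V_cross[t] = V_smooth[t + 1] @ J.T                               # (20)
    return x_smooth, V_smooth, V_cross

# Tiny scalar example: a minimal two-step forward pass, then the backward pass.
A = np.array([[0.9]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.1]])
ys = np.array([[1.0], [0.8]])
xp = np.zeros((2, 1)); Vp = np.zeros((2, 1, 1))
xf = np.zeros((2, 1)); Vf = np.zeros((2, 1, 1))
xp[0], Vp[0] = np.zeros(1), np.eye(1)            # x_1^0 = pi_1, V_1^0 = V_1
for t in range(2):
    if t > 0:
        xp[t] = A @ xf[t - 1]                    # (10)
        Vp[t] = A @ Vf[t - 1] @ A.T + Q          # (7)
    K = Vp[t] @ C.T @ np.linalg.inv(R + C @ Vp[t] @ C.T)   # (6)
    xf[t] = xp[t] + K @ (ys[t] - C @ xp[t])      # (9)
    Vf[t] = Vp[t] - K @ C @ Vp[t]                # (5)
xs, Vs, Vc = kalman_smoother(xf, Vf, xp, Vp, A)
```

At $t = T$ the smoothed and filtered moments coincide, and conditioning on the full output sequence can only reduce the variance relative to filtering.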
References
[1] Z. Ghahramani and G.E. Hinton. Parameter estimation for linear dynamical systems. Technical Report CRG-TR-96-2, University of Toronto, 1996.
[2] R.H. Shumway and D.S. Stoffer. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253–264, 1982.
[3] B.D.O. Anderson and J.B. Moore. Optimal Filtering. Prentice-Hall, Englewood Cliffs, NJ, 1979.