
Problem

I have a system that is measured at regular intervals. The state of the system at those times is given by the vector $\vec x=(x_0, x_1, x_2,\cdots)$. Between measurements, a random variable $z$, drawn from a Gaussian distribution with zero mean and standard deviation $\sigma_z$, is added to the state, and there is some decay, i.e., $x_i=(1-\gamma)x_{i-1}+z_i$. (The underlying model is a damped harmonic oscillator in a thermal environment, probed at multiples of its period.)

My measurements are imperfect, i.e., I measure $\vec y=(y_0,y_1,y_2,\cdots)$, where $y_i=x_i+d_i$. Again, the $d_i$ are drawn from a Gaussian distribution with zero mean and standard deviation $\sigma_d$. Note that this setup is a close variant of an example on the Wikipedia page for the Kalman filter. But as far as I understand, the Kalman filter does not take the whole measurement record into account; it only provides a forward update rule for the current best guess. In particular, its estimate for $x_0$, which I am particularly interested in, is poor.
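For concreteness, here is a minimal simulation of this setup (a sketch; all parameter values are illustrative, not from my actual system):

```python
import numpy as np

rng = np.random.default_rng(0)
N, gamma, sigma_z, sigma_d = 200, 0.1, 1.0, 0.5   # illustrative values

# Start x_0 in the stationary state of the recursion x_i = (1-gamma) x_{i-1} + z_i.
var_x0 = sigma_z**2 / (1 - (1 - gamma)**2)
x = np.empty(N)
x[0] = rng.normal(0.0, np.sqrt(var_x0))
for i in range(1, N):
    x[i] = (1 - gamma) * x[i - 1] + rng.normal(0.0, sigma_z)

# Imperfect measurements y_i = x_i + d_i.
y = x + rng.normal(0.0, sigma_d, size=N)
```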

I would like to find $\vec x$ given some measurement record $\vec y$ (the parameters $\sigma_z$, $\sigma_d$, and $\gamma$ are known). I believe I should use some maximum-likelihood analysis, but I have trouble finding the right probability distributions $P(\vec y|\vec x)$, $P(\vec x)$, and $P(\vec y)$ that I need to determine $P(\vec x|\vec y)$.

Attempt

I tried guessing what the right probability distributions should be. This one is fairly easy, since the measurement errors $d_i=y_i-x_i$ are independent: $$P(\vec y|\vec x) = \prod_{i=0}^{N-1}\frac{1}{\sqrt{2\pi\sigma_d^2}}\exp\left(-\frac{(y_i-x_i)^2}{2\sigma_d^2}\right).$$

Now for $P(\vec x)$, I'm already slightly unsure. I start at the equilibrium of the above update rule for $x_i$, i.e., when the variance recursion $\sigma_{x_i}^2=(1-\gamma)^2\sigma_{x_{i-1}}^2+\sigma_z^2$ has reached its steady state, $\sigma_{x_0}^2=\sigma_z^2/\bigl(1-(1-\gamma)^2\bigr)$. This fixes my prior for $x_0$. Thus I think I should write $$ P(\vec x) = \frac{1}{\sqrt{2\pi\sigma_{x_0}^2}}\exp\left(-\frac{x_0^2}{2\sigma_{x_0}^2}\right)\times\prod_{i=1}^{N-1}\frac{1}{\sqrt{2\pi\sigma_z^2}}\exp\left(-\frac{\bigl(x_i-(1-\gamma)x_{i-1}\bigr)^2}{2\sigma_z^2}\right). $$
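The steady state can be checked directly by iterating the variance recursion to its fixed point (illustrative parameters):

```python
import numpy as np

gamma, sigma_z = 0.1, 1.0   # illustrative values

# Iterate var <- (1-gamma)^2 var + sigma_z^2 until it converges.
var = 0.0
for _ in range(1000):
    var = (1 - gamma) ** 2 * var + sigma_z**2

print(var, sigma_z**2 / (1 - (1 - gamma) ** 2))  # the two values agree
```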

However, the real problem is with $P(\vec y)$, and I think this is due to my limited experience with this method. Naively, without thinking about Bayesian updates, my guess for $P(\vec x|\vec y)$ would have looked like the product of $P(\vec x)$ and $P(\vec y|\vec x)$. I think the marginal for $\vec y$ should be $$ P(\vec y) = \int d\vec x\, P(\vec x)P(\vec y|\vec x). $$ It is possible to do all the Gaussian integrals, but it doesn't seem to give a simple result.
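For what it's worth, since everything is jointly Gaussian, the integral does come out compactly in matrix form: $\vec y\sim N(0,\,\mathsf\Sigma_x+\sigma_d^2\mathbb 1)$, where $[\mathsf\Sigma_x]_{ij}=\sigma_{x_0}^2(1-\gamma)^{|i-j|}$ is the stationary covariance of the state process. A minimal Monte Carlo check of this claim (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
N, gamma, sigma_z, sigma_d = 5, 0.1, 1.0, 0.5   # illustrative values
var0 = sigma_z**2 / (1 - (1 - gamma) ** 2)

# Stationary AR(1) prior covariance: Cov(x_i, x_j) = var0 * (1-gamma)^|i-j|.
idx = np.arange(N)
Sigma_x = var0 * (1 - gamma) ** np.abs(idx[:, None] - idx[None, :])

# Simulate many measurement records and compare their sample covariance
# with the predicted marginal covariance Sigma_x + sigma_d^2 * I.
n_rec = 200_000
x = np.empty((n_rec, N))
x[:, 0] = rng.normal(0.0, np.sqrt(var0), n_rec)
for i in range(1, N):
    x[:, i] = (1 - gamma) * x[:, i - 1] + rng.normal(0.0, sigma_z, n_rec)
y = x + rng.normal(0.0, sigma_d, (n_rec, N))

print(np.abs(np.cov(y.T) - (Sigma_x + sigma_d**2 * np.eye(N))).max())  # small (Monte Carlo error only)
```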

Is there some simpler assumption I can make? And how is it justified? Is there a mistake somewhere else?

Attempt #2

I'm still not entirely sure about $P(\vec y)$ (I think it's correct though), but one can get by without explicitly calculating it.

One mistake above is calling this approach "maximum likelihood". It's not. MLE would mean choosing $\vec x$ such that $P(\vec y|\vec x)$ above becomes maximal for the given measurement record $\vec y$. This is a bad guess, as it neglects what we know about how the state evolves (note that $\sigma_z$ does not appear in $P(\vec y|\vec x)$ at all, so the MLE is simply $\vec x=\vec y$). Instead, what I am doing is Bayesian inference, i.e., finding the conditional probability $P(\vec x|\vec y)$.

To do this, we start with the joint distribution $P(\vec x, \vec y)=P(\vec y|\vec x)P(\vec x)$. This is a normal distribution $$ \begin{pmatrix}\vec x \\ \vec y\end{pmatrix} \sim N\left[\begin{pmatrix}\vec \mu_x \\ \vec \mu_y\end{pmatrix}, \begin{pmatrix}\Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{xy} & \Sigma_{yy}\end{pmatrix}\right],$$ with $\vec\mu_x=\vec\mu_y=0$. Collecting the coefficients of the quadratic form in the exponent, I read off $\Sigma_{yy} = (1/\sigma_d^2)\mathbb 1$, $\Sigma_{xy} = -(1/\sigma_d^2)\mathbb 1$, and $$ [\Sigma_{xx}]_{ij} = \left( \frac{1}{\sigma_d^2} + \frac{1+(1-\gamma)^2}{\sigma_z^2} \right)\delta_{i,j} + \left( \frac{1}{\sigma_{x_0}^2} - \frac{1}{\sigma_z^2} \right)\delta_{i,0}\delta_{j,0} - \frac{(1-\gamma)^2}{\sigma_z^2}\delta_{i,N-1}\delta_{j,N-1} - \frac{1-\gamma}{\sigma_z^2}\left(\delta_{i,j+1}+\delta_{i,j-1}\right).$$ NB: $\Sigma_{xx}$ is tri-diagonal apart from the first and last diagonal elements, which differ (indexing from 0).

It's actually straightforward to derive the conditional distribution for $\vec x$ from the joint distribution [2]. It is given by $$ \vec x \sim N\bigl(\vec\mu_x+\Sigma_{xy}\Sigma_{yy}^{-1}(\vec y-\vec\mu_y),\ \Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{xy}\bigr). $$ However, in our case this reduces to $$ \vec x\sim N\!\left( -\vec y,\ \Sigma_{xx}-\frac{1}{\sigma_d^2}\mathbb 1 \right),$$ which is not at all what I am expecting. All measurements $y_j$ should contribute to the estimate of any given $x_i$. The problem lies in the fact that $\Sigma_{yy}$ is diagonal. What am I missing?
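A numerical check (illustrative parameters) makes the inconsistency explicit: the coefficient matrix $\mathsf Q$ of the quadratic form does not match the sample covariance of simulated $(\vec x,\vec y)$ at all, whereas its inverse does:

```python
import numpy as np

rng = np.random.default_rng(2)
N, gamma, sigma_z, sigma_d = 4, 0.1, 1.0, 0.5   # illustrative values
lam = 1 - gamma
var0 = sigma_z**2 / (1 - lam**2)

# Coefficient matrix of the quadratic form read off from P(x) P(y|x).
Qxx = ((1 / sigma_d**2 + (1 + lam**2) / sigma_z**2) * np.eye(N)
       - (lam / sigma_z**2) * (np.eye(N, k=1) + np.eye(N, k=-1)))
Qxx[0, 0] += 1 / var0 - 1 / sigma_z**2
Qxx[-1, -1] -= lam**2 / sigma_z**2
Q = np.block([[Qxx, -np.eye(N) / sigma_d**2],
              [-np.eye(N) / sigma_d**2, np.eye(N) / sigma_d**2]])

# Sample (x, y) from the model and compare sample covariance with Q and inv(Q).
n = 500_000
x = np.empty((n, N))
x[:, 0] = rng.normal(0.0, np.sqrt(var0), n)
for i in range(1, N):
    x[:, i] = lam * x[:, i - 1] + rng.normal(0.0, sigma_z, n)
y = x + rng.normal(0.0, sigma_d, (n, N))
S = np.cov(np.hstack([x, y]).T)

print(np.abs(S - Q).max())                 # large: Q is not the covariance
print(np.abs(S - np.linalg.inv(Q)).max())  # small: Q is the inverse covariance
```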

Daniel
    Am I misunderstanding your notation or is what you've called $P(\vec x|\vec y)$ not simply the result of the Kalman smoother? – Chris Haug Aug 25 '19 at 18:53
  • Thanks for your comment, Chris. I'm not sure; I will check it out. If it is, it should be derivable the way I tried, right? The discussion on Wikipedia is not very clear to me; it would be nice to get some insight by deriving it for this example. – Daniel Aug 26 '19 at 11:36
    Yes, but it's easiest to derive recursively rather than all at once, first by computing forward each filtering distribution $p(x_t|y_1,...,y_t)$ from $p(x_{t-1}|y_1,...,y_{t-1})$, and then backwards the state posteriors $p(x_{t-1}|y_1,...,y_T)$ from $p(x_t|y_1,...,y_T)$. To be sure, do you *really* have an observation $y_0$ that corresponds to the state $x_0$ that you care about (as you've written)? Typically the "initial state" is the time right before the first observation, and inference about it is not quite the same. – Chris Haug Aug 26 '19 at 23:56
  • **a)** I've tried to understand the derivation of the Kalman smoother, but in vain. In the end, I need an analytical formula for $x_0$, which is why I was trying to derive it myself. **b)** I have added a second attempt to solve the problem, which seems to almost get me there, but I must be missing something small. Maybe you can spot it? **c)** And yes, I do have an observation $y_0$. It doesn't strike me as unusual. The difference is small anyway, waiting a while to make the first measurement just changes the expected initial variance. – Daniel Aug 27 '19 at 10:33

1 Answer


The posted attempt is almost correct, but it mistakes the precision matrix (the inverse of the covariance matrix) of the joint normal distribution for the covariance matrix itself.

Joint Normal Distribution

Given $$P(\vec y|\vec x) = \prod_{i=0}^{N-1}\frac{1}{\sqrt{2\pi\sigma_d^2}}\exp\left(-\frac{(y_i-x_i)^2}{2\sigma_d^2}\right)$$ and $$ P(\vec x) = \frac{1}{\sqrt{2\pi\sigma_{x_0}^2}}\exp\left(-\frac{x_0^2}{2\sigma_{x_0}^2}\right)\times\prod_{i=1}^{N-1}\frac{1}{\sqrt{2\pi\sigma_z^2}}\exp\left(-\frac{\bigl(x_i-(1-\gamma)x_{i-1}\bigr)^2}{2\sigma_z^2}\right), $$ we can write $$ P(\vec x,\vec y)=P(\vec x)P(\vec y|\vec x)\propto \exp\left(-\frac12 (\vec x^\top,\vec y^\top)\,\mathsf Q\begin{pmatrix}\vec x\\\vec y\end{pmatrix}\right), $$ where $\mathsf Q$ has the entries that I mistook above for the covariance matrix, i.e., $\mathsf Q_{yy} = (1/\sigma_d^2)\mathbb 1$, $\mathsf Q_{xy} = -(1/\sigma_d^2)\mathbb 1$, and $$ [\mathsf Q_{xx}]_{ij} = \left( \frac{1}{\sigma_d^2} + \frac{1+(1-\gamma)^2}{\sigma_z^2} \right)\delta_{i,j} + \left( \frac{1}{\sigma_{x_0}^2} - \frac{1}{\sigma_z^2} \right)\delta_{i,0}\delta_{j,0} - \frac{(1-\gamma)^2}{\sigma_z^2}\delta_{i,N-1}\delta_{j,N-1} - \frac{1-\gamma}{\sigma_z^2}\left(\delta_{i,j+1}+\delta_{i,j-1}\right).$$ In other words, $\mathsf Q$ is the precision matrix of the joint distribution, not its covariance.
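As a sketch, $\mathsf Q$ can be assembled directly from these entries (parameter values are illustrative):

```python
import numpy as np

N, gamma, sigma_z, sigma_d = 5, 0.1, 1.0, 0.5   # illustrative values
lam = 1 - gamma
var0 = sigma_z**2 / (1 - lam**2)                # sigma_{x_0}^2

# Q_xx: tridiagonal apart from corrections to the first and last diagonal entries.
Qxx = ((1 / sigma_d**2 + (1 + lam**2) / sigma_z**2) * np.eye(N)
       - (lam / sigma_z**2) * (np.eye(N, k=1) + np.eye(N, k=-1)))
Qxx[0, 0] += 1 / var0 - 1 / sigma_z**2
Qxx[-1, -1] -= lam**2 / sigma_z**2

# Off-diagonal and measurement blocks.
Qxy = -np.eye(N) / sigma_d**2
Qyy = np.eye(N) / sigma_d**2
Q = np.block([[Qxx, Qxy], [Qxy, Qyy]])
```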

Covariance Matrix

In contrast to what is stated above, the covariance matrix is $\mathsf\Sigma=\mathsf Q^{-1}$. The inverse is easily computed with block matrix inversion. We find $$ \mathsf\Sigma=\begin{pmatrix}\mathsf\Sigma_{xx} & \mathsf\Sigma_{xx} \\ \mathsf\Sigma_{xx} & \mathsf\Sigma_{xx}+\sigma_d^2\mathbb 1\end{pmatrix}, \qquad \mathsf\Sigma_{xx}=\left(\mathsf Q_{xx}-\frac{1}{\sigma_d^2}\mathbb 1\right)^{-1}. $$ This makes sense: $\mathsf Q_{xx}-\sigma_d^{-2}\mathbb 1$ is exactly the prior precision of $\vec x$ (the measurement contribution removed), so $\mathsf\Sigma_{xx}$ is the prior covariance of the state, and since $\vec y=\vec x+\vec d$ with independent noise, $\mathrm{Cov}(\vec x,\vec y)=\mathsf\Sigma_{xx}$ and $\mathrm{Cov}(\vec y)=\mathsf\Sigma_{xx}+\sigma_d^2\mathbb 1$.
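A short numerical check of this block inversion, including the sanity check that $\mathsf\Sigma_{xx}$ reproduces the stationary AR(1) covariance $\sigma_{x_0}^2(1-\gamma)^{|i-j|}$ (illustrative parameters):

```python
import numpy as np

N, gamma, sigma_z, sigma_d = 5, 0.1, 1.0, 0.5   # illustrative values
lam = 1 - gamma
var0 = sigma_z**2 / (1 - lam**2)

# Rebuild Q as in the previous snippet.
Qxx = ((1 / sigma_d**2 + (1 + lam**2) / sigma_z**2) * np.eye(N)
       - (lam / sigma_z**2) * (np.eye(N, k=1) + np.eye(N, k=-1)))
Qxx[0, 0] += 1 / var0 - 1 / sigma_z**2
Qxx[-1, -1] -= lam**2 / sigma_z**2
Q = np.block([[Qxx, -np.eye(N) / sigma_d**2],
              [-np.eye(N) / sigma_d**2, np.eye(N) / sigma_d**2]])

Sigma = np.linalg.inv(Q)
Sigma_xx = np.linalg.inv(Qxx - np.eye(N) / sigma_d**2)

print(np.allclose(Sigma[:N, :N], Sigma_xx))                           # True
print(np.allclose(Sigma[:N, N:], Sigma_xx))                           # True: Sigma_xy = Sigma_xx
print(np.allclose(Sigma[N:, N:], Sigma_xx + sigma_d**2 * np.eye(N)))  # True

# Sigma_xx is the stationary AR(1) prior covariance var0 * lam^|i-j|.
idx = np.arange(N)
print(np.allclose(Sigma_xx, var0 * lam ** np.abs(idx[:, None] - idx[None, :])))  # True
```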

Conditional distribution

Given this, the conditional distribution for $\vec x$ can be derived from the joint distribution [2]: $$ \vec x \sim N\bigl(\vec\mu_x+\mathsf\Sigma_{xy}\mathsf\Sigma_{yy}^{-1}(\vec y-\vec\mu_y),\ \mathsf\Sigma_{xx}-\mathsf\Sigma_{xy}\mathsf\Sigma_{yy}^{-1}\mathsf\Sigma_{xy}\bigr). $$ With the blocks above, the posterior mean is $\mathsf\Sigma_{xx}(\mathsf\Sigma_{xx}+\sigma_d^2\mathbb 1)^{-1}\vec y$, which is dense and mixes all measurements into the estimate of each $x_i$, as expected. Equivalently, one can condition directly in precision form: the conditional precision of $\vec x$ given $\vec y$ is simply $\mathsf Q_{xx}$, and the conditional mean is $-\mathsf Q_{xx}^{-1}\mathsf Q_{xy}\vec y$, so $$ \vec x\,|\,\vec y \sim N\!\left(\frac{1}{\sigma_d^2}\mathsf Q_{xx}^{-1}\vec y,\ \mathsf Q_{xx}^{-1}\right). $$
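Both forms can be checked against each other on a simulated record (illustrative parameters); the last line compares the smoothed estimate of $x_0$, the quantity of interest in the question, with the raw measurement $y_0$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, gamma, sigma_z, sigma_d = 100, 0.1, 1.0, 0.5   # illustrative values
lam = 1 - gamma
var0 = sigma_z**2 / (1 - lam**2)

# Simulate one record.
x = np.empty(N)
x[0] = rng.normal(0.0, np.sqrt(var0))
for i in range(1, N):
    x[i] = lam * x[i - 1] + rng.normal(0.0, sigma_z)
y = x + rng.normal(0.0, sigma_d, N)

# Posterior mean via the covariance blocks: Sigma_xx (Sigma_xx + sigma_d^2 I)^{-1} y.
idx = np.arange(N)
Sxx = var0 * lam ** np.abs(idx[:, None] - idx[None, :])
mean_cov = Sxx @ np.linalg.solve(Sxx + sigma_d**2 * np.eye(N), y)

# Posterior mean via the precision form: Q_xx^{-1} y / sigma_d^2.
Qxx = ((1 / sigma_d**2 + (1 + lam**2) / sigma_z**2) * np.eye(N)
       - (lam / sigma_z**2) * (np.eye(N, k=1) + np.eye(N, k=-1)))
Qxx[0, 0] += 1 / var0 - 1 / sigma_z**2
Qxx[-1, -1] -= lam**2 / sigma_z**2
mean_prec = np.linalg.solve(Qxx, y / sigma_d**2)

print(np.allclose(mean_cov, mean_prec))   # True: the two routes agree
print(x[0], mean_prec[0], y[0])           # true x_0, smoothed estimate, raw measurement
```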

Analytical solution

I think the matrix $\mathsf Q_{xx}$ can be inverted analytically, or at least to a very good approximation. I will include this once I have done so.
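In the meantime, since $\mathsf Q_{xx}$ is symmetric and tridiagonal, the posterior mean $\mathsf Q_{xx}^{-1}\vec y/\sigma_d^2$ can already be computed exactly in $O(N)$ with a banded solver, without forming any inverse. A sketch, assuming SciPy is available (the measurement record here is a random stand-in):

```python
import numpy as np
from scipy.linalg import solveh_banded

rng = np.random.default_rng(4)
N, gamma, sigma_z, sigma_d = 10_000, 0.1, 1.0, 0.5   # illustrative values
lam = 1 - gamma
var0 = sigma_z**2 / (1 - lam**2)
y = rng.normal(0.0, 1.0, N)   # stand-in for a real measurement record

# Q_xx in the upper banded storage that solveh_banded expects:
# row 0 holds the superdiagonal (first entry unused), row 1 the diagonal.
diag = np.full(N, 1 / sigma_d**2 + (1 + lam**2) / sigma_z**2)
diag[0] += 1 / var0 - 1 / sigma_z**2
diag[-1] -= lam**2 / sigma_z**2
upper = np.full(N, -lam / sigma_z**2)
upper[0] = 0.0

# Solve Q_xx m = y / sigma_d^2 in O(N) via a banded Cholesky factorization.
mean = solveh_banded(np.vstack([upper, diag]), y / sigma_d**2)
print(mean[:3])
```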

Daniel