I think I can derive the recursive least squares (RLS) estimate using simple properties of the likelihood/score function, assuming standard normal errors. If the model is $$Y_t = X_t\beta + W_t$$
then the negative log-likelihood (up to an additive constant) at time $N$ is $$L_N(\beta_{N}) = \frac{1}{2}\sum_{t=1}^N(y_t - x_t^T\beta_N)^2$$
Note that I'm denoting $\beta_N$ the MLE estimate at time $N$.
The score function (i.e. $L_N'(\beta)$) is then $$S_N(\beta_N) = -\sum_{t=1}^N x_t(y_t-x_t^T\beta_N) = S_{N-1}(\beta_N) - x_N(y_N-x_N^T\beta_N) = 0$$
If we do a first-order Taylor expansion of $S_N(\beta_N)$ around last period's MLE estimate (i.e. $\beta_{N-1}$), we see:
$$S_N(\beta_N) = S_N(\beta_{N-1}) + S_N'(\beta_{N-1})(\beta_{N} - \beta_{N-1})$$ (Because $S_N$ is affine in $\beta$, this first-order expansion is actually exact.) But $S_N(\beta_N) = 0$, since $\beta_N$ is the MLE estimate at time $N$. Therefore, rearranging, we get:
$$\beta_{N} = \beta_{N-1} - [S_N'(\beta_{N-1})]^{-1}S_N(\beta_{N-1})$$
Now, plugging $\beta_{N-1}$ into the score function above gives $$S_N(\beta_{N-1}) = S_{N-1}(\beta_{N-1}) - x_N(y_N-x_N^T\beta_{N-1}) = -x_N(y_N-x_N^T\beta_{N-1})$$
because $S_{N-1}(\beta_{N-1}) = 0$ ($\beta_{N-1}$ is the MLE estimate at time $N-1$, just as $S_N(\beta_N) = 0$ above).
Which leaves us with:
$$\beta_{N} = \beta_{N-1} + K_N x_N(y_N-x_N^T\beta_{N-1})$$
where $K_N = [S_N'(\beta_{N-1})]^{-1} = [\sum_{t=1}^N x_t x_t^T]^{-1}$
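As a numerical sanity check (not part of the derivation itself), here is a minimal NumPy sketch, assuming column vectors $x_t$ and an invertible Gram matrix, that runs the recursion and compares it against the batch OLS/MLE estimate; all variable names and the simulated data are my own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 3, 50
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=N)

# Initialize from a small batch so the Gram matrix is invertible.
n0 = p
G = X[:n0].T @ X[:n0]                        # accumulated sum of x_t x_t^T
beta = np.linalg.solve(G, X[:n0].T @ y[:n0])  # batch MLE at time n0

for t in range(n0, N):
    x = X[t]
    G += np.outer(x, x)                      # update sum x_t x_t^T
    K = np.linalg.inv(G)                     # K_N = [sum x_t x_t^T]^{-1}
    # beta_N = beta_{N-1} + K_N x_N (y_N - x_N^T beta_{N-1})
    beta = beta + K @ x * (y[t] - x @ beta)

# Because the Taylor step is exact, the recursion should match batch OLS.
beta_batch = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta, beta_batch))         # → True
```

(In practice one would update $K_N$ itself recursively via the Sherman–Morrison formula rather than re-inverting $G$ each step, but the plain inverse keeps the sketch close to the derivation above.)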
QUESTIONS:
Did I do anything wrong above? I was a bit surprised by it, and I haven't seen this derivation elsewhere yet.
Is it possible to extend this derivation to a more general Kalman filter? I've tried, but I'm too new to the concept.