I think I can derive the recursive least squares (RLS) estimate using simple properties of the likelihood/score function, assuming standard normal errors. If the model is $$Y_t = X_t\beta + W_t$$
then the negative log-likelihood (up to an additive constant) at time $N$ is $$L_N(\beta_{N}) = \frac{1}{2}\sum_{t=1}^N(y_t - x_t^T\beta_N)^2$$
Note that I'm denoting $\beta_N$ the MLE estimate at time $N$.
The score function (i.e. $L_N'(\beta)$) is then $$S_N(\beta_N) = -\sum_{t=1}^N x_t(y_t-x_t^T\beta_N) = S_{N-1}(\beta_N) - x_N(y_N-x_N^T\beta_N) = 0$$
If we do a first-order Taylor expansion of $S_N(\beta_N)$ around last period's MLE estimate (i.e. $\beta_{N-1}$), we see:
$$S_N(\beta_N) = S_N(\beta_{N-1}) + S_N'(\beta_{N-1})(\beta_{N} - \beta_{N-1})$$ (Because $S_N$ is affine in $\beta$, this first-order expansion is actually exact.) But $S_N(\beta_N) = 0$, since $\beta_N$ is the MLE estimate at time $N$. Therefore, rearranging, we get:
$$\beta_{N} = \beta_{N-1} - [S_N'(\beta_{N-1})]^{-1}S_N(\beta_{N-1})$$
Now, plugging $\beta_{N-1}$ into the score function above gives $$S_N(\beta_{N-1}) = S_{N-1}(\beta_{N-1}) - x_N(y_N-x_N^T\beta_{N-1}) = -x_N(y_N-x_N^T\beta_{N-1})$$
because $S_{N-1}(\beta_{N-1}) = 0$ ($\beta_{N-1}$ is the MLE estimate at time $N-1$, just as $S_N(\beta_N) = 0$ above).
Which leaves us with:
$$\beta_{N} = \beta_{N-1} + K_N x_N(y_N-x_N^T\beta_{N-1})$$
where $K_N = [S_N'(\beta_{N-1})]^{-1} = [\sum_{t=1}^N x_t x_t^T]^{-1}$
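As a numerical sanity check (not part of the derivation itself), here is a minimal NumPy sketch, assuming column vectors $x_t$ and an invertible Gram matrix, that runs the recursion and compares it against the batch OLS/MLE estimate; all variable names and the simulated data are my own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 3, 50
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=N)

# Initialize from a small batch so the Gram matrix is invertible.
n0 = p
G = X[:n0].T @ X[:n0]                        # accumulated sum of x_t x_t^T
beta = np.linalg.solve(G, X[:n0].T @ y[:n0])  # batch MLE at time n0

for t in range(n0, N):
    x = X[t]
    G += np.outer(x, x)                      # update sum x_t x_t^T
    K = np.linalg.inv(G)                     # K_N = [sum x_t x_t^T]^{-1}
    # beta_N = beta_{N-1} + K_N x_N (y_N - x_N^T beta_{N-1})
    beta = beta + K @ x * (y[t] - x @ beta)

# Because the Taylor step is exact, the recursion should match batch OLS.
beta_batch = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta, beta_batch))         # → True
```

(In practice one would update $K_N$ itself recursively via the Sherman–Morrison formula rather than re-inverting $G$ each step, but the plain inverse keeps the sketch close to the derivation above.)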
QUESTIONS:
Did I do anything wrong above? I was a bit surprised by it, and I haven't seen this derivation elsewhere yet.
Is it possible to extend this derivation to a more general Kalman filter? I've tried, but I'm too new to the concept.