I'm working through a paper and I'm having trouble understanding some of the literature. Given the following background, here is what the literature says:
I understand how (1), (2) and (3) are related, and how to obtain (3) from (1) and (2). However, I don't see how (4) is obtained as the penalized least-squares criterion. I know that for a general spline model we can fit an ordinary least-squares model $\hat{y} = X\hat{\beta}$, where $\hat{\beta}$ minimizes $\|y - X\beta\|^2$ and $\beta = [\beta_0, \beta_1, \beta_{11}, \ldots, \beta_{1K}]^T$, with $\beta_{1k}$ the coefficient of the $k$th knot. Using a Lagrange multiplier, this can then be written as the penalized criterion $$\|y - X\beta\|^2 + \lambda^2 \beta^T D \beta$$ for some $\lambda \ge 0$. Here $\lambda^2 \beta^T D \beta$ is the roughness penalty and $\lambda$ is the smoothing parameter.
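The NumPy sketch below illustrates the penalized criterion above. It makes three assumptions:

- a truncated-linear basis $X = [1, x, (x-\kappa_1)_+, \ldots, (x-\kappa_K)_+]$;
- $D = \operatorname{diag}(0, 0, 1, \ldots, 1)$, so only the knot coefficients $\beta_{1k}$ are penalized;
- hypothetical helper names, knots and simulated data.

Setting the gradient $-2X^T(y - X\beta) + 2\lambda^2 D\beta$ to zero gives $\hat\beta = (X^TX + \lambda^2 D)^{-1}X^Ty$. When $\lambda = 0$ this reduces to ordinary least squares.

```python
import numpy as np

def truncated_linear_design(x, knots):
    """Hypothetical helper: X = [1, x, (x - kappa_1)_+, ..., (x - kappa_K)_+]."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def penalized_fit(x, y, knots, lam):
    """Minimise ||y - X beta||^2 + lam^2 * beta^T D beta (assumed form of D below)."""
    X = truncated_linear_design(x, knots)
    K = len(knots)
    # D = diag(0, 0, 1, ..., 1): intercept and slope are unpenalised,
    # only the knot coefficients beta_{1k} are shrunk
    D = np.diag(np.r_[0.0, 0.0, np.ones(K)])
    # Normal equations of the penalised criterion: (X^T X + lam^2 D) beta = X^T y
    beta = np.linalg.solve(X.T @ X + lam**2 * D, X.T @ y)
    return X, D, beta

# Simulated example data (made up for illustration)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
knots = np.linspace(0.0, 1.0, 12)[1:-1]  # 10 equally spaced interior knots

for lam in (0.0, 1.0, 10.0):
    X, D, beta = penalized_fit(x, y, knots, lam)
    rss = np.sum((y - X @ beta) ** 2)
    print(f"lambda={lam:5.1f}  RSS={rss:8.3f}  beta'D beta={beta @ D @ beta:10.3f}")
```

As $\lambda$ grows, the residual sum of squares can only increase and $\beta^T D \beta$ can only shrink. That is the fit-versus-smoothness trade-off the smoothing parameter controls.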
So how can I turn the least-squares criterion above into the form of equation (4)? I understand where most of the terms come from, such as $u$, $D$, $\lambda$ and $\{Y - B(t)\theta\}$, but I am unsure how the factor $\frac{1}{2}$ and $R_{\epsilon}^{-1}$ arise. Any insight would be really useful.
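Equation (4) is not reproduced above, so the following is an assumption: such a criterion is often a penalized negative log-likelihood. If the errors are $\epsilon \sim N(0, R_\epsilon)$, then $Y \sim N(B(t)\theta, R_\epsilon)$ and
$$-\log p(Y \mid \theta) = \tfrac{1}{2}\{Y - B(t)\theta\}^T R_\epsilon^{-1}\{Y - B(t)\theta\} + \tfrac{1}{2}\log\det(2\pi R_\epsilon).$$
Under that reading, both the $\frac12$ and the $R_\epsilon^{-1}$ come straight from the multivariate normal density. The log-determinant term does not depend on $\theta$.

The sketch below checks this identity numerically with SciPy. `B`, `theta`, `R_eps` and the data are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_data_term(Y, B, theta, R_eps):
    """0.5 * (Y - B theta)^T R_eps^{-1} (Y - B theta); assumed data-fit term of (4)."""
    r = Y - B @ theta
    # solve(R_eps, r) computes R_eps^{-1} r without explicitly inverting R_eps
    return 0.5 * r @ np.linalg.solve(R_eps, r)

rng = np.random.default_rng(1)
n, p = 8, 3
B = rng.normal(size=(n, p))          # hypothetical basis matrix B(t)
theta = rng.normal(size=p)           # hypothetical coefficient vector
A = rng.normal(size=(n, n))
R_eps = A @ A.T + n * np.eye(n)      # an arbitrary positive-definite error covariance
Y = B @ theta + rng.multivariate_normal(np.zeros(n), R_eps)

# Negative log-density of Y ~ N(B theta, R_eps)
nll = -multivariate_normal.logpdf(Y, mean=B @ theta, cov=R_eps)
# The remaining piece, 0.5 * log det(2 pi R_eps), does not involve theta
const = 0.5 * np.linalg.slogdet(2 * np.pi * R_eps)[1]
print(nll, gaussian_data_term(Y, B, theta, R_eps) + const)
```

With $R_\epsilon = \sigma^2 I$ the data term becomes $\|Y - B(t)\theta\|^2 / (2\sigma^2)$, which is ordinary least squares up to a constant factor. Multiplying the whole objective by a positive constant does not change the minimizer, provided the same factor is absorbed into the smoothing parameter.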
Thanks