Multiple linear regression with normalization - how to get non-scaled full covariance matrix?

Question

I am doing a fairly complicated multiple regression in physics and have a problem getting back to the covariance matrix of the non-normalized parameters. I don't know how to calculate the error of the intercept and obtain the full covariance matrix (that is, a matrix that also includes a row and a column for the intercept).

My model is

$$ d=Gm $$

where $d$ is the vector of my experimental data, $G$ is the design matrix and $m$ is the vector of model parameters. I need to use TSVD (truncated singular value decomposition) for regularization because the problem is ill-posed, and hence I need normalization. This is done by:

$$ d_s=(d-\bar{d})/\sigma(d) $$

$$ Z[i]=(G[i]-\overline{G[i]})/\sigma(G[i]) $$

where $G[i]$ is the $i$-th column of $G$. Since I removed the mean from $d$, I don't need an intercept in the scaled model. The solution of my problem is therefore:

$$ m_s=(Z^TZ)^{-1}Z^Td_s $$

where the vector $m_s$ does not include the intercept. I can now calculate the covariance matrix for $m_s$ and get back to $m$.

I know how to calculate the intercept. I do this with:

$$ m_0=\bar{d}-\sum_i\overline{G[i]}\,m_s[i] $$

However, I have no idea how to get its error and calculate the full covariance matrix for $m$. I need this because I need to see the results in true physical units.

I tried to find it by comparing with the results of a regression without normalization, but couldn't work anything out. It is also hard to find in regression books.

Any help would be much appreciated!

I don't see why you need to standardize in this way / how these linear transformations will make an ill-posed problem better suited to standard OLS methods (see: [When should you center your data & when should you standardize?](http://stats.stackexchange.com/q/29781/7290)). — gung - Reinstate Monica, Jan 23 '15 at 18:33

Pedro Pesce · Answer 1 · 2015-01-26T13:01:57.427

An approach that might work would be to write the physical coefficients in terms of the scaled coefficients:

$m = Am_s$, for some matrix $A$. Then:

$\textrm{cov}(m) = A\textrm{cov}(m_s)A^T$
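One way to write this map out explicitly, assuming the standardization defined in the question and placing the intercept first in $m$: from $d_s = Zm_s$, the slopes in physical units are $m[i] = s_i\,m_s[i]$ with $s_i = \sigma(d)/\sigma(G[i])$, and the intercept is $m_0 = \bar{d} - \sum_i \overline{G[i]}\,m[i]$ (note that it uses the rescaled slopes $m[i]$). Strictly speaking the map is therefore affine rather than linear. The symbols $p$ (number of columns of $G$), $s_i$ and $b$ are notation introduced here for illustration:

$$ m = Am_s + b, \qquad A = \begin{pmatrix} -\overline{G[1]}\,s_1 & \cdots & -\overline{G[p]}\,s_p \\ s_1 & & \\ & \ddots & \\ & & s_p \end{pmatrix}, \qquad b = \begin{pmatrix} \bar{d} \\ 0 \\ \vdots \\ 0 \end{pmatrix} $$

If $b$ is treated as a constant, it drops out of the covariance and $\textrm{cov}(m) = A\,\textrm{cov}(m_s)\,A^T$ is a $(p+1)\times(p+1)$ matrix whose first row and column belong to the intercept.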

Note that this approach treats $A$ as known exactly, i.e. the means and standard deviations used in the scaling are taken as fixed constants.
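The propagation can be sketched numerically with NumPy. This is a minimal sketch under stated assumptions, not a definitive implementation: it uses plain OLS on the scaled model in place of TSVD (so the result can be checked against an unscaled fit), it assumes independent errors with equal variance, and the function name `unscale_covariance` is hypothetical. One point it makes explicit is that $\bar{d}$ is itself estimated from the data; under the stated error assumptions its variance $\sigma^2/n$ is uncorrelated with $m_s$ (the columns of $Z$ are centered) and adds to the intercept variance only.

```python
import numpy as np

def unscale_covariance(G, d):
    """Fit the standardized model by OLS and map the result to physical units.

    Returns (m, cov_m); m[0] is the intercept, m[1:] are the slopes.
    Hypothetical helper: plain OLS stands in for the TSVD solution.
    """
    n, p = G.shape
    g_mean, g_std = G.mean(axis=0), G.std(axis=0, ddof=1)
    d_mean, d_std = d.mean(), d.std(ddof=1)

    Z = (G - g_mean) / g_std          # standardized design matrix
    d_s = (d - d_mean) / d_std        # standardized data

    # Scaled solution and its covariance. With TSVD, replace these lines
    # by the truncated solution and the corresponding covariance.
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    m_s = ZtZ_inv @ Z.T @ d_s
    resid_s = d_s - Z @ m_s
    sigma2_s = resid_s @ resid_s / (n - p - 1)  # one dof spent on the mean
    cov_s = sigma2_s * ZtZ_inv

    # Affine map m = A m_s + b: first row gives the intercept, the rest
    # rescale the slopes by sigma(d) / sigma(G[i]).
    scale = d_std / g_std
    A = np.vstack([-(g_mean * scale), np.diag(scale)])  # shape (p+1, p)
    b = np.zeros(p + 1)
    b[0] = d_mean

    m = A @ m_s + b
    cov_m = A @ cov_s @ A.T
    # Assumption: iid errors, so var(mean(d)) = sigma^2 / n, uncorrelated
    # with m_s. It contributes to the intercept variance only.
    cov_m[0, 0] += sigma2_s * d_std**2 / n
    return m, cov_m
```

For a well-conditioned problem, `m` and `cov_m` obtained this way should agree with an ordinary least-squares fit on the unscaled design matrix augmented with a column of ones, which is a convenient sanity check before switching the scaled solve to TSVD.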

A different approach, which is at best a quick-and-dirty way to estimate the covariance, would be to add a column of ones to your matrix $G$ and, when you create the scaled matrix $Z$, replace that column of ones with a really large number, so that it is mostly left unaffected by the truncation when you apply TSVD. You will, however, most likely have to change your truncation parameter for the singular values.
