
I have a multivariate regression model $Y = X\beta + \epsilon$. The variables in the $X$ matrix have very different scales, and hence the condition number of $X'X$ is huge (on the order of trillions).

I would like to know whether the high condition number causes problems for parameter estimation. On the one hand, I suspect that if the condition number is high, the estimates of $\beta$ are very unstable (because a small change in $X$ could have a large impact on the solution of $X'X\hat{\beta} = X'Y$). On the other hand, I do not think the stability of the solution should change if I merely change the units of the data matrix $X$, because the new estimates should just be multiples of the previous ones.
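
For example, with some made-up data in numpy (the predictors and the $10^6$ scale factor are invented, just to mimic my situation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Made-up predictors on wildly different scales
x1 = rng.normal(size=n)              # order 1
x2 = rng.normal(size=n) * 1e6        # order 1e6
X = np.column_stack([np.ones(n), x1, x2])
y = 2.0 + 3.0 * x1 + 5e-6 * x2 + rng.normal(scale=0.1, size=n)

print(np.linalg.cond(X.T @ X))       # huge: on the order of 1e12 ("trillions")

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Change the units of the last column: the corresponding estimate just rescales
X2 = X.copy()
X2[:, 2] /= 1e6
beta2, *_ = np.linalg.lstsq(X2, y, rcond=None)
print(beta[2] * 1e6, beta2[2])       # essentially identical
```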

Could someone provide advice?

Thanks.

DatamineR
  • Such a condition number is meaningless, because the first thing any decent software will do is standardize the columns of $X$. What is the condition number of the matrix your solver is actually working with? – whuber Oct 21 '14 at 06:33

2 Answers


What you are looking for is ridge regression: it adds a regularisation term $\alpha \|w\|^2$, where $w$ is the coefficient vector, to the mean squared error (see e.g. the Wikipedia entry on Tikhonov regularisation). This term penalises large weights (and so rescaling now has some benefit); in particular, it penalises solutions with large opposing positive and negative weights.
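
A minimal numpy sketch of this (the data and the value of $\alpha$ here are made up; the ridge estimate solves the regularised normal equations $(X^TX + \alpha I)w = X^Ty$):

```python
import numpy as np

def ridge(X, y, alpha):
    """Minimise ||y - Xw||^2 + alpha * ||w||^2 via the regularised normal equations."""
    p = X.shape[1]
    # Adding alpha * I shifts every eigenvalue of X'X up by alpha,
    # so the system being solved is better conditioned than plain X'X.
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

print(ridge(X, y, alpha=0.0))   # alpha = 0 reduces to ordinary least squares
print(ridge(X, y, alpha=10.0))  # larger alpha shrinks the coefficients towards zero
```

Since the penalty is applied to the coefficients on whatever scale the columns happen to be in, it is common to standardise the columns of $X$ before choosing $\alpha$; that is essentially what the rescaling discussion in the comments below is about.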

seanv507
  • Thank you. So do really different scales of the variables cause (relative) instability of the estimates? – DatamineR Oct 20 '14 at 23:57
  • The individual scales of the variables are not the source of the large condition number. The condition number is the ratio between the largest and smallest eigenvalues of $X^T X$, which has to do with the correlations between the variables. Your matrix is likely rank deficient; perhaps the smallest eigenvalues are practically zero. – purple51 Oct 21 '14 at 00:04
  • @purple51 – consider the columns of $X$ being independent but on very different scales: what are your eigenvalues then? – seanv507 Oct 21 '14 at 00:07
  • Hmm – basically you often want to enforce a "physical" intuition: small changes to the input should produce small changes to the output. Using the above scalar $\alpha$ regularisation, ideally you would rescale your variables so that a unit change in each variable has a roughly similar-sized response; then $\alpha$ basically acts as a cutoff between true signal and (small) measurement noise, etc. – seanv507 Oct 21 '14 at 00:35
  • Yes, you're right, but I would suspect that in many cases the scaling is less of a problem than multicollinearity if you're doing double-precision computation, unless the differences in scale are truly astronomical. But I agree about scaling and using some regularization like ridge regression. – purple51 Oct 21 '14 at 02:06

It's a simple fact, and easily verified, that multiplying one column of $X$ by a scaling factor of, say, 1,000,000 can dramatically change the condition number of $X$. Your intuition about the effect of scaling is wrong.
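
For instance, a quick numpy check along those lines, on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
print(np.linalg.cond(X))    # a random Gaussian matrix is well conditioned (order 1)

Xs = X.copy()
Xs[:, 0] *= 1e6             # "change the units" of the first column
print(np.linalg.cond(Xs))   # now roughly a million times larger
```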

Brian Borchers