
I have a multivariate regression model $Y = X\beta + \epsilon$. The variables in the $X$ matrix have very different scales, and hence the condition number of $X'X$ is huge (on the order of trillions).

I would like to know whether the high condition number causes problems for parameter estimation. On the one hand, I suspect that if the condition number is high, the estimates of $\beta$ are very unstable (because a small change in $X$ could have a large impact on the solution of $X'X\hat{\beta} = X'Y$). On the other hand, I do not think the stability of the solution should change if I merely change the units of the data matrix $X$, because the new estimates should just be multiples of the previous ones.
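
For example, with some made-up data in numpy (the predictors and the $10^6$ scale factor are invented, just to mimic my situation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Made-up predictors on wildly different scales
x1 = rng.normal(size=n)              # order 1
x2 = rng.normal(size=n) * 1e6        # order 1e6
X = np.column_stack([np.ones(n), x1, x2])
y = 2.0 + 3.0 * x1 + 5e-6 * x2 + rng.normal(scale=0.1, size=n)

print(np.linalg.cond(X.T @ X))       # huge: on the order of 1e12 ("trillions")

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Change the units of the last column: the corresponding estimate just rescales
X2 = X.copy()
X2[:, 2] /= 1e6
beta2, *_ = np.linalg.lstsq(X2, y, rcond=None)
print(beta[2] * 1e6, beta2[2])       # essentially identical
```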

Could someone provide advice?

Thanks.

DatamineR
  • Such a condition number is meaningless, because the first thing any decent software will do is standardize the columns of $X$. What is the condition number of the matrix your solver is actually working with? – whuber Oct 21 '14 at 06:33

2 Answers


What you are looking for is ridge regression: it adds a regularisation term $\alpha \|w\|^2$, where $w$ is the coefficient vector, to the mean squared error (see e.g. the Wikipedia entry on Tikhonov regularisation). This term penalises large weights (and so rescaling now has some benefit); in particular, it penalises solutions with large opposing positive and negative weights.
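
A minimal numpy sketch of this (the data and the value of $\alpha$ here are made up; the ridge estimate solves the regularised normal equations $(X^TX + \alpha I)w = X^Ty$):

```python
import numpy as np

def ridge(X, y, alpha):
    """Minimise ||y - Xw||^2 + alpha * ||w||^2 via the regularised normal equations."""
    p = X.shape[1]
    # Adding alpha * I shifts every eigenvalue of X'X up by alpha,
    # so the system being solved is better conditioned than plain X'X.
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

print(ridge(X, y, alpha=0.0))   # alpha = 0 reduces to ordinary least squares
print(ridge(X, y, alpha=10.0))  # larger alpha shrinks the coefficients towards zero
```

Since the penalty is applied to the coefficients on whatever scale the columns happen to be in, it is common to standardise the columns of $X$ before choosing $\alpha$; that is essentially what the rescaling discussion in the comments below is about.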

seanv507
  • Thank you. So do really different scales of the variables cause (relative) instability of the estimates? – DatamineR Oct 20 '14 at 23:57
  • The individual scales of the variables are not the source of the large condition number. The condition number is the ratio between the largest and smallest eigenvalues of $X^T X$, which has to do with the correlations between the variables. Your matrix is likely rank deficient; perhaps the smallest eigenvalues are practically zero. – purple51 Oct 21 '14 at 00:04
  • @purple51 – consider the columns of $X$ being independent but on very different scales: what are your eigenvalues then? – seanv507 Oct 21 '14 at 00:07
  • Hmm – basically you often want to enforce a "physical" intuition: small changes to the input should produce small changes to the output. Using the above scalar $\alpha$ regularisation, ideally you would rescale your variables so that a unit change in each variable has a roughly similar-sized response; then $\alpha$ basically acts as a cutoff between true signal and (small) measurement noise, etc. – seanv507 Oct 21 '14 at 00:35
  • Yes, you're right, but I would suspect that in many cases the scaling is less of a problem than multicollinearity if you're doing double-precision computation, unless the differences in scale are truly astronomical. But I agree about scaling and using some regularization like ridge regression. – purple51 Oct 21 '14 at 02:06

It's a simple fact, and easily verified, that multiplying one column of $X$ by a scaling factor of, say, 1,000,000 can dramatically change the condition number of $X$. Your intuition about the effect of scaling is wrong.
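
For instance, a quick numpy check along those lines, on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
print(np.linalg.cond(X))    # a random Gaussian matrix is well conditioned (order 1)

Xs = X.copy()
Xs[:, 0] *= 1e6             # "change the units" of the first column
print(np.linalg.cond(Xs))   # now roughly a million times larger
```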

Brian Borchers