Questions tagged [condition-number]

The condition number (also known as condition index) is a diagnostic tool for collinearity in regression models. A regression model has as many condition indices as its design matrix $X$ has columns, one for each eigenvalue of $X'X$. Each is defined as the square root of the ratio of the largest eigenvalue to the respective eigenvalue; the largest of these is the condition number. Condition numbers over 30 are commonly considered signs of problematic collinearity.
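The definition above can be sketched in a few lines of NumPy. This is only an illustration: the helper name `condition_indices`, the simulated data, and the choice to scale columns to unit length before taking eigenvalues (a common convention, but not one stated in the tag description) are all assumptions.

```python
import numpy as np

def condition_indices(X):
    """Return sqrt(lambda_max / lambda_j) for each eigenvalue of X'X.

    Columns are scaled to unit length first (an assumed convention).
    """
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)        # unit-length columns
    eigvals = np.linalg.eigvalsh(Xs.T @ Xs)   # eigenvalues in ascending order
    return np.sqrt(eigvals.max() / eigvals)

# Hypothetical data: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

ci = condition_indices(X)
print(ci)  # one index per column; the largest is the condition number
```

With the near-duplicate column present, the largest index lands well above the rule-of-thumb threshold of 30, while the smallest is 1 by construction.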

28 questions
13
votes
1 answer

Cause of a high condition number in a Python statsmodels regression?

I'm pretty new to regression analysis, and I'm using Python's statsmodels to look at the relationship between GDP/health/social services spending and health outcomes (DALYs) across the OECD. Just to give an idea of the data I'm using, this is a…
pst0102
9
votes
2 answers

How do you interpret the condition number of a correlation matrix?

I have two correlation matrices, one with a condition number of 9 and the other with a condition number of 70. From what I have read, it would appear that the first matrix is better conditioned than the other based on these figures alone, but I am…
Jaja
6
votes
0 answers

Explaining conditioning number in statistics to non-statisticians

I work these days as a statistician and a lot of what I do is evaluating design of experiments; I started this job less than a year ago, after getting a PhD in mathematical statistics. I remember once trying to explain conditioning numbers and their…
5
votes
0 answers

Choosing the basis functions in a linear regression

I have two random variables $X$ and $Y$ and I'm trying to model $\mathbb{E}[Y|X]$. To this end, I'd like to pick a collection of functions $f_1, f_2 \dots f_n : \mathbb{R} \to \mathbb{R}$ and then fit a model to my data set $(x_i, y_i)_{1 \dots m}$…
user357269
5
votes
2 answers

When is it appropriate to override the default reciprocal condition number tolerance for solve() in R?

I am estimating a GMM IV model, where I'm creating a weighting matrix by taking the inverse of Z'Z, where Z is a matrix of instruments. For certain combinations of instruments, when I try to compute this in R with the solve() function, I get an…
rsandler
5
votes
1 answer

Automatically fixing ill-conditioning or collinearity

I'm backtesting a regression model, which entails running it on a bunch of bootstrap samples of a "rewound" version of our data set. Unfortunately, in some of these resamplings, I end up getting some "coincidental" dependencies between covariates…
4
votes
1 answer

Why does matrix condition number change drastically when a constant is added?

If I create a regression model design matrix with 3 uncorrelated variables, I get a small condition number as expected. MWE: > import numpy as np, pandas as pd > n = 1000 > X = pd.DataFrame() > X['x1'] = np.random.normal(size=n) * 500 > X['x2'] =…
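One possible reading of this effect, which is not necessarily the asker's setup since the excerpt is truncated, is that a constant offset on one variable makes it nearly proportional to the intercept column. The sketch below assumes a design matrix with an explicit intercept and hypothetical simulated data; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=(n, 3))     # three uncorrelated variables (simulated)
ones = np.ones((n, 1))          # intercept column (assumed to be in the model)

# Baseline: intercept plus three zero-mean variables.
cond_base = np.linalg.cond(np.hstack([ones, x]))

# Add a constant offset to the first variable only.
x_shifted = x.copy()
x_shifted[:, 0] += 100.0
cond_shifted = np.linalg.cond(np.hstack([ones, x_shifted]))

# Centering the columns removes the offset and restores the baseline behaviour.
x_centered = x_shifted - x_shifted.mean(axis=0)
cond_centered = np.linalg.cond(np.hstack([ones, x_centered]))

print(cond_base, cond_shifted, cond_centered)
```

Under these assumptions the shifted column is close to 100 times the intercept column, so the two are almost collinear even though the variables themselves remain uncorrelated.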
4
votes
0 answers

Proximal Gradient Descent and Proximal Coordinate descent for Lasso Problem

Why is proximal coordinate descent much less affected by bad conditioning than proximal gradient descent? For example, we can consider this problem : $\min_x \frac{1}{2}\|Ax-b\|^2_2 + \lambda\|x\|_1$ If A has a large condition number, how can we…
aferjani
3
votes
0 answers

Does the correlation matrix always have a smaller condition number than the covariance matrix?

In my experience and the experience of others (for example: https://stats.stackexchange.com/a/287737/193216), a covariance matrix $\textrm{COV}=\langle {\bf x}_t{\bf x}_t'\rangle$ has a larger condition number $\kappa$ than the corresponding…
3
votes
1 answer

Why does the condition number of the covariance matrix explode as number of variables increases?

From asset returns of $N$ stocks, the symmetric covariance matrix sized $N\times N$ is constructed, which treats the asset returns as variables. When the number of variables $N$ is fairly low like $N=5$ or $N=12$, the condition number is relatively…
3
votes
1 answer

How are the condition numbers of a design matrix and its correlation matrix related?

Given a design matrix $X$ for a linear regression model, what is the relationship between the condition number of $X$ and its correlation matrix $R$? I would be interested in the case of a centered standardized $X.$
lalessandro
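For the centered, standardized case, the correlation matrix is $R = X_s'X_s/(n-1)$, so its eigenvalues are the squared singular values of $X_s$ divided by $n-1$, and the 2-norm condition numbers satisfy $\kappa(R) = \kappa(X_s)^2$. A quick numerical check, using hypothetical simulated data and illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=(n, 3))
# Simulated design with some correlation between the first two columns.
X = np.column_stack([z[:, 0], z[:, 0] + 0.5 * z[:, 1], z[:, 2]])

# Center and scale each column to unit sample standard deviation.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)   # correlation matrix of the columns

kappa_X = np.linalg.cond(Xs)       # ratio of largest to smallest singular value
kappa_R = np.linalg.cond(R)        # ratio of largest to smallest eigenvalue

print(kappa_X**2, kappa_R)         # the two agree up to rounding
```

This only covers the standardized case; for an unstandardized $X$ the column scales enter $\kappa(X)$ but not $\kappa(R)$, so no such identity holds in general.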
3
votes
2 answers

Deep Learning: Condition Number and Poor Conditioning

I am reading the following section of the book Deep Learning. Can you provide an intuitive explanation of the above section? I don't quite understand the statement "When this number is large, matrix inversion is particularly sensitive to error in…
2
votes
2 answers

Multicollinearity and condition number of logistic regression

It seems to be common to take a "high" condition number as a sign of multicollinearity in regression analysis. For linear models I'm totally convinced that this is a good idea, but is there any analysis on how collinearity influences the condition…
Jonasson
2
votes
0 answers

Should the intercept be included when you check the condition index?

Many sources state that a condition index >30 constitutes a multicollinearity problem. When I've tried to implement this check in practice, I've realized that the condition index (and VIFs) change depending on whether or not the intercept is…
2
votes
2 answers

Condition number of data matrix and stability of OLS estimates

I have a multivariate regression model $Y=X\beta ' + \epsilon$. The variables in the $X$ matrix have very different scales and hence the condition number of $X'X$ is huge (order of trillions). I would like to know if there are problems with…
DatamineR
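A small sketch of how differing column scales alone can inflate the condition number of $X'X$, and how rescaling the columns changes it. The data are simulated and the scale factors are arbitrary assumptions, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
# Three uncorrelated columns whose scales differ by factors of 1e3 and 1e6.
X = np.column_stack([
    rng.normal(size=n),
    1e3 * rng.normal(size=n),
    1e6 * rng.normal(size=n),
])

kappa_raw = np.linalg.cond(X.T @ X)        # dominated by the scale mismatch

# Rescale each column to unit length; the columns stay uncorrelated.
Xs = X / np.linalg.norm(X, axis=0)
kappa_scaled = np.linalg.cond(Xs.T @ Xs)

print(kappa_raw, kappa_scaled)
```

Here the huge value reflects units rather than collinearity. Rescaling a column only rescales the corresponding coefficient, so in exact arithmetic the fitted values are unchanged; the practical concern is the numerical precision lost when $X'X$ is formed and inverted explicitly.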