
I keep getting warnings such as

RuntimeWarning: invalid value encountered in greater
  return (a < x) & (x < b)

and my model summary is full of nans and very large standard errors. Predictive performance is nearly identical to what I get when I train the same model with sklearn, so predictions seem fine. But why am I seeing so many weird numbers? I've seen answers attributing similar issues to perfect separation, but that doesn't seem to be the case here. I first noticed this with real data, but I get the same issues with generated data as well.

Code to reproduce

import statsmodels.api as sm
import pandas as pd
from sklearn import datasets
from numpy import random


# Synthetic classification problem: 70 features, of which 50 are informative
# and 20 are (redundant) linear combinations of the informative ones.
data = datasets.make_classification(n_features=70, n_informative=50,
                                    n_redundant=20, n_samples=10000,
                                    random_state=3)
X = pd.DataFrame(data[0])
y = data[1]

# Two extra noise features unrelated to the target.
X['rand_feat1'] = random.randint(100, size=(X.shape[0]))
X['rand_feat2'] = random.randint(100, size=(X.shape[0])) / 100

logit_model = sm.Logit(y, X)
sm_result = logit_model.fit_regularized(maxiter=10000)

print(sm_result.summary())

Output:

[Image: Logit regression results summary; most coefficients have nan standard errors, z-values, and p-values.]

L Xandor
  • The answer in the link given for closing, namely perfect separation, is not an answer to this question. The question is related to multicollinearity problems and penalized estimation. `Logit.fit_regularized` also intentionally adds nans in the standard errors for L1-penalized parameters close to zero, because standard inference doesn't apply. – Josef Aug 15 '20 at 17:33
  • @Josef So is the basic answer that the params with the nans are those zeroed out by L1, and can be dropped or set to zero? If so, why aren't the coefficients also set to nan (or zero)? Is there any information in them? – HoosierDaddy Oct 10 '21 at 15:44
  • Found this link that seems to indicate the nans are for parameters that are zeroed out. Still not sure why they are not actually set to zero... https://github.com/statsmodels/statsmodels/blob/e741f3b22302199121090822353f20d794a02148/statsmodels/discrete/discrete_model.py#L401 – HoosierDaddy Oct 10 '21 at 15:50
  • I think in this case the problem is multicollinearity and a hessian that is not positive (semi)definite with some negative diagonal elements of cov_params. In that case the nans come from the sqrt in the standard errors `bse`. I think you are right that if the nans in this case were intentional, then the params should have been set to zero. – Josef Oct 11 '21 at 00:46
  • Default `alpha=0`. So, actually there is no penalization in the above example, AFAICS. – Josef Oct 11 '21 at 00:57
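
Following up on the comments above, here is a minimal sketch (reusing the `sm_result` from the snippet in the question) of how one might check that the nans come from the square root of negative diagonal entries of the covariance matrix, as described by Josef:

import numpy as np

# The reported standard errors are bse = sqrt(diag(cov_params())).
# If the Hessian is not positive (semi)definite because of multicollinearity,
# some diagonal entries of cov_params can be negative and the sqrt gives nan.
diag_cov = np.diag(sm_result.cov_params())
print("negative diagonal entries of cov_params:", int((diag_cov < 0).sum()))
print("nan standard errors:", int(np.isnan(sm_result.bse).sum()))

# Note: fit_regularized defaults to alpha=0, so the call in the question
# applies no penalty at all despite the method name.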

1 Answer


This is a typical example of a (near) singular feature matrix. Some of your features are (near) duplicates of one another, and they blow up the $(X'X)^{-1}$ matrix. Fortunately, some regression implementations have their own ways of dealing with this, so you still see a result, but the coefficients themselves don't mean much. For example, if you have identical variables $X_1$ and $X_2$, different values of $\beta_1$ and $\beta_2$ give the same predictions as long as $\beta_1+\beta_2$ is unchanged. That means you can still enjoy the predictive performance of the model even though the coefficients are messed up.
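
As a minimal sketch of this (the first part assumes the `X` built in the question's snippet; the toy variables `x1`, `x2`, `y_toy` below are made up purely for illustration):

import numpy as np
import statsmodels.api as sm

# Rank and condition number of the question's feature matrix: with 20
# redundant features that are linear combinations of the informative ones,
# the rank should be well below the number of columns and the condition
# number very large.
print("rank:", np.linalg.matrix_rank(X.values), "of", X.shape[1], "columns")
print("condition number:", np.linalg.cond(X.values))

# Toy illustration of the identifiability problem: x2 is a near-exact copy
# of x1, so the data only pin down b1 + b2, not b1 and b2 individually.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.01 * rng.normal(size=500)      # near duplicate of x1
eta = 1.0 * x1 + 1.0 * x2                  # true b1 + b2 = 2
y_toy = rng.binomial(1, 1 / (1 + np.exp(-eta)))
X_toy = np.column_stack([x1, x2])

res = sm.Logit(y_toy, X_toy).fit(disp=0)
print(res.params)          # individual coefficients are unstable, with large SEs
print(res.params.sum())    # but their sum should come out close to 2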

I recommend using LASSO or ridge regularization to get 'interpretable' coefficients.
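
Note that, as pointed out in the comments, `fit_regularized` defaults to `alpha=0`, so the call in the question is effectively unpenalized. A minimal sketch of an actually penalized fit (assuming `X`, `y` from the question; `alpha=1.0` is just an illustrative choice of penalty strength):

import statsmodels.api as sm

# L1-penalized logistic regression; alpha > 0 turns the penalty on.
l1_result = sm.Logit(y, X).fit_regularized(method='l1', alpha=1.0, maxiter=10000)
print(l1_result.params)    # many coefficients shrunk towards (or trimmed to) zero

As far as I know, `Logit.fit_regularized` only implements an L1 penalty; for a ridge-type (L2) penalty, scikit-learn's `LogisticRegression` with its default `penalty='l2'` is one option.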

Julius