I have been trying to fit a logistic regression model in R (using the mnlogit package) with 12 predictor variables x1...x12 to predict a binary outcome y.
There are three variables (call them x1, x2, x3; the order doesn't matter) whose coefficients consistently come back as NA whenever they appear in the model together, regardless of which of the remaining nine predictors are included or excluded. I had assumed this was due to (near) collinearity among these variables. To test this, for the data matrix AF, I ran
cor(AF, use="pairwise.complete")
This gave cor(x1,x2) = 0.60, cor(x1,x3) = 0.10, and cor(x2,x3) = 0.20, correlations smaller than those each of these variables has with at least some of the remaining nine predictors. For instance, cor(x1,x10) = 0.35, larger than the correlation between x1 and x3 or between x2 and x3. Yet a model containing x1 and x10 together (as long as x2 and x3 are excluded) returns an estimated regression coefficient rather than NA, and the same holds for other such pairings.
Could something other than correlation among the predictor variables be responsible for the NAs in the regression model? In previous attempts to estimate this model, I encountered a similar problem with "sparse" variables (i.e. ones where nearly all individual measurements are 0), but I have since removed those as well.
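One point worth checking: pairwise correlations cannot rule out multicollinearity, because three or more columns can be exactly linearly dependent while every pairwise correlation stays modest. A self-contained sketch (simulated data, hypothetical names, not the original variables) showing that a rank check catches what cor() misses:

```r
# Two categorical variables, each fully dummy-coded with no level dropped:
# the columns of each set sum to a vector of ones, so the combined design
# matrix is rank-deficient even though no pairwise correlation is large.
set.seed(1)
n  <- 1000
f1 <- sample(4, n, replace = TRUE)                  # hypothetical 4-level factor
f2 <- sample(3, n, replace = TRUE)                  # hypothetical 3-level factor
S  <- sapply(1:4, function(k) as.numeric(f1 == k))  # rows of S sum to 1
G  <- sapply(1:3, function(k) as.numeric(f2 == k))  # rows of G sum to 1
X  <- cbind(S, G)

cmat <- cor(X)
max(abs(cmat[upper.tri(cmat)]))   # largest |pairwise correlation| is only about 0.5
qr(X)$rank                        # 6, although ncol(X) is 7: exact linear dependence
```

Regressing one suspect column on the others (e.g. `summary(lm(...))$r.squared` near 1) identifies the dependency itself; the point is that rank, not pairwise correlation, is what determines whether coefficients are estimable.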
Addendum: I have included an example of the output. Int_0, Int_1, and Out_Spr correspond to the three incompatible variables x1, x2, and x3; note that the first two give NAs. This example has 19 rather than 12 predictor variables, but the results are qualitatively the same:
Call:
mnlogit(formula = fm, data = All_Final_last_long, ncores = 8,
reflevel = "0")
Frequencies of alternatives in input data:
0 1
0.97935 0.02065
Number of observations in data = 18305
Number of alternatives = 2
Intercept turned: OFF
Number of parameters in model = 19
# individual specific variables = 19
# choice specific coeff variables = 0
# individual independent variables = 0
-------------------------------------------------------------
Maximum likelihood estimation using the Newton-Raphson method
-------------------------------------------------------------
Number of iterations: 17
Number of linesearch iterations: 17
At termination:
Gradient norm = 0.00104
Diff between last 2 loglik values = 6.8e-07
Stopping reason: Succesive loglik difference < ftol (1e-06).
Total estimation time (sec): 0.3
Time for Hessian calculations (sec): 0.22 using 8 processors.
Coefficients :
Estimate Std.Error t-value Pr(>|t|)
TTo_last:1 -9.3977e+00 3.6825e+00 -2.5520 0.01071 *
TTo_last_sq:1 3.1922e+01 2.4143e+01 1.3222 0.18611
TTo_last_cub:1 -4.2153e+01 4.3544e+01 -0.9681 0.33302
Out_Sum:1 -8.1411e+00 3.5207e+00 -2.3124 0.02076 *
Out_Win:1 -6.4081e+00 3.5181e+00 -1.8215 0.06853 .
Out_Spr:1 -7.3456e+00 3.5226e+00 -2.0853 0.03704 *
TM_0:1 1.9803e+00 4.7466e+00 0.4172 0.67652
TM_1:1 -2.3025e-01 2.0428e+00 -0.1127 0.91026
Int_2:1 7.2317e-01 1.7757e-01 4.0726 4.649e-05 ***
TM_2:1 -8.7704e-01 3.9441e+00 -0.2224 0.82403
Int_3:1 5.7395e-01 2.3668e-01 2.4250 0.01531 *
TM_3:1 -7.3635e+00 1.3324e+01 -0.5526 0.58052
Int_4:1 4.5607e-01 3.9432e-01 1.1566 0.24743
TM_4:1 4.3723e+00 6.4967e+04 0.0001 0.99995
Int_5:1 4.7611e-01 8.2288e-01 0.5786 0.56287
TM_5:1 8.0781e+00 1.8213e+05 0.0000 0.99996
Int_6:1 -1.3777e+01 3.0117e+03 -0.0046 0.99635
Int_7:1 -1.3967e-01 8.6804e+03 0.0000 0.99999
TS_9_1:1 5.5834e+00 4.6865e+00 1.1914 0.23350
Int_0:1 NA NA NA NA
Int_1:1 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -1741.8, df = 19
AIC: 3521.5
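For comparison, base R's glm() shows the same symptom on a rank-deficient design: the aliased column's coefficient is reported as NA while the rest of the fit proceeds. A minimal sketch with simulated data (all names hypothetical, not the original variables):

```r
# Two fully dummy-coded factors and no intercept: the columns of one dummy set
# sum to the same constant vector as the columns of the other, so the design
# matrix has one redundant column and glm() returns NA for its coefficient.
set.seed(2)
n <- 500
d <- data.frame(season = factor(sample(c("Sum", "Win", "Spr", "Fal"), n, TRUE)),
                group  = factor(sample(c("a", "b", "c"), n, TRUE)))
X <- cbind(model.matrix(~ season - 1, d), model.matrix(~ group - 1, d))
y <- rbinom(n, 1, 0.1)

fit <- glm(y ~ X - 1, family = binomial)
coef(fit)   # exactly one coefficient is NA: its column is aliased
```

If something similar is happening here (e.g. if the Int_* and Out_* columns are each a complete set of indicators), the NAs would appear as soon as both complete sets enter the model, independently of any pairwise correlations.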