
I have two ordinal variables, a and b, and a categorical variable c. I would like to fit a multinomial logit regression with multinom (which comes from the nnet package), ignoring the ordinal scale. I have the following data:

 a<-c( 3, 4,   4,   4,   3, 4,   3, 3, 4,   2, 2, 4,   3, 3, 3, 1,   3, 2, 2, 3, 3, 1,   3, 2, 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 3, 2, 2, 3, 1,   2, 2, 2, 2, 3, 2, 3, 4,   4,   3, 3, 2, 2, 3, 3, 3, 2, 1,   1,   1,   1,   1,   1,   2, 3, 4,   3, 3, 4,   3, 4,   3, 2, 3, 3, 3, 3, 3, 4,   3, 4,   3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3, 2, 3, 2, 3, 1,   2, 2, 1,   1,   4,   3, 3, 2, 2, 2, 2, 2, 2, 3, 4,   4,   4,   3, 3, 3, 3, 3, 4,   4,   3, 3, 2, 3, 3, 3, 3, 4,   3, 4,   2, 2, 3, 3, 3, 2, 2, 3, 2, 4,   2, 2, 2, 2, 2, 1,   2, 2, 1,   1,   3, 4,   3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 2, 1)


 c<-c(5 ,3 ,4 ,3 ,4 ,4 ,2 ,2 ,3 ,4 ,2 ,5, 3, 5, 4 ,3 ,2 ,4 ,4 ,4, 4 ,4, 4, 2 ,3, 4 ,2 ,3 ,3 ,3 ,4 ,3 ,3 ,2 ,2 ,3 ,3 ,3 ,3 ,4 ,2 ,4 ,3, 3, 3, 4, 4, 3, 3 ,2 ,3 ,3 ,3 ,3, 4 ,4, 4, 3, 2, 2 ,4 ,4 ,3 ,3 ,2 ,2 ,1 ,2 ,2 ,2 ,1 ,2 ,5 ,2 ,3 ,3 ,2, 4 ,3 ,1 ,2 ,3 ,2 ,3 ,3 ,3 ,3 ,3 ,3 ,2 ,2 ,2 ,2 ,3 ,2 ,4 ,3, 3 ,2 ,3, 2, 4, 3, 3, 3 ,3 ,4 ,2 ,2 ,4 ,3 ,3 ,3 ,3 ,3 ,2 ,3, 3 ,3, 3, 4 ,4 ,4 ,1 ,3 ,3 ,3 ,4 ,4 ,4 ,3 ,2 ,4 ,4 ,2 ,4 ,4 ,4 ,4 ,2 ,3 ,3, 2, 2 ,3 ,2 ,3 ,4 ,5 ,2, 3 ,3 ,2 ,3 ,2 ,2 ,3 ,2 ,2 ,4 ,4 ,3 ,3 ,2, 4 ,4 ,2 ,4 ,3 ,4, 4, 3 ,2 ,3)
b<-c(3 ,2, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 4, 2, 3, 1, 3, 1, 2, 4, 2, 1, 3, 2, 2, 2, 1, 3, 3, 3, 2, 2, 2, 2, 1, 3, 1, 3, 2, 3, 1 ,3 ,3 ,2, 2, 3, 1, 3, 2, 2, 2, 2, 2, 2, 3, 4, 3, 3, 2, 1, 4, 3 ,3 ,2 ,2, 1, 2, 2, 2, 2, 1, 2, 5, 3, 3, 4, 3, 4, 1, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 2, 2, 3, 2 ,4 ,2 ,3 ,2 ,2 ,2 ,4 ,2 ,2 ,2 ,2 ,2 ,2, 1, 5 ,4 ,3 ,2, 2 ,2 ,2 ,2 ,4 ,2 ,2 ,4 ,3 ,3 ,1 ,2 ,2 ,2 ,2 ,2 ,2, 2, 2, 2, 2, 2 ,2 ,2 ,2 ,2 ,2 ,2, 2, 2, 2 ,2 ,2 ,2 ,2 ,5 ,4 ,3 ,2 ,1 ,1 ,1 ,4 ,3 ,2 ,2 ,3 ,3 ,3 ,2 ,2 ,2 ,2 ,2, 3, 2 ,2 ,2 ,2 ,2 ,1)

Now I ignore the ordinal scale and treat the variables as factors to fit the multinomial logit regression:

library(nnet)  # multinom() is provided by nnet, not car
a <- as.factor(a)
b <- as.factor(b)
c <- as.factor(c)
multinom(formula = a ~ b + c)

Call:
multinom(formula = a ~ b + c)

Coefficients:
  (Intercept)        b2       b3       b4        b5         c2         c3         c4        c5
2   0.3410779  1.009797 41.80056 45.22081 -13.02923 -0.5229982  0.9216514  0.2170273 -18.03928
3  -1.4697131  2.698228 44.91938 47.04268 -16.24570 -0.7341395  0.7088424  1.2495310  20.70641
4 -46.0095393 33.603384 75.13911 79.00502  56.91264 -7.4198320 13.0220759 14.2526951  33.85774

Std. Errors:
  (Intercept)        b2        b3        b4           b5           c2        c3        c4           c5
2   1.2654428 0.6530052 0.4659520 0.5495402          NaN 1.337075e+00 1.4180126 1.4993079 8.028986e-16
3   1.6649206 0.9361438 0.5123106 0.5879588 2.446562e-15 1.640462e+00 1.7003411 1.7558418 8.601766e-01
4   0.3399454 0.4767032 0.3699569 0.4144527 3.321501e-11 6.973173e-08 0.6549144 0.6953767 8.601766e-01

Residual Deviance: 328.1614
AIC: 382.1614  

I think I found the problem: the b = 5 row is empty for a = 1 and a = 2.

table(b,c,a)
, , a = 1

   c
b    1  2  3  4  5
  1  0  3  2  2  0
  2  1  7  1  0  0
  3  0  0  0  0  0
  4  0  0  0  0  0
  5  0  0  0  0  0

, , a = 2

   c
b    1  2  3  4  5
  1  1  5  2  2  0
  2  1 12 21  4  0
  3  0  1  6  1  0
  4  0  2  1  1  0
  5  0  0  0  0  0

But do you know how to solve this problem?
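Empty cells like the ones in these tables can also be located programmatically by cross-tabulating a predictor against the outcome. The sketch below uses a small hypothetical count table (the names `counts`, `tab`, and `sparse_levels` are made up for illustration, not taken from the data above) and lists the predictor levels that never occur with at least one outcome category.

```r
# Hypothetical counts: rows are levels of a predictor x, columns are outcome y.
# Level "3" of x is only ever observed together with y = 3.
counts <- matrix(c(5, 6, 4,
                   7, 9, 3,
                   0, 0, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(x = c("1", "2", "3"),
                                 y = c("1", "2", "3")))
tab <- as.table(counts)  # with raw vectors this would be table(x, y)

# Predictor levels with at least one empty cell across the outcome categories
sparse_levels <- rownames(tab)[rowSums(tab == 0) > 0]
sparse_levels
```

With the real data the same check would start from `table(b, a)` and `table(c, a)` instead of a hand-built matrix.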

user2685139

1 Answer


Ordinal independent variables are always tricky. As far as I know, there are essentially four ways of dealing with them. The first two stay within the multinomial logistic framework: treat them as categorical and ignore the ordering, or treat them as continuous and pretend the scores are interval-scaled. One way to ease some of the concerns about the latter pretense is to try several different scoring schemes and see whether it makes a substantial difference. The other two are row and column effects association models (see Agresti, pp. 154-184) and non-model-based methods (see Agresti, pp. 184-224).
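As a rough illustration of the first two options, the sketch below fits the same multinomial model three ways on simulated data: with the predictor as a factor, with its raw scores as a number, and with an alternative scoring. The data, the variable names (`x`, `y`, `x_alt`), and the alternative scores 1, 2, 4, 8 are all assumptions made up for this example.

```r
library(nnet)  # provides multinom()

set.seed(1)
# Hypothetical toy data: ordinal predictor x (scores 1-4), 3-category outcome y
n <- 300
x <- sample(1:4, n, replace = TRUE)
eta2 <- -0.5 + 0.4 * x                 # linear predictor for category 2 vs 1
eta3 <- -1.0 + 0.6 * x                 # linear predictor for category 3 vs 1
p <- cbind(1, exp(eta2), exp(eta3))
p <- p / rowSums(p)                    # category probabilities per observation
y <- factor(apply(p, 1, function(pr) sample(1:3, 1, prob = pr)))

x_alt <- c(1, 2, 4, 8)[x]              # an alternative, unequally spaced scoring

fit_cat <- multinom(y ~ factor(x), trace = FALSE)  # categorical, ordering ignored
fit_num <- multinom(y ~ x, trace = FALSE)          # scores treated as interval
fit_alt <- multinom(y ~ x_alt, trace = FALSE)      # same idea, different scores

# Compare the fits; similar conclusions across scorings are reassuring
AIC(fit_cat, fit_num, fit_alt)
```

The factor coding spends one coefficient per non-reference level and outcome contrast, while the numeric codings spend a single slope per contrast, so the comparison is a trade-off between flexibility and parsimony.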

Peter Flom
  • Thank you for your answer. I added some additional information to my question, because I tried fitting the model with a lower scale. Please have a look at it. – user2685139 Sep 19 '13 at 14:50
  • Looks like there is some problem with b5 – Peter Flom Sep 19 '13 at 14:55
  • Yes, that's right. But what kind of problem? Is it because there are only three observations for b5? I deleted those observations and now it works. Do you think this is the solution, or might there be a solution without deleting? I fitted a completely different multinomial logit regression and the same problem appears there (but with d4, and d4 has 12 observations). So I think deleting is not an option. – user2685139 Sep 19 '13 at 15:13
  • Do you know which kind of problem could be in my data? Or is this test not valid? Please help me, I do not know how to continue. – user2685139 Sep 20 '13 at 09:04
  • You have a problem of quasi-complete separation. For ways to continue, see [Allison](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.176.2077&rep=rep1&type=pdf). – Peter Flom Sep 20 '13 at 10:07
  • That's fantastic! Thank you very much. Do you know if it is possible to fit a multinomial logit regression with penalized maximum likelihood estimation? I have only found this for binary models. – user2685139 Sep 20 '13 at 15:15
  • I don't know the answer to that – Peter Flom Sep 20 '13 at 16:57
  • @user2685139, it is possible in principle, but if you're asking whether there is a pre-existing software package, or an argument you can pass to `multinom` to make that happen, I don't think so. You'll probably need to "build your own" for this. The multinomial log-likelihood is straightforward to calculate, and you can subtract off your selected penalty (e.g. Lasso: $\lambda \sum_j |\beta_j|$ or ridge: $\lambda \sum_j \beta_j^{2}$) and use your favorite optimizer to maximize the penalized likelihood. – Macro Sep 20 '13 at 17:33
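The "build your own" approach from the last comment could look roughly like the following. This is a minimal sketch under stated assumptions, not a vetted implementation: the toy data, the function name `neg_pen_loglik`, the choice of a ridge penalty, the value `lambda = 1`, and the decision to leave intercepts unpenalized are all illustrative. It maximizes the penalized multinomial log-likelihood by minimizing its negative with base R's `optim`.

```r
# Hypothetical toy data in which the dummy x2 is nonzero only when y == 3,
# so the unpenalized estimate for x2 would drift towards infinity.
set.seed(42)
n  <- 120
y  <- sample(1:3, n, replace = TRUE)           # outcome categories 1..K
x1 <- rnorm(n)
x2 <- as.numeric(y == 3 & runif(n) < 0.3)
X  <- cbind(1, x1, x2)                         # design matrix with intercept
K  <- 3
p  <- ncol(X)

# Negative ridge-penalized log-likelihood; category 1 is the reference
neg_pen_loglik <- function(theta, X, y, lambda) {
  B   <- cbind(0, matrix(theta, nrow = ncol(X)))  # p x K, reference column = 0
  eta <- X %*% B                                  # n x K linear predictors
  m   <- apply(eta, 1, max)
  log_denom <- m + log(rowSums(exp(eta - m)))     # numerically stable log-sum-exp
  loglik  <- sum(eta[cbind(seq_along(y), y)] - log_denom)
  penalty <- lambda * sum(B[-1, ]^2)              # intercept row is not penalized
  -(loglik - penalty)
}

fit <- optim(rep(0, p * (K - 1)), neg_pen_loglik,
             X = X, y = y, lambda = 1, method = "BFGS")

# Coefficients for categories 2..K relative to category 1
B_hat <- matrix(fit$par, nrow = p,
                dimnames = list(c("(Intercept)", "x1", "x2"),
                                paste0("y", 2:K)))
B_hat
```

The penalty keeps the coefficient on the separated column finite. In practice `lambda` would need to be chosen with some care, for example by cross-validation, and a Lasso penalty would call for an optimizer that copes with the non-differentiable absolute value.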