My goal is to predict y, but my dependent variable y has more than 20 levels. I don't think a multinomial model would be a good choice. Any suggestions or pointers on what modeling methodology I should explore for this problem would be much appreciated. Thanks in advance.
Asked by bison2178
- [Ordinal logistic](https://en.wikipedia.org/wiki/Ordered_logit) models are often used when $y$ is ordered, with proportional odds being the most common assumption. But why don't you think a multinomial model would be a good choice? – Scortchi - Reinstate Monica Jun 03 '16 at 18:13
- @Scortchi, the data are not ordinal. It just seems odd to run a multinomial model on a `y` that has so many levels. This may be due to my inexperience with this kind of problem. – bison2178 Jun 03 '16 at 18:51
- Just wondered. I can't think of anything specifically contraindicating multinomial regression in this case, though of course you'll have a lot of coefficients to estimate, and overfitting will become a problem sooner than with fewer levels. If you're classifying rather than just predicting the probability of class membership, then think about @hxd1011's point. Is each of the more than 380 possible misclassifications equally bad? – Scortchi - Reinstate Monica Jun 05 '16 at 09:04
- @Scortchi Thanks, Scortchi; now I am confident about my instincts on this topic. – bison2178 Jun 09 '16 at 04:29
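To make the two points in the comments concrete, here is a small Python sketch. It assumes a baseline-category multinomial logit (one reference level, an intercept plus one slope per predictor for every other level), a hypothetical count of 10 predictors, and a made-up 3-level cost matrix. It counts the coefficients and the distinct (true, predicted) misclassification types, and shows how a cost-sensitive decision can differ from simply picking the most probable level.

```python
from typing import List, Sequence


def n_multinomial_coefficients(n_levels: int, n_predictors: int) -> int:
    # Baseline-category logit: every non-reference level gets an intercept
    # plus one slope per predictor.
    return (n_levels - 1) * (n_predictors + 1)


def n_misclassification_types(n_levels: int) -> int:
    # Ordered (true, predicted) pairs with true != predicted.
    return n_levels * (n_levels - 1)


def min_expected_cost_class(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> int:
    """cost[t][k] is the cost of predicting k when the truth is t.

    Returns the k minimizing sum_t probs[t] * cost[t][k].
    """
    n = len(probs)
    expected = [sum(probs[t] * cost[t][k] for t in range(n)) for k in range(n)]
    return min(range(n), key=lambda k: expected[k])


# Hypothetical example: 20 levels and 10 predictors.
print(n_multinomial_coefficients(20, 10))  # 209
print(n_misclassification_types(20))       # 380

probs = [0.5, 0.3, 0.2]  # made-up predicted class probabilities
zero_one: List[List[float]] = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Made-up costs: missing level 2 is ten times worse than any other error.
asymmetric: List[List[float]] = [[0, 1, 1], [1, 0, 1], [10, 10, 0]]
print(min_expected_cost_class(probs, zero_one))    # 0 (same as the most probable level)
print(min_expected_cost_class(probs, asymmetric))  # 2
```

With equal costs the rule reduces to picking the most probable level; once some errors are costlier than others, the best prediction can change even though the fitted probabilities do not.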
1 Answer
Predicting a discrete outcome with many levels is a hard problem. A common approach is one-vs-rest (one-vs-others): you build one model per level, and each model detects whether the output is that specific level.
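A minimal one-vs-rest sketch in Python. The answer does not name a base learner, so this assumes a plain perceptron for each binary model (logistic regression is a more common choice in practice), and the three-level dataset with labels `a`, `b`, `c` is made up and linearly separable. Each level gets its own model trained on "this level" (+1) versus "every other level" (-1), and prediction picks the level whose model gives the input the highest score.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def _augment(x: Point) -> List[float]:
    # Prepend a constant 1.0 so the first weight acts as the bias term.
    return [1.0, *x]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_perceptron(X: List[Point], y: List[int], max_epochs: int = 1000) -> List[float]:
    """Binary perceptron; y holds +1 ("this level") or -1 ("any other level")."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            xa = _augment(xi)
            if yi * _dot(w, xa) <= 0:  # misclassified or on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w


def fit_one_vs_rest(X: List[Point], labels: List[str]) -> Dict[str, List[float]]:
    """Train one binary model per level: that level versus all the others."""
    return {
        c: fit_perceptron(X, [1 if lab == c else -1 for lab in labels])
        for c in sorted(set(labels))
    }


def predict_one_vs_rest(models: Dict[str, List[float]], x: Point) -> str:
    """Return the level whose binary model scores x the highest."""
    xa = _augment(x)
    return max(models, key=lambda c: _dot(models[c], xa))


if __name__ == "__main__":
    X = [(0, 0), (1, 0), (0, 1), (6, 0), (7, 1), (6, -1), (0, 6), (1, 7), (-1, 6)]
    labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
    models = fit_one_vs_rest(X, labels)
    print([predict_one_vs_rest(models, x) for x in X])
    # expected: ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']
    print(predict_one_vs_rest(models, (6.5, 0.25)))
    # expected: b
```

The perceptron keeps the sketch dependency-free and its behaviour exact on separable data; with probabilistic base models you would compare their predicted probabilities instead of raw scores. Note that with 20+ levels this means fitting 20+ separate models.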
Here is why. Suppose you have a 100-sided die and you know its true distribution: $P(S_1)=0.1$ and $P(S_2)=P(S_3)=\dots=P(S_{100})=0.9/99=1/110\approx 0.00909$. What happens with maximum a posteriori (MAP) estimation? You will always guess the first side, $S_1$, since it has the largest probability of all the sides. However, you will be wrong $90\%$ of the time!
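The die arithmetic can be checked exactly with Python's `fractions` module. This sketch builds the distribution from the example, takes the MAP guess, and, as an extra comparison not in the original answer, computes the accuracy of guessing at random in proportion to the probabilities.

```python
from fractions import Fraction
from typing import List


def die_distribution(n_sides: int = 100, p_first: Fraction = Fraction(1, 10)) -> List[Fraction]:
    """P(S_1) = p_first; the remaining mass is split evenly over the other sides."""
    rest = (1 - p_first) / (n_sides - 1)
    return [p_first] + [rest] * (n_sides - 1)


def map_guess(probs: List[Fraction]) -> int:
    """Index of the most probable side (the MAP guess under 0-1 loss)."""
    return max(range(len(probs)), key=lambda i: probs[i])


probs = die_distribution()
guess = map_guess(probs)        # index 0, i.e. side S_1
map_accuracy = probs[guess]     # chance the MAP guess is right
# For comparison: guess at random in proportion to probs.
matching_accuracy = sum(p * p for p in probs)

print(guess + 1, map_accuracy, 1 - map_accuracy, matching_accuracy)
# expected: 1 1/10 9/10 1/55
```

The MAP guess is still the accuracy-maximizing single guess (probability matching does worse, at $1/55 \approx 1.8\%$); the point is that even the best guess is right only $10\%$ of the time when the probability mass is spread over many levels.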
For details, please check my answers in this post