I am trying to include several income types in a regression model (specifically a logit model). Some of these income variables contain a large number of zeros (typically some capital incomes). How do I choose between

  • including a dummy variable for whether or not the individual has this particular type of income
  • including a variable that accounts for the amount of income the person has (either the plain value or perhaps some quantiles).

I cannot include both due to collinearity, so how should I choose? Should I use a criterion like $R^2$, AIC, or BIC, or something like whether one of them is significant or not?

Additionally, is there a way to include both in a model?

ilanman
Anthony Martin
  • Even if they are correlated, you can still include both in the model, and that seems to do more justice to the special status of zero for this variable. – mdewey Jan 04 '17 at 13:41
  • If I try to incorporate both whether or not you have a certain type of income and, let's say, the quintiles of this income, then they are perfectly correlated for at least one category. Say I exclude the middle quintile: if you have detention=1 and you are not in q1, q2, or q4, then you are in q5? At least that is how I understand it, and what SAS tells me. – Anthony Martin Jan 09 '17 at 12:49
  • Try using income itself, not a categorised version. – mdewey Jan 09 '17 at 13:10
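
To make the collinearity point in the comments above concrete, here is a small sketch on simulated data. It is only a guess at what SAS might be flagging, not the asker's actual setup: the variable names, the 30% share of holders, and the lognormal amounts are all made up for illustration. It compares two codings: one where "no income" is itself a level of the quintile variable and a separate dummy is added on top, and one where the quintile dummies are defined only among holders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: about 30% of people hold this type of income.
has_income = rng.random(n) < 0.3
income = np.where(has_income, rng.lognormal(8.0, 1.0, n), 0.0)

# Quintile (1..5) computed among holders only; 0 marks "no income".
cuts = np.quantile(income[has_income], [0.2, 0.4, 0.6, 0.8])
quintile = np.where(has_income, np.searchsorted(cuts, income) + 1, 0)

def dummies(levels):
    """One 0/1 column per requested level of the quintile variable."""
    return np.column_stack([(quintile == k).astype(float) for k in levels])

intercept = np.ones((n, 1))
has = has_income.astype(float).reshape(-1, 1)

# Coding A: "no income" is a level of the categorical variable (reference = q3)
# and the has-income dummy is added as well. The level-0 column equals
# intercept minus the dummy, so one column is redundant.
X_a = np.hstack([intercept, has, dummies([0, 1, 2, 4, 5])])

# Coding B: the dummy plus quintile dummies defined only among holders
# (reference = q3). Six groups, six columns.
X_b = np.hstack([intercept, has, dummies([1, 2, 4, 5])])

rank_a = np.linalg.matrix_rank(X_a)
rank_b = np.linalg.matrix_rank(X_b)
print("coding A:", X_a.shape[1], "columns, rank", rank_a)
print("coding B:", X_b.shape[1], "columns, rank", rank_b)
```

Under coding B the dummy's coefficient compares the reference quintile with people who have no such income, and the quintile coefficients compare the other quintiles with the reference one.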

1 Answer

If you have the income as a numeric variable, why would you want to use only quintiles? That throws away information. Try to use the income directly.

When a certain income category is zero for a specific person, maybe think of that income category as being undefined for that person (like income from shares for people who do not invest in shares). Then you could use the ideas from How do you deal with "nested" variables in a regression model?.
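
As a rough sketch of what such a "nested" coding could look like (simulated data, made-up coefficients, and a hand-rolled Newton-Raphson logit fit so that only NumPy is needed; none of this comes from the question itself): the dummy enters for everyone, and the centred log amount enters only for people who actually have this income, being set to 0 otherwise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical data: about 30% of people hold this type of income.
has_income = (rng.random(n) < 0.3).astype(float)
income = has_income * rng.lognormal(8.0, 1.0, n)

# Nested coding: the amount term is switched on only for holders.
# Centring the log amount among holders makes the dummy's coefficient
# compare a holder with a typical amount to a non-holder.
pos = income > 0
log_inc = np.zeros(n)
log_inc[pos] = np.log(income[pos])
log_inc[pos] -= log_inc[pos].mean()

X = np.column_stack([np.ones(n), has_income, log_inc])

# Simulate a binary outcome from assumed "true" coefficients.
beta_true = np.array([-1.0, 0.8, 0.5])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = (rng.random(n) < p).astype(float)

def fit_logit(X, y, n_iter=25):
    """Plain Newton-Raphson for logistic regression (no regularisation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

beta_hat = fit_logit(X, y)
print("design rank:", np.linalg.matrix_rank(X))
print("estimated coefficients:", beta_hat)
```

The dummy and the amount are correlated here but not collinear, so both coefficients are estimable. In practice you would fit this with your usual logit routine rather than the small solver above.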

kjetil b halvorsen