Simplified scenario: I want to model income as a function of gender and education. I have two options using GLMs:

glm(income ~ gender + education, family = poisson, data=bla) 
glm(income ~ gender + education, family = gaussian, data=bla) 

Is Poisson or Gaussian the better choice? I read that the Gaussian family should be used when there are more than 30 observations, but I thought I had better ask the experts.

cs0815
  • I think Gamma GLMs have usually been used to model incomes. Poisson usually isn't a good model, since it is supported only on the non-negative integers (and income is continuous). If the incomes are large enough, the differences between the gamma and the normal may not be very big. It's worth fitting both models and comparing them on the aspects salient to your study. – Demetri Pananos Feb 06 '21 at 21:22
  • Thanks, I will have a look at this. – cs0815 Feb 06 '21 at 21:24
  • Additionally, the "30 or more observations" rule of thumb you quote is a perversion of the t-test rule of thumb for using z quantiles as the critical value. It does not apply to GLMs, and is wrong so far as I am concerned. – Demetri Pananos Feb 06 '21 at 21:56
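
To make the Gamma suggestion from the comments concrete, here is a minimal sketch in R. It assumes a log link (a common pairing with the Gamma family, not something stated in the thread) and uses simulated data as a stand-in for `bla`; the factor levels and coefficients are made up purely for illustration.

```r
# Simulated stand-in for the `bla` data frame from the question (hypothetical values)
set.seed(1)
n <- 500
bla <- data.frame(
  gender    = factor(sample(c("f", "m"), n, replace = TRUE)),
  education = factor(sample(c("school", "bachelor", "master"), n, replace = TRUE))
)

# Mean income on the log scale; these coefficients are invented for the example
mu <- exp(10 + 0.1 * (bla$gender == "m") + 0.3 * (bla$education == "master"))

# Gamma-distributed income: strictly positive, continuous, right-skewed, mean = mu
bla$income <- rgamma(n, shape = 2, rate = 2 / mu)

# Gamma GLM with a log link, next to the Gaussian fit from the question
fit_gamma <- glm(income ~ gender + education, family = Gamma(link = "log"), data = bla)
fit_gauss <- glm(income ~ gender + education, family = gaussian, data = bla)

# Gamma coefficients are on the log scale, so exp() gives multiplicative effects
print(exp(coef(fit_gamma)))
print(coef(fit_gauss))
```

Note that the Gamma family requires strictly positive responses, so zero incomes would need separate handling. Fitting `family = poisson` to a non-integer response like this one makes R emit warnings about non-integer values.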

1 Answer

Maybe an economist or statistician here can give a theoretical answer, but I would tackle the problem empirically: fit both models and compare them in terms of residuals, metrics such as AIC/BIC, and validation procedures such as cross-validation.
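
A rough sketch of the cross-validation part of that comparison, assuming simulated data in place of the real data set and mean absolute error as the out-of-sample metric (both are illustrative choices, not part of the question). The two families compared here are Gaussian and Gamma with a log link; the data-generating coefficients are hypothetical.

```r
# Simulated data standing in for the real data set (hypothetical values)
set.seed(42)
n <- 400
dat <- data.frame(
  gender    = factor(sample(c("f", "m"), n, replace = TRUE)),
  education = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)
mu <- exp(10 + 0.1 * (dat$gender == "m") + 0.4 * (dat$education == "high"))
dat$income <- rgamma(n, shape = 2, rate = 2 / mu)

# Candidate families to compare
families <- list(gaussian = gaussian(), gamma_log = Gamma(link = "log"))

# Assign each row to one of k folds at random
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# k-fold cross-validated mean absolute error on the response scale
cv_mae <- sapply(families, function(fam) {
  errs <- sapply(seq_len(k), function(i) {
    fit  <- glm(income ~ gender + education, family = fam, data = dat[fold != i, ])
    pred <- predict(fit, newdata = dat[fold == i, ], type = "response")
    mean(abs(dat$income[fold == i] - pred))
  })
  mean(errs)
})
print(cv_mae)
```

For the residual side of the comparison, calling `plot(fit, which = 1)` on each fitted model draws residuals against fitted values. With only categorical predictors and no interactions, both families estimate group means, so the predictive differences can be small and the residual patterns may be more informative.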

N9N9
  • Thanks. Yeah, I thought about an AIC comparison or ANOVA. – cs0815 Feb 06 '21 at 21:14
  • I feel very strongly that family selection for a GLM is not something that can be done via AIC or other information criteria. Family is a choice that is made prior to modelling. See [this answer](https://stats.stackexchange.com/a/469374/111259) for more. – Demetri Pananos Feb 06 '21 at 21:54