
I am analyzing survey data (n=60) on respondents' risk/return expectations, which is the dummy dependent variable in my model. The problem is that 95% of the sample falls into one group of this dummy variable. I ran this multiple logit model, where I control for other risk/return expectations (ESG and Mrisk, both dummy variables) as well as for gender (dummy), age (continuous) and education (dummy).

Can I use these results for interpretation? If not, what can I do to control for the dependent variables?

Many thanks in advance!

```
Grisk5.i <- glm(Grisk ~ ESG + Mrisk + gender + age + edu_work.IS, data = Data_IS, family = binomial)

glm(formula = Grisk ~ ESG + Mrisk + gender + age + edu_work.IS, 
    family = binomial, data = Data_IS)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3480  -0.2356  -0.1851  -0.1515   2.5173  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)     -23.07030 3556.12539  -0.006   0.9948  
ESGB              0.37620    1.42856   0.263   0.7923  
MriskB            3.11528    1.31539   2.368   0.0179 *
genderMale       17.34287 3556.12383   0.005   0.9961  
age               0.05564    0.07059   0.788   0.4306  
edu_work.ISlow   -0.37620    1.58109  -0.238   0.8119  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 29.392  on 59  degrees of freedom
Residual deviance: 19.821  on 54  degrees of freedom
AIC: 31.821
Number of Fisher Scoring iterations: 18
```



1 Answer


For a GLM, your problem isn't the imbalance per se, but rather the low frequency of the less frequent outcome given your total number of observations.

5% of 60 is only 3 observations. A common rule of thumb for logistic regression is to have 10 observations of the least common outcome per predictor (others suggest more; there are several Q&As about this here, including "Sample size for logistic regression?"), so you don't have enough to meaningfully evaluate even one predictor variable.
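To make the arithmetic concrete, here is a minimal sketch in R using only the numbers from the question (n = 60, about 5% in the rarer group, 5 predictors) and the 10-per-predictor rule of thumb. The variable names are illustrative, not taken from the original analysis, and the threshold of 10 is a heuristic rather than a hard limit.

```r
# Numbers from the question; all variable names here are illustrative.
n_obs        <- 60    # survey respondents
rare_share   <- 0.05  # share of the sample in the rarer outcome group
n_predictors <- 5     # ESG, Mrisk, gender, age, edu_work.IS
epv_rule     <- 10    # rule of thumb: rare-outcome observations per predictor

# Observations of the rarer outcome (rounded to a whole count)
n_events <- round(n_obs * rare_share)

# Rare-outcome observations available per predictor in the fitted model
epv <- n_events / n_predictors

# Number of predictors the rule of thumb would support with this sample
max_predictors <- floor(n_events / epv_rule)

# Rare-outcome observations the 5-predictor model would need under the rule,
# and the total sample size that implies at the same observed share
events_needed <- epv_rule * n_predictors
n_needed      <- events_needed * n_obs / n_events

cat("rare-outcome observations:", n_events, "\n")
cat("observations per predictor:", epv, "\n")
cat("predictors supported by the rule:", max_predictors, "\n")
cat("rare-outcome observations needed:", events_needed, "\n")
cat("total n needed at the same share:", n_needed, "\n")
```

With 3 rare-outcome observations spread over 5 predictors, the model has well under one such observation per predictor, which is why the rule of thumb does not support even a single predictor here.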

Bryan Krause
  • Thank you for your answer. Do you, by chance, know a different statistical method with which I can interpret the influence that age, gender, etc. have on my dependent variable (the risk/return expectation)? – Antonio Thurnher May 04 '20 at 21:36
  • @AntonioThurnher You don't have enough data for any statistical method. – Bryan Krause May 04 '20 at 21:50
  • Hi @BryanKrause, another question: can I nevertheless use the variable with the very uneven groups (the dependent variable in the model above) as a predictor in a different model, or will my results be meaningless again? – Antonio Thurnher May 05 '20 at 11:53