Logistic sample and case numbers

Question

I have some questions about binary logistic regression. For my research, I am planning to use 12 predictors, and my sample consists of 129 cases. However, I am aware of the 1-in-10 rule of thumb.

Additionally, my DV is binary, and its distribution is unbalanced: one group has fewer than 30 cases (group A: 104; group B: 25).

In this situation, can I run logistic regression?

See https://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression for the question about unbalanced samples — kjetil b halvorsen, Aug 31 '18 at 19:16

score 2 · Answer 1 · edited Apr 13 '17 at 12:44

That rule of thumb uses the number of cases in the minority group, in this case 25 in B. So you have about 2 events per variable†. You can still use logistic regression, but your model is likely to grossly over-fit the sample data; use bootstrap validation or cross-validation to estimate the consequences (or use the heuristic shrinkage estimator described here).
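As a quick check of that arithmetic, here is a minimal Python sketch using the group sizes from the question. The function names `events_per_variable` and `max_params` are made up for illustration, and the threshold of 10 events per parameter is just the rule of thumb, not a hard limit.

```python
# Group sizes taken from the question: A = 104, B = 25.
group_sizes = {"A": 104, "B": 25}
n_predictors = 12


def events_per_variable(group_sizes, n_params):
    """Events per variable, counting only the smaller outcome group as 'events'."""
    return min(group_sizes.values()) / n_params


def max_params(group_sizes, events_per_param=10):
    """Largest number of parameters the rule of thumb would allow (assumed threshold: 10)."""
    return min(group_sizes.values()) // events_per_param


epv = events_per_variable(group_sizes, n_predictors)
print(f"Events per variable: {epv:.2f}")                     # 2.08
print(f"Parameters allowed by 1-in-10: {max_params(group_sizes)}")  # 2
```

With 25 events, the rule of thumb supports only about two parameters, far fewer than the 12 planned.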

Collecting more data would be a very good idea. You might want to reflect that the 95% confidence interval for the overall proportion of B is roughly (0.13,0.26), so you can't hope to learn about the individual effects of each of those 12 predictors with much precision at all. If you really need to build a predictive model on this sample, carrying out data reduction on the predictors would be sensible—try to get the number down to about two or three. Regularization is an alternative way to improve predictive accuracy.
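The interval quoted above can be reproduced with a simple normal-approximation (Wald) interval. This is a sketch under that assumption; other interval methods (Wilson, exact) would give slightly different endpoints.

```python
from math import sqrt
from statistics import NormalDist

events = 25        # cases in group B
n = 129            # total sample size

p_hat = events / n
z = NormalDist().inv_cdf(0.975)          # two-sided 95% critical value, about 1.96
se = sqrt(p_hat * (1 - p_hat) / n)       # standard error of a proportion

lower, upper = p_hat - z * se, p_hat + z * se
print(f"p = {p_hat:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")  # p = 0.194, 95% CI = (0.13, 0.26)
```

If the overall proportion is already this uncertain, estimates of 12 separate predictor effects will be much less precise still.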

† It's in fact events per regression degree of freedom you need to consider; so each dummy variable for a categorical predictor, each polynomial or spline term, counts.
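To illustrate the footnote, the sketch below counts regression degrees of freedom for a purely hypothetical set of predictors (the names `age`, `region`, and `income` and their specifications are invented for the example, not taken from the question). It assumes a categorical predictor with k levels uses k - 1 dummy variables and a polynomial of degree d uses d terms.

```python
def regression_df(predictors):
    """Count model degrees of freedom (excluding the intercept)."""
    total = 0
    for name, spec in predictors.items():
        kind = spec["kind"]
        if kind == "linear":
            total += 1                      # one slope
        elif kind == "categorical":
            total += spec["levels"] - 1     # k levels -> k - 1 dummies
        elif kind == "polynomial":
            total += spec["degree"]         # degree d -> d terms
        else:
            raise ValueError(f"unknown predictor kind for {name!r}: {kind!r}")
    return total


# Hypothetical predictors, for illustration only.
predictors = {
    "age": {"kind": "polynomial", "degree": 2},
    "region": {"kind": "categorical", "levels": 4},
    "income": {"kind": "linear"},
}

events = 25  # minority-group size from the question
total_df = regression_df(predictors)
print(f"{len(predictors)} predictors use {total_df} degrees of freedom")  # 3 predictors use 6 degrees of freedom
print(f"Events per degree of freedom: {events / total_df:.2f}")           # 4.17
```

Three predictors here already consume six degrees of freedom, so counting predictors alone understates how much the model is asking of the data.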

Thanks for your answer. Can I ask you one more thing? In my model, the dependent variable is divided into two groups (1: 25, 0: 104). According to your comment, the rule of thumb uses the number of cases in the smaller group (n = 25), so I can't use 12 predictors. However, if I have a weight variable (my data is derived from panel data), can I then use 12 predictors? When I run a frequency analysis with the weight variable, my sample size is over 20,000. — user49274, Jul 01 '14 at 18:36
Can you clarify what you mean by a weight variable? See the distinction between survey weights & frequency weights discussed [here](http://stats.stackexchange.com/questions/22989/). For frequency weights you do in fact have a greater sample size. — Scortchi - Reinstate Monica, Jul 04 '14 at 15:01
