In my dependent variable, 6 out of 167 cases have cancer. I would like to evaluate whether three independent variables predict cancer. Are 6 cases enough? Is there a rule to determine this? Does it make a difference whether the independent variables are nominal or numeric?

Sycorax
-
What does "enough" mean? If I were to define "enough", I would personally ask: "given prevalence x, how many samples do I need for the Jeffreys-prior interval on the rate to fall within the range x-r to x+r?" Here is what a decent tool gives for your samples: (http://epitools.ausvet.com.au/content.php?page=CIProportion&SampleSize=167&Positive=6&Conf=0.95&Digits=5) – EngrStudent Aug 29 '16 at 14:36
-
6 "Yes"s and 3 IVs will almost certainly produce complete separation in the data (which often manifests as huge coefficient estimates and standard errors). The rule of thumb usually says 5 or 10 cases per IV (depending on who you ask). – not_bonferroni Aug 29 '16 at 14:36
-
What do you mean by "6 out of 167"? Is it that 6 patients were diagnosed with cancer? – Tim Aug 29 '16 at 14:36
-
Related: http://stats.stackexchange.com/questions/26016/sample-size-for-logistic-regression – David R Aug 29 '16 at 14:42
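To make the rule of thumb from the comments concrete, here is a minimal Python sketch of the events-per-variable arithmetic for the numbers in the question (6 events, 167 cases, 3 predictors). The function names are hypothetical, and the 5 and 10 thresholds are simply the ones quoted in the comment, not a definitive standard. It also assumes the usual dummy coding, under which a nominal predictor with k levels costs k - 1 coefficients while a numeric predictor costs 1, which is one way the nominal/numeric distinction matters.

```python
def coefficients_needed(levels_per_predictor):
    """Count slope coefficients, assuming dummy coding.

    Each entry is None for a numeric predictor (1 coefficient) or the
    number of levels k for a nominal predictor (k - 1 coefficients).
    """
    total = 0
    for levels in levels_per_predictor:
        total += 1 if levels is None else levels - 1
    return total


def events_per_variable(n_total, n_events, n_coefficients):
    """Events per coefficient, using the rarer of the two outcomes."""
    if n_coefficients <= 0:
        raise ValueError("need at least one coefficient")
    minority = min(n_events, n_total - n_events)
    return minority / n_coefficients


# Numbers from the question: 6 cancer cases out of 167, 3 numeric predictors.
n_total, n_events = 167, 6
n_coef = coefficients_needed([None, None, None])
epv = events_per_variable(n_total, n_events, n_coef)
print(f"coefficients: {n_coef}, events per variable: {epv}")

# How many coefficients each quoted threshold would allow with 6 events.
supported = {t: n_events // t for t in (5, 10)}
print(supported)

# Hypothetical variant: one of the three predictors is nominal with 4 levels.
n_coef_nominal = coefficients_needed([None, None, 4])
print(n_coef_nominal, events_per_variable(n_total, n_events, n_coef_nominal))
```

With 6 events and 3 numeric predictors this gives 2 events per variable, below both quoted thresholds; under the same rule, 6 events would support at most one coefficient at the threshold of 5 and none at the threshold of 10.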
1 Answer
I don't think there is a minimum number of "cases" required to fit a logistic regression (although you do need at least two classes, i.e. both outcomes must occur in the data).
The main downside is that, with very imbalanced data, the minority class may be modeled poorly.
You may also get warnings from R's glm if the data are perfectly separable; in that case, consider adding regularization.
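As a rough illustration of what "perfectly separable" means, here is a small Python sketch that checks whether a single numeric predictor splits the two classes with no overlap. The function name and the toy data are hypothetical, not from the question. This is only the simplest case: separation can also arise from a linear combination of several predictors, which this check will not detect.

```python
def separates_completely(x, y):
    """Return True if one threshold on x splits the classes with no overlap.

    x is a list of numeric predictor values and y the matching 0/1 outcomes.
    Only single-predictor separation is checked.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both classes must be present")
    # Separation holds when every value in one class lies strictly
    # below every value in the other class.
    return max(x0) < min(x1) or max(x1) < min(x0)


# Hypothetical toy data: few events, as in a rare-outcome setting.
y = [0, 0, 0, 0, 0, 0, 1, 1]
x_separated = [1.0, 1.5, 2.0, 2.2, 3.0, 3.1, 5.0, 6.0]
x_overlapping = [1.0, 1.5, 2.0, 2.2, 3.0, 5.5, 5.0, 6.0]

print(separates_completely(x_separated, y))
print(separates_completely(x_overlapping, y))
```

When a check like this comes back true, the maximum-likelihood estimate for that coefficient does not exist as a finite number, which is why a penalized (regularized) fit is a common fallback.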