
Competing Logistic Regression Model results

Below you will see a screen grab of the tail of my results, where you get to my control variables. In short, I ran 4 competing models to explain a phenomenon.

The dataset was originally very large. However, in order to make the AIC and BIC comparable across models, I put all the variables into one data frame and kept only the complete cases using the complete.cases function. The result is that the sample size is vastly reduced, to 240, and many coefficients and standard errors are large.

Actually, that's an understatement: the standard errors are very large indeed. Sorry to ask a simple question, but do I have problems here?

Henry

# Pull every variable used in the four models into a single data frame;
# data.frame() names the columns bes.influence, bes.groupbenefits, and so on
Combinedframe <- data.frame(bes$influence,bes$groupbenefits,bes$satisfaction,bes$guilty,bes$civicduty,bes$democracy,bes$personalbenefits,bes$toobusy,bes$age,bes$gender,bes$ethnic,bes$religion,bes$education,bes$labourcrime,bes$laboureducation,bes$labourimmigration,bes$labournhs,bes$labourterrorism,bes$labourecon,bes$Govfair,bes$minor,bes$demosat,bes$persretro,bes$natretro,bes$perspro,bes$natpro,bes$interest,bes$attention,bes$newspaper,bes$contacted,bes$asked,bes$know1,bes$know2,bes$know3,bes$know4,bes$know5,bes$know6,bes$know7,bes$know8,bes$vote,bes$realchoice,bes$goodcand,bes$sayvsdo,bes$brown,bes$cameron,bes$clegg,bes$browncomp,bes$cameroncomp,bes$cleggcomp,bes$attachment,bes$fairelections,bes$trustwestminster,bes$trustparties)

# Keep only complete cases so that all models are fit on the same observations,
# which makes their AIC and BIC values comparable
Combined <- Combinedframe[complete.cases(Combinedframe),]
View(Combined)

#################COMBINED DATA FRAME MODELS####################

Model1b <- glm(bes.vote~ bes.influence+bes.groupbenefits+bes.satisfaction+bes.guilty+bes.civicduty+bes.democracy+bes.personalbenefits+bes.toobusy+bes.age+bes.gender+bes.ethnic+factor(bes.education)+bes.religion, data=Combined, family=binomial)
summary(Model1b)

Model2b <- glm(bes.vote~ bes.labourcrime+bes.laboureducation+bes.labourimmigration+bes.labournhs+bes.labourterrorism+bes.labourecon+bes.Govfair+bes.minor+bes.demosat+bes.persretro+bes.natretro+bes.perspro+bes.natpro+bes.gender+bes.age+bes.ethnic+factor(bes.education)+bes.religion, data= Combined, family=binomial)
summary(Model2b)

Model3b <- glm(bes.vote~ bes.interest+bes.attention+bes.newspaper+bes.contacted+bes.asked+bes.know1+bes.know2+bes.know3+bes.know4+bes.know5+bes.know6+bes.know7+bes.know8+bes.age+bes.gender+bes.ethnic+factor(bes.education)+bes.religion, data= Combined, family=binomial)
summary(Model3b)
  • What software are you using? Can you post the syntax? How large is "very large"? – robin.datadrivers Mar 18 '15 at 13:25
  • Also keep in mind that your data may not be missing completely at random, so chopping your full dataset down to 240 could mean your results are no longer valid in your full sample, and your model comparisons will only have meaning on that smaller sample. Have you considered imputation to fill in the variables you have missing values for? You could also try something like full-information MLE. – robin.datadrivers Mar 18 '15 at 13:27
  • Most likely the Hauck-Donner effect / complete separation (search elsewhere on CV, e.g. http://stats.stackexchange.com/questions/46223/ci-for-logistic-regression/46581#46581). – Ben Bolker Mar 18 '15 at 13:27
  • Thanks all. @robin.datadrivers I'm using R and have edited my post to include code. When I say very large, it's the British Election Study, so a couple of thousand cases perhaps. What is imputation and how can I use it? Thanks – HenryBukowski Mar 18 '15 at 13:46
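
On the imputation question raised in the comments: a minimal sketch of the multiple-imputation route, using the mice package. The predictor list is abbreviated and the settings (m, seed) are illustrative choices, not recommendations.

# Multiple-imputation sketch: impute the original (pre-listwise-deletion) frame,
# fit the model on each completed data set, then pool the estimates.
# Predictors are abbreviated here for illustration only.
library(mice)

imp <- mice(Combinedframe, m = 5, seed = 123)

fits <- with(imp, glm(bes.vote ~ bes.influence + bes.civicduty + bes.age + bes.gender + factor(bes.education), family = binomial))

summary(pool(fits))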

1 Answer


I would wager that something has actually gone wrong. If your standard errors are that big, you may be looking at a sample with only a trivial number of cases showing variation. With a data set of only 240, I would manually check whether there is meaningful variation in the outcome along these dimensions. I would anticipate that you do not have a meaningful number of both successes and failures for each level of education.
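
As a quick check (a sketch assuming the Combined data frame and variable names from the question), cross-tabulate the outcome against education and look for empty or near-empty cells:

# Empty or near-empty cells mean some education levels show (almost) no
# variation in the outcome, i.e. quasi-complete separation
with(Combined, table(bes.education, bes.vote))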

As a consequence of this multicollinearity, you are probably in the tails of your logistic function. The logistic functional form takes a large adjustment to the coefficients to move a fitted probability from 99.99% to 99.999%.
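
One symptom of this is fitted probabilities piling up near 0 or 1. A quick check, assuming Model1b from the question:

# Fitted probabilities hugging 0 or 1 suggest the model is sitting in the
# flat tails of the logistic curve (often a sign of separation)
p <- predict(Model1b, type = "response")
summary(p)
sum(p < 0.001 | p > 0.999)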

Also, check that none of your inputs is carrying a sentinel value such as -999 for missing data. Sometimes this slips by.
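
A quick scan for that kind of coding, assuming the columns of Combined are numeric:

# Summaries reveal implausible codes such as -999 standing in for missing values
summary(Combined)
colSums(Combined == -999, na.rm = TRUE)  # assumes numeric coding throughout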

RegressForward