I am trying to predict the result of an experiment (a binary dependent variable) from a number of continuous independent variables. When I fit a largish model (9 main effects + 2-factor interaction), it seems to "work", meaning I get a plot where the predicted values are all 0 or 1. However, if I reduce the model to, say, just two main effects plus their interaction, I get predicted values ranging from 0 to 1 (with several points in between). I have tried both probit and logit links with glm() in R.

With the smaller model I get no warnings / errors. With the larger one I get:

Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 

What causes this behaviour?

How can I determine which of the main effects and/or interaction terms are the most important (they all have *** in the summary)?

gung - Reinstate Monica
Zack Newsham

1 Answer

These are frequently asked questions. You should spend some time reading through related threads on CV and learning more about statistics and logistic regression.

Linear (OLS) regression predicts $\hat \mu_{x_i}$, the mean of the distribution of $Y$ when $X=x_i$. Logistic regression predicts $\hat\pi_{x_i}$, the probability of 'success' when $X=x_i$. It is supposed to give predicted probabilities. If it doesn't give predicted probabilities, something has gone wrong.

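In your case, you have perfect separation: in the larger model, some combination of the predictors splits the 0s from the 1s exactly, so the likelihood keeps increasing as the coefficients grow without bound and the fit never settles on finite estimates. That is what both warnings are telling you. You may want to read this answer by @scortchi: How to deal with perfect separation in logistic regression?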
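To see the symptom in isolation, here is a minimal R sketch on made-up toy data. The vectors `x`, `y_sep`, `y_mix` and the helper `fit_quietly` are hypothetical and are not taken from the question; the sketch only assumes base R's `glm()`. It fits the same one-predictor logistic regression twice, once where the classes are perfectly separated and once where they overlap.

```r
# Hypothetical toy data, not the asker's: one continuous predictor.
x     <- c(-3, -2, -1, 1, 2, 3)
y_sep <- c(0, 0, 0, 1, 1, 1)  # every x < 0 is a 0, every x > 0 is a 1
y_mix <- c(0, 0, 1, 0, 1, 1)  # the classes overlap between x = -1 and x = 1

# Hypothetical helper: fit a logistic regression and collect any warnings
# in a character vector instead of printing them.
fit_quietly <- function(y) {
  warns <- character(0)
  fit <- withCallingHandlers(
    glm(y ~ x, family = binomial(link = "logit")),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = warns)
}

sep <- fit_quietly(y_sep)
mix <- fit_quietly(y_mix)

sep$warnings                   # warnings raised by the separated fit
round(fitted(sep$fit), 3)      # fitted probabilities pushed to 0 or 1
summary(sep$fit)$coefficients  # very large slope with an enormous standard error

mix$warnings                   # the overlapping fit raises no warnings
round(fitted(mix$fit), 3)      # fitted probabilities strictly between 0 and 1
```

In the separated fit the slope is not a meaningful estimate: it is simply wherever the iterations happened to stop, which is why the coefficient table from such a fit should not be interpreted.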

Regarding your question about how to tell which variables "are the most important", this largely cannot be done. I discuss the issue at the bottom of my answer here: Multiple linear regression for hypothesis testing.

The issue of which link function to use is orthogonal to the problems you are facing. However, if you want to get a better understanding of them, and perhaps even logistic regression in general, it may help you to read my answer here: Difference between logit and probit models.
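As a rough illustration of why the link choice is a side issue, the sketch below fits the same made-up, non-separated data with both links. The data vectors are hypothetical; only base R's `glm()` is assumed. The coefficients are on different scales, but the fitted probabilities come out close to each other.

```r
# Hypothetical toy data with overlapping classes (no separation).
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 1, 0, 1, 1)

fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))

# Coefficients differ in scale because the two links measure the
# linear predictor in different units.
rbind(logit = coef(fit_logit), probit = coef(fit_probit))

# Fitted probabilities, side by side.
cbind(x,
      logit  = round(fitted(fit_logit), 3),
      probit = round(fitted(fit_probit), 3))
```

Switching links would therefore not be expected to cure a separation problem; it changes the scale of the coefficients, not whether the data can be split perfectly.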

gung - Reinstate Monica