I recently fit two models in R, one using glm() and one using lm(), to test the relationship between a binary response and a binary predictor. I ran glm() first and got an estimate of -0.68 for the predictor coefficient, which I thought was pretty good, with p < .05 and an AIC of 653.

When I ran lm(), however, I got an estimate of -0.14, a multiple R-squared of 0.008, and p < .05.
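For reference, with a single binary predictor the two slopes sit on different scales: glm() with family = binomial reports a log odds ratio, while lm() reports a difference in proportions. Here is a minimal sketch of that, using made-up counts rather than my actual data. The variable names are placeholders, and it assumes the logistic fit was run with family = binomial.

```r
# Hypothetical 2x2 data (NOT the data from this question):
# 250 observations with x = 0 and 250 with x = 1.
x <- rep(c(0, 1), times = c(250, 250))
y <- c(rep(c(1, 0), times = c(100, 150)),  # x = 0: 100 successes out of 250
       rep(c(1, 0), times = c(65, 185)))   # x = 1:  65 successes out of 250

# Logistic regression (assumes family = binomial) and linear regression
fit_glm <- glm(y ~ x, family = binomial)
fit_lm  <- lm(y ~ x)

# Group proportions computed directly from the data
p0 <- mean(y[x == 0])
p1 <- mean(y[x == 1])

# The glm() slope is the log odds ratio between the two groups
log_or <- log((p1 / (1 - p1)) / (p0 / (1 - p0)))
print(c(glm_slope = coef(fit_glm)[["x"]], log_odds_ratio = log_or))

# The lm() slope is the difference in proportions
print(c(lm_slope = coef(fit_lm)[["x"]], diff_in_props = p1 - p0))

# Fitted probabilities on the response scale agree between the two models
same_fit <- isTRUE(all.equal(unname(fitted(fit_glm)),
                             unname(fitted(fit_lm)),
                             tolerance = 1e-6))
print(same_fit)
```

In this one-predictor case both models simply reproduce the two group proportions, so their fitted probabilities match even though the coefficients look very different.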

My understanding is that linear regression is usually a poor choice for a categorical response compared with logistic regression, but when is this not the case? In this post, http://statisticalhorizons.com/linear-vs-logistic, the author argues that there is a middle ground where using linear regression does make sense. Are there any common rules (or rules of thumb you personally use) for deciding when to try linear regression on a categorical response? Do any of these differ from the author's cases?

114
  • @whuber I don't believe this is a duplicate. I've edited my original post to try to clarify. – 114 Jul 08 '15 at 19:54
  • I must be missing something, because I still don't see the distinction. I have read the blog post you now reference (and find it to be erroneous, fwiw, because it fails to consider the differences in the underlying probability models). I notice that its author has recently attempted to answer the question I marked as a duplicate by referencing his blog post! That seems to suggest you are asking exactly the same question. Incidentally, if that blog post is correct and applicable to your data then you should be getting comparable predictions in your two models. – whuber Jul 08 '15 at 20:01
  • @whuber Understandable, I suppose there really isn't enough difference between them. I initially thought that was just dealing with the standard use cases for the two rather than whether one could be used for the other. Plus, I think the fact that you find the blog post erroneous and that the results should be similar if it even applies helps clear up my concerns anyway. – 114 Jul 08 '15 at 20:06
  • I would like to see this issue get more air time here. If you can think of a way to distinguish your question more clearly from the other one it would be nice. Maybe you could provide some more details of your data and analyses, or maybe you could refine the duplicate--which is broad--into a more focused one, such as asking exactly how to recognize when `lm` and `glm` ought to give similar results or how exactly to compare the output of one with the output of the other. It all depends on where your interest lies. – whuber Jul 08 '15 at 20:08

0 Answers