I have three predictors and one response. What can I do if my response variable is binary?
I need both quadratic and linear coefficients in a GLM with binary response. What's the best option?

gung - Reinstate Monica

BabuRam Paudel
- This question appears to be off-topic because it is about how to use R without a reproducible example. – gung - Reinstate Monica Dec 29 '14 at 02:57
- Your question is a little unclear. You would do better to clarify what your variables are, describe how the quadratic coefficient fits into your model, and reframe your question to ask more clearly about the statistical aspects of your problem before relating it to R. If you also want to ask about implementing the model in R, a reproducible example as gung mentioned (i.e. a small data set and code we can copy and paste to create it, as well as what you're currently trying to do) would be important. How did this question arise? – Glen_b Dec 29 '14 at 03:12
- Thanks, everybody, for your replies. The details of my problem are as follows: my objective is interpretation of the coefficients, but I am unsure whether to use `glm(y~x+I(x^2), family = gaussian)` (non-orthogonal polynomial) or `glm(y~poly(x,2), family = gaussian)` (orthogonal polynomial). Is there any distinction between these two? – BabuRam Paudel Dec 29 '14 at 14:42
1 Answer
You can add a quadratic term in logistic regression just as you can in regular old linear regression. That is a simple way to include a 'curve' in your model. Be sure you understand what that means: the curve is in the log odds of the response, not in the probability itself. I suspect you want an R tutorial, which is off-topic on CV. The basic approach to adding a quadratic in R is to include `I(x^2)` in the formula. Here is a simple example:
lo.to.p = function(lo){      # we need this function to generate the data
  odds = exp(lo)             # log odds -> odds
  prob = odds/(1+odds)       # odds -> probability
  return(prob)
}
set.seed(4649) # this makes the example exactly reproducible
x1 = runif(100, min=0, max=10) # you have 3, largely uncorrelated predictors
x2 = runif(100, min=0, max=10)
x3 = runif(100, min=0, max=10)
lo = -78 + 35*x1 - 3.5*(x1^2) + .1*x2 # there is a quadratic relationship w/ x1, a
p = lo.to.p(lo) # linear relationship w/ x2 & no relationship
y = rbinom(100, size=1, prob=p) # w/ x3
model = glm(y~x1+I(x1^2)+x2+x3, family=binomial)
summary(model)
# Call:
# glm(formula = y ~ x1 + I(x1^2) + x2 + x3, family = binomial)
#
# Deviance Residuals:
#      Min        1Q    Median        3Q       Max
# -1.74280  -0.00387   0.00000   0.04145   1.74573
#
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)
# (Intercept) -53.65462   19.65288  -2.730  0.00633 **
# x1           24.78164    8.92910   2.775  0.00551 **
# I(x1^2)      -2.49888    0.89344  -2.797  0.00516 **
# x2            0.03318    0.20198   0.164  0.86952
# x3           -0.09277    0.18650  -0.497  0.61890
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 128.207 on 99 degrees of freedom
# Residual deviance: 18.647 on 95 degrees of freedom
# AIC: 28.647
#
# Number of Fisher Scoring iterations: 10
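
As a rough sketch of how the linear and quadratic estimates can be read together (an illustration added here, not part of the original fit), the snippet below copies the estimates from the output above and locates the turning point of the fitted log-odds curve in `x1`. The grid of `x1` values and the choice to hold `x2` and `x3` at 5 are arbitrary assumptions made for the example.

```r
# Estimates copied from the summary() output above
b <- c(intercept = -53.65462, x1 = 24.78164, x1sq = -2.49888,
       x2 = 0.03318, x3 = -0.09277)

# Turning point of the log-odds parabola in x1: -b1 / (2 * b2)
vertex <- -b[["x1"]] / (2 * b[["x1sq"]])
vertex  # about 4.96; the data were simulated with 35 / (2 * 3.5) = 5

# Fitted log odds and probabilities along x1,
# holding x2 and x3 at 5 (an arbitrary choice for illustration)
x1_grid <- c(3, 4, 5, 6, 7)
lo_hat  <- b[["intercept"]] + b[["x1"]] * x1_grid + b[["x1sq"]] * x1_grid^2 +
           b[["x2"]] * 5 + b[["x3"]] * 5
p_hat   <- plogis(lo_hat)  # inverse logit, same mapping as lo.to.p()
data.frame(x1 = x1_grid, log_odds = round(lo_hat, 2), prob = round(p_hat, 3))
```

Because the slope of the log odds with respect to `x1` is `b1 + 2*b2*x1`, neither coefficient describes the effect of `x1` on its own; the two are usually read jointly, for instance through the turning point or a table of fitted probabilities like this one.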

gung - Reinstate Monica
- Thanks, everybody, for your replies. The details of my problem are as follows: my objective is interpretation of the coefficients, but I am unsure whether to use `glm(y~x+I(x^2), family = gaussian)` (non-orthogonal polynomial) or `glm(y~poly(x,2), family = gaussian)` (orthogonal polynomial). Is there any distinction between these two? Which of the two models is better for my objective? – BabuRam Paudel Dec 29 '14 at 14:48
- @BabuRamPaudel, you should not use `family = gaussian` with a binary response variable; you should use `family=binomial` instead. As for the question of `I(x^2)` vs `poly(x,2)`, it doesn't make much difference to the fitted model, but it does change what your output looks like and how to interpret the numbers. There is a good discussion you should read here: [How to interpret coefficients from a polynomial model fit?](http://stats.stackexchange.com/q/95939/) – gung - Reinstate Monica Dec 29 '14 at 14:55
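
The point about `I(x^2)` versus `poly(x,2)` can be checked with a small sketch. The data below are simulated purely for illustration (the seed, sample size, and true coefficients are assumptions, not taken from the question). The raw and orthogonal parameterisations should give matching fitted probabilities while reporting different coefficients.

```r
# Simulated data for illustration only (all numbers here are assumptions)
set.seed(1)
x <- runif(200, min = 0, max = 10)
p <- plogis(-4 + 2 * x - 0.2 * x^2)   # quadratic relationship on the log-odds scale
y <- rbinom(200, size = 1, prob = p)

fit_raw  <- glm(y ~ x + I(x^2), family = binomial)   # raw polynomial terms
fit_orth <- glm(y ~ poly(x, 2), family = binomial)   # orthogonal polynomial terms

# Compare fitted probabilities and deviances of the two parameterisations
all.equal(fitted(fit_raw), fitted(fit_orth), tolerance = 1e-6)
all.equal(deviance(fit_raw), deviance(fit_orth), tolerance = 1e-6)

# poly(..., raw = TRUE) uses the same columns as x + I(x^2)
fit_raw2 <- glm(y ~ poly(x, 2, raw = TRUE), family = binomial)

# Side-by-side coefficients: the first two columns use raw terms,
# the third uses orthogonal terms and is on a different scale
cbind(raw = coef(fit_raw), poly_raw = coef(fit_raw2), orthogonal = coef(fit_orth))
```

The raw coefficients are on the scale of `x` and `x^2`. The orthogonal ones refer to rescaled, uncorrelated basis columns, which tends to help numerical stability and term-by-term tests but makes them harder to read directly.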
- When considering x1 and x1^2, which one should be taken into account when interpreting the results? Are both relevant, or should only one be considered? – mtao Mar 01 '17 at 17:05
- @Teresa, it may help you to read my answer here: [Does it make sense to add a quadratic term but not the linear term to a model?](http://stats.stackexchange.com/a/28750/7290) – gung - Reinstate Monica Mar 01 '17 at 17:48