I need both quadratic and linear coefficients in a GLM with binary response. What's the best option?

Question

I have three predictors and one response. What can I do if my response variable is binary?

This question appears to be off-topic because it is about how to use R without a reproducible example. — gung - Reinstate Monica, Dec 29 '14 at 02:57
Your question is a little unclear. You'd be doing better to clarify what your variables are, describe how the quadratic coefficient fits into your model and reframe your question to more clearly ask about the statistical aspects of your problem before relating it to R. If you do want to also ask about implementing the model in R, a reproducible example as gung mentioned (i.e. a small data set and code we can copy and paste to create it, as well as what you're currently trying to do) would be important. How did this question arise? — Glen_b, Dec 29 '14 at 03:12
Thanks, everybody, for your replies. The details of my problem are as follows: my objective is interpretation of the coefficients, but I am unsure whether to use glm(y~x+I(x^2), family = gaussian) # non-orthogonal polynomial or glm(y~poly(x,2), family = gaussian) # orthogonal. Is there any difference between these two? — BabuRam Paudel, Dec 29 '14 at 14:42

gung - Reinstate Monica · Answer 1 · 2014-12-29T14:39:22.527

You can add a quadratic term with logistic regression just as you can with regular old linear regression. That is a simple way to include a 'curve' in your model. Be sure you understand what that means. I suspect you want an R tutorial, which is off-topic on CV. The basic approach to adding a quadratic in R is to include I(x^2) in the formula. Here is a simple example:

lo.to.p = function(lo){                 # we need this function to generate the data
  odds = exp(lo)
  prob = odds/(1+odds)
  return(prob)
}
set.seed(4649)                          # this makes the example exactly reproducible
x1 = runif(100, min=0, max=10)          # you have 3, largely uncorrelated predictors
x2 = runif(100, min=0, max=10)
x3 = runif(100, min=0, max=10)
lo = -78 + 35*x1 - 3.5*(x1^2) + .1*x2   # there is a quadratic relationship w/ x1, a
p  = lo.to.p(lo)                        #  linear relationship w/ x2 & no relationship
y  = rbinom(100, size=1, prob=p)        #  w/ x3


model = glm(y~x1+I(x1^2)+x2+x3, family=binomial)
summary(model)
# Call:
# glm(formula = y ~ x1 + I(x1^2) + x2 + x3, family = binomial)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -1.74280  -0.00387   0.00000   0.04145   1.74573  
# 
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)   
# (Intercept) -53.65462   19.65288  -2.730  0.00633 **
# x1           24.78164    8.92910   2.775  0.00551 **
# I(x1^2)      -2.49888    0.89344  -2.797  0.00516 **
# x2            0.03318    0.20198   0.164  0.86952   
# x3           -0.09277    0.18650  -0.497  0.61890   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 128.207  on 99  degrees of freedom
# Residual deviance:  18.647  on 95  degrees of freedom
# AIC: 28.647
# 
# Number of Fisher Scoring iterations: 10
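
Because the linear and quadratic coefficients describe a single curve, they are easiest to read together. On the log-odds scale the fitted relationship with x1 is a parabola, and its turning point lies at -b1/(2*b2), where b1 and b2 are the coefficients on x1 and I(x1^2). The following is a minimal sketch of that calculation, assuming the data are regenerated exactly as in the example above; the name `vertex` is introduced here for illustration only.

```r
# Regenerate the simulated data from the example and refit the model
set.seed(4649)
x1 = runif(100, min=0, max=10)
x2 = runif(100, min=0, max=10)
x3 = runif(100, min=0, max=10)
lo = -78 + 35*x1 - 3.5*(x1^2) + .1*x2
p  = exp(lo)/(1+exp(lo))                # same conversion as lo.to.p()
y  = rbinom(100, size=1, prob=p)
model = glm(y~x1+I(x1^2)+x2+x3, family=binomial)

# The log odds are b0 + b1*x1 + b2*x1^2 + ..., so the curve in x1
# turns at -b1/(2*b2); a negative b2 means that point is a maximum
b = coef(model)
vertex = unname(-b["x1"] / (2*b["I(x1^2)"]))
vertex                                  # the data were simulated with a peak at x1 = 5
```

Neither coefficient has a useful interpretation alone here: the sign of the quadratic term gives the direction of the curvature, and the two together locate the peak.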

Thanks, everybody, for your replies. The details of my problem are as follows: my objective is interpretation of the coefficients, but I am unsure whether to use glm(y~x+I(x^2), family = gaussian) # non-orthogonal polynomial or glm(y~poly(x,2), family = gaussian) # orthogonal. Is there any difference between these two? Which model best suits my objective? — BabuRam Paudel, Dec 29 '14 at 14:48
@BabuRamPaudel, you should not use `family = gaussian` with a binary response variable; you should use `family=binomial` instead. As for the question of `I(x^2)` vs `poly(x,2)`, it doesn't really make much difference ultimately, but it does make a difference to what your output looks like, & how to interpret the numbers. There is a good discussion you should read here: [How to interpret coefficients from a polynomial model fit?](http://stats.stackexchange.com/q/95939/) — gung - Reinstate Monica, Dec 29 '14 at 14:55
When considering x1 and x1^2, which one should be taken into account when interpreting the results? Are both relevant, or should only one be considered? — mtao, Mar 01 '17 at 17:05
@Teresa, it may help you to read my answer here: [Does it make sense to add a quadratic term but not the linear term to a model?](http://stats.stackexchange.com/a/28750/7290) — gung - Reinstate Monica, Mar 01 '17 at 17:48
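
The point made in the comments about `I(x^2)` versus `poly(x,2)` can be checked directly. The sketch below uses a small simulated data set invented for this illustration (the seed, sample size and true coefficients are assumptions, not taken from the question) and fits the same binomial model both ways. The two parameterizations should give the same fitted probabilities and deviance; only the coefficients and their interpretation differ.

```r
# Hypothetical data: one predictor with a quadratic effect on the log odds
set.seed(1)
x  = runif(200, min=0, max=10)
lo = -3 + 1.5*x - 0.15*(x^2)
y  = rbinom(200, size=1, prob=plogis(lo))   # plogis() is the inverse logit

m.raw  = glm(y~x+I(x^2),  family=binomial)  # raw (non-orthogonal) polynomial
m.orth = glm(y~poly(x,2), family=binomial)  # orthogonal polynomial

# Same model, so the fit agrees up to numerical tolerance
same.fit = isTRUE(all.equal(fitted(m.raw), fitted(m.orth), tolerance=1e-6))
same.dev = isTRUE(all.equal(deviance(m.raw), deviance(m.orth), tolerance=1e-6))
same.fit
same.dev

# Different coefficients: m.raw is on the scale of x and x^2,
# m.orth is on the scale of the orthogonalized basis columns
coef(m.raw)
coef(m.orth)
```

If the goal is to interpret coefficients on the original scale of x, the raw form is usually the more direct choice; `poly(x, 2, raw=TRUE)` gives the same raw terms as `x + I(x^2)`.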
