I am looking at the results of a logistic regression model (I don't have the data), and the person who developed the model included quadratic terms in it.

I understand the use of such polynomial terms in a linear model, where one can look at the relationship between the response and the predictor. But in the case of a binary outcome, is there a way to identify such a trend beforehand, i.e. without including the term in the model and then checking whether it is significant?

DWin
Raj
  • There is no way to separately interpret a coefficient for a "quadratic term". It is essentially an interaction and requires that you make a prediction across the range of possible values using all terms that contribute to the variation of the outcome. The notion of a quadratic term for a binary variable doesn't make a lot of sense to me. – DWin Mar 07 '16 at 00:28
  • @DWin I believe the OP means a quadratic expansion of a continuous covariate in logistic regression. In other words, if $x$ is a predictor of $y$, include $x$ and $x^2$ as two covariates. – Cliff AB Mar 07 '16 at 03:45
  • That is what I interpreted the question as requesting. But I hope you agree that it remains unclear what a "quadratic trend" might mean. There's no way to decide whether a positive sign for a squared term actually means a concave-upward "trend" in the region of the domain of interest. My main point is that one needs to _predict_ using the intercept, linear, and quadratic coefficients together over the range of interest. Attempting to assign meaning to individual coefficients is foolish. – DWin Mar 07 '16 at 04:55
  • @DWin: I agree the word "trend" is a little misleading, but as Cliff mentioned, what I meant was: how does one decide whether to add an $x^2$ or $x^n$ predictor to the model if we can't get a good feel for the relationship between the predictor and the outcome? I get your point about trial and check, but my question is what would prompt someone to try a polynomial covariate in the first place. – Raj Mar 07 '16 at 15:49

2 Answers

During EDA, you can take the (continuous) predictor and discretize it into either equal-sized (quantile) or equal-width bins. Then you can plot the event rate in each bin (or its log-odds, which is the scale on which logistic regression is linear) to visually detect a linear or quadratic relationship, if one exists. E.g., an inverted U-shaped curve would suggest a quadratic relationship. Another way to create such bins is to use CHAID (or another decision-tree algorithm) to split your sample into statistically derived bins.
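A rough sketch of the binning idea in Python, using only the standard library. The data are simulated purely for illustration (true log-odds follow an inverted U), and `binned_event_rates` is a hypothetical helper name, not a function from any package. It uses equal-sized bins and reports both the event rate and an empirical log-odds with a 0.5 continuity correction.

```python
import math
import random

def binned_event_rates(x, y, n_bins=10):
    """Split x into equal-sized (quantile) bins.

    Returns one (mean x, event rate, empirical logit) tuple per bin.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        size = len(chunk)
        events = sum(yi for _, yi in chunk)
        mean_x = sum(xi for xi, _ in chunk) / size
        rate = events / size
        # 0.5 continuity correction keeps the logit finite when a bin
        # contains only events or only non-events
        logit = math.log((events + 0.5) / (size - events + 0.5))
        rows.append((mean_x, rate, logit))
    return rows

# Simulated example (an assumption, not the OP's data):
# true log-odds = 1 - 0.8 * x^2, i.e. an inverted U in x
random.seed(0)
x = [random.uniform(-3, 3) for _ in range(5000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(1.0 - 0.8 * xi ** 2))) else 0
     for xi in x]

rows = binned_event_rates(x, y, n_bins=10)
for mean_x, rate, logit in rows:
    print(f"mean x = {mean_x:6.2f}   event rate = {rate:5.3f}   logit = {logit:6.2f}")
```

If the printed rates (or logits) rise and then fall across the bins, that is the inverted U that would prompt trying a squared term; a roughly straight line in the logit column suggests the linear term is enough.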

Vishal

After fitting a generalized linear model (as you typically would when you have a binary outcome), you have several methods available for checking for violations of the assumptions that support its statistical validity. The assumption violated when the true relationship is polynomial is linearity of the predictor's effect on the link scale (the logit, in the case of binary outcomes). The details will vary depending on your computing platform, but you should be thinking of residual (or fitted-value) versus predictor plots. With a binary outcome the raw residuals fall into two bands, so it helps to add a smoother or to average them within bins of the predictor. The test for needing a polynomial is "eyeball"-driven. If you get a "smile" or a "frown", then a squared term might be appropriate. If you get a minus-plus-minus-plus sort of pattern, then a higher-order polynomial might be needed.

You should be thinking about the underlying scientific implications during this process of model building. Cubic and higher-order polynomials deserve a higher degree of skepticism, because you need to balance the degree of fit against complexity. The other approach is to use regression splines, which allow an automatic penalty to be imposed. Frank Harrell's "Regression Modeling Strategies" has many worked examples using the S/R platform.
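To make the residual check concrete, here is a minimal sketch in Python (standard library only). Everything in it is an illustrative assumption: the data are simulated with a true log-odds of $-1 + 0.8x^2$, the linear-only logistic model is fitted with a hand-written Newton-Raphson loop rather than a statistics package, and `fit_logistic_linear` and `binned_residuals` are hypothetical helper names. Averaging residuals within bins of the predictor stands in for the smoothed residual plot.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_linear(x, y, n_iter=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson (maximum likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} * score, using the 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def binned_residuals(x, y, fitted, n_bins=10):
    """Mean raw residual (y - fitted probability) within equal-sized bins of x."""
    triples = sorted(zip(x, y, fitted))
    n = len(triples)
    out = []
    for b in range(n_bins):
        chunk = triples[b * n // n_bins:(b + 1) * n // n_bins]
        mean_x = sum(t[0] for t in chunk) / len(chunk)
        mean_resid = sum(t[1] - t[2] for t in chunk) / len(chunk)
        out.append((mean_x, mean_resid))
    return out

# Simulated example (an assumption): true log-odds = -1 + 0.8 * x^2
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(4000)]
y = [1 if random.random() < sigmoid(-1.0 + 0.8 * xi ** 2) else 0 for xi in x]

b0, b1 = fit_logistic_linear(x, y)            # deliberately omits x^2
fitted = [sigmoid(b0 + b1 * xi) for xi in x]
res = binned_residuals(x, y, fitted)
for mean_x, mean_resid in res:
    print(f"mean x = {mean_x:6.2f}   mean residual = {mean_resid:+.3f}")
```

In this simulated case the binned residuals are positive at both ends of the range and negative in the middle, which is the "smile" that would suggest adding a squared term. A flat scatter around zero would suggest the linear term is adequate.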

DWin