I have fitted a logistic regression where the response variable is binary (whether an interview candidate got the position or not) and the independent variables are a combination of continuous, categorical, and binary variables. To test the assumption of linearity between the log-odds and the predictors, I carried out a Box-Tidwell test on all the continuous and binary variables, adding 1 to each variable so that every variable in the Box-Tidwell test is strictly positive.
The results indicated that several of the binary variables have a non-linear relationship with the log-odds of the outcome. I would like to account for this non-linearity in the model, and I want to know what strategies are available for doing so. So far, I think I can:
- Use the up-shifted binary variables: if the original binary variable is $X_1 \in \{0,1\}$ and $X_1' = X_1 + 1$, then $X_1' \in \{1,2\}$. As with a continuous variable, I could then include a polynomial term, i.e. regress the log-odds of the outcome on $X_1' + X_1'^{2}$.
- Apply the same up-shift to the binary variables, but then take the log, i.e. regress the log-odds of the outcome on $\log(X_1')$.
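
To make the two candidate transformations concrete, here is a minimal sketch in plain Python that evaluates them on the only two values the shifted variable $X_1'$ can take. The helper names (`poly_term`, `log_term`, `affine_through`) are hypothetical and only for illustration; nothing here is fitted to my actual data.

```python
import math

# The shifted binary variable X1' = X1 + 1 can only take these two values.
support = [1, 2]

def poly_term(x):
    """Quadratic term from the first strategy: X1'^2."""
    return x ** 2

def log_term(x):
    """Log transform from the second strategy: log(X1')."""
    return math.log(x)

def affine_through(f, xs):
    """Return (intercept, slope) of the straight line a + b*x that
    passes through (x0, f(x0)) and (x1, f(x1))."""
    x0, x1 = xs
    slope = (f(x1) - f(x0)) / (x1 - x0)
    intercept = f(x0) - slope * x0
    return intercept, slope

for name, f in [("X1'^2", poly_term), ("log(X1')", log_term)]:
    a, b = affine_through(f, support)
    # Check that the transform equals a + b*X1' at every attainable value.
    for x in support:
        assert math.isclose(f(x), a + b * x, abs_tol=1e-12)
    print(f"{name} = {a:.4f} + {b:.4f} * X1' on the support {support}")
```

On this two-point support, both transformations coincide exactly with a straight line in $X_1'$ (for instance $X_1'^{2} = 3X_1' - 2$). If I am reading this correctly, the squared term would be perfectly collinear with the intercept and $X_1'$, and the log version would only rescale the coefficient, so I am unsure whether either strategy can actually capture anything beyond the plain binary term.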
Are there any other strategies for modelling non-linear effects of binary independent variables? What are the advantages and disadvantages of these strategies?