
I'm trying to model the probability of an event Y based on three independent variables: one (X) is continuous (a log count) and the other two (A and B) are categorical (nominal). B is a subcategory of A. A has 4 levels, most of which are well populated; B has 3 to 15 levels depending on the level of A, and about half of those are well populated.

I could take all three variables and do a Bayesian logistic regression (one-hot encoding A and B, ending up with 1+4+15 columns). I could also proceed in steps: fit four distinct logistic regressions of Y on X, one for each level of A. Then, using the coefficients of each as priors on X, fit logistic regressions Y ~ X on each level of B (if level Bj of B belongs to level Ai of A, then I use the priors from model i above).
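For concreteness, here is a minimal PyMC3 sketch of the first (single-model) approach. The arrays `X`, `A_idx`, `B_idx`, and `y` are placeholders standing in for my real data; integer-indexing the coefficient vectors is equivalent to one-hot encoding without building the dummy columns explicitly:

```python
import numpy as np
import pymc3 as pm

# Placeholder data (stand-ins for the real arrays)
n = 500
X = np.random.randn(n)                # continuous predictor (log count)
A_idx = np.random.randint(0, 4, n)    # integer-coded level of A (4 levels)
B_idx = np.random.randint(0, 15, n)   # integer-coded level of B (15 levels total)
y = np.random.binomial(1, 0.5, n)     # binary outcome

with pm.Model() as flat_model:
    intercept = pm.Normal("intercept", mu=0, sigma=5)
    beta_x = pm.Normal("beta_x", mu=0, sigma=5)
    beta_a = pm.Normal("beta_a", mu=0, sigma=5, shape=4)
    beta_b = pm.Normal("beta_b", mu=0, sigma=5, shape=15)
    # Indexing beta_a/beta_b by level plays the role of the one-hot columns
    eta = intercept + beta_x * X + beta_a[A_idx] + beta_b[B_idx]
    pm.Bernoulli("obs", p=pm.math.sigmoid(eta), observed=y)
    trace = pm.sample(1000, tune=1000)
```

(Note that an intercept plus indicators for every level is over-parameterized; the proper priors keep it identified, but one could also drop one reference level per factor.)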

Does it make sense to proceed that way? What are the advantages/disadvantages of doing so? Are there alternatives? Any links/tutorials on mixing categorical and continuous variables in Bayesian logistic regression are also appreciated (particularly for PyMC3).


1 Answer


Short answer: your second proposal seems strange to me; go for the simpler first proposal. There is no problem in principle with mixing different kinds of variables as predictors in a regression. If the range of the continuous variable is not small, consider a spline (or, more simply, a quadratic polynomial). For the categorical variable with some sparse levels, consider regularization; see Principled way of collapsing categorical variables with many levels?
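One natural way to get that regularization in PyMC3 is a hierarchical (partial-pooling) prior that shrinks each B-level effect toward the effect of its parent A level, which also encodes your nesting directly. This is only a sketch under assumed data: `B_to_A` is a hypothetical lookup array mapping each B level to its parent A level, and `X`, `B_idx`, `y` are placeholders as in the question:

```python
import numpy as np
import pymc3 as pm

# Placeholder data and a hypothetical nesting map: B_to_A[j] is the A level
# that B level j belongs to (15 B levels nested in 4 A levels here).
n = 500
X = np.random.randn(n)
B_idx = np.random.randint(0, 15, n)
y = np.random.binomial(1, 0.5, n)
B_to_A = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3])

with pm.Model() as hier_model:
    intercept = pm.Normal("intercept", mu=0, sigma=5)
    beta_x = pm.Normal("beta_x", mu=0, sigma=5)
    # A-level effects, shrunk toward zero
    sigma_a = pm.HalfNormal("sigma_a", sigma=1.0)
    beta_a = pm.Normal("beta_a", mu=0, sigma=sigma_a, shape=4)
    # B-level effects, shrunk toward their parent A-level effect, so
    # sparse B levels borrow strength from the rest of their A group
    sigma_b = pm.HalfNormal("sigma_b", sigma=1.0)
    beta_b = pm.Normal("beta_b", mu=beta_a[B_to_A], sigma=sigma_b, shape=15)
    eta = intercept + beta_x * X + beta_b[B_idx]
    pm.Bernoulli("obs", p=pm.math.sigmoid(eta), observed=y)
    trace = pm.sample(1000, tune=1000)
```

Sparse levels of B are then pulled toward their A-level mean automatically, which is the Bayesian analogue of the collapsing/regularization discussed in the linked question.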
