I know everybody uses logistic regression as the starting point, but I'm curious: what other predictive models are commonly used when the data are primarily binary?
A good idea to search the site as well: e.g. see the splendid http://stats.stackexchange.com/questions/20523/difference-between-logit-and-probit-models – Nick Cox Nov 08 '13 at 20:22
2 Answers
The common methods would be:
(1) Logistic regression (gold standard)
(2) Regression tree (really only for exploratory analysis; very easy to interpret, but with increased bias and variance)
(3) Neural networks
(4) Ensemble methods (boosted regression trees or random forests)
(5) Linear discriminant analysis / quadratic discriminant analysis
(6) k-nearest neighbours (KNN)
(7) Generalized additive models
(8) Support vector machines
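As a minimal sketch of two items from the list on purely binary predictors, the code below fits (1) logistic regression by plain batch gradient descent on the log-loss and runs (6) KNN with Hamming distance, a natural metric for 0/1 vectors. Only the standard library is used; the toy data, learning rate, iteration count and k = 3 are arbitrary assumptions, and in practice one would reach for a library such as scikit-learn (`LogisticRegression`, or `KNeighborsClassifier` with `metric="hamming"`).

```python
import math
from collections import Counter

# Hypothetical toy data: each row is a vector of binary predictors, y is a binary label.
X = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
y = [1, 1, 0, 0, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logistic regression (intercept + one weight per predictor) by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed 0/1 label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict_logistic(w, b, x):
    """Predicted probability that the label is 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def predict_knn(X, y, x, k=3):
    """Majority vote among the k training rows closest to x in Hamming distance."""
    nearest = sorted(range(len(X)), key=lambda i: hamming(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


w, b = fit_logistic(X, y)
print(round(predict_logistic(w, b, [1, 0, 1]), 3))
print(predict_knn(X, y, [1, 0, 1]))
```

Because the toy label equals the first predictor, these data are perfectly separable; unregularized logistic regression then keeps growing its weights as iterations increase, which is one reason library implementations such as scikit-learn's apply an L2 penalty by default.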

@NeilG: With less than 15 reputation he can't upvote answers (though he can accept one). – Scortchi - Reinstate Monica Feb 08 '14 at 23:24
When the data are entirely binary, I'd suggest association rule learning (aka affinity analysis or market basket analysis), followed by learning a decision tree from the result (a whole bunch of association rules).
Association rule learning looks for associations between predictors. The result of such an analysis is a set of rules (e.g. A ∧ B ⇒ C), each with an associated support (the number of records in which it occurs) and confidence (the fraction of records matching the antecedent that also match the consequent). The number of possible rules grows exponentially with the number of predictors and with the maximum rule length.
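To make support and confidence concrete, here is a minimal pure-Python sketch on a hypothetical five-row binary data set, where each row is written as the set of predictors equal to 1. Support is counted as a number of rows, following the answer; `support`, `confidence` and `all_rules` are illustrative helpers, not a library API. The brute-force enumeration also checks the standard count of 3^d − 2^(d+1) + 1 possible rules over d items (with no cap on rule length), which shows the exponential growth. Practical miners such as Apriori avoid this full enumeration by pruning with a minimum-support threshold.

```python
from itertools import combinations

# Hypothetical binary data set: each row lists which binary predictors equal 1.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def support(itemset, transactions):
    """Number of rows in which every item of the itemset equals 1."""
    return sum(set(itemset) <= t for t in transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent), an estimate of P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def all_rules(items):
    """Brute-force list of every rule X => Y with X and Y non-empty and disjoint."""
    items = sorted(items)
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    lhs = frozenset(antecedent)
                    rhs = frozenset(itemset) - lhs
                    rules.append((lhs, rhs))
    return rules


print(support({"A", "B"}, transactions))
print(confidence({"A", "B"}, {"C"}, transactions))
for d in range(1, 6):
    # Brute-force count vs. the closed form 3^d - 2^(d+1) + 1.
    print(d, len(all_rules(range(d))), 3**d - 2**(d + 1) + 1)
```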
It is then common to learn a model, such as a decision tree, from this (often giant) set of rules.
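One simple reading of that last step, sketched under explicit assumptions: each mined rule's antecedent becomes a new binary feature (1 if the row satisfies it), and a tree is grown on those features. To stay short, the sketch grows only a single split (a decision stump, i.e. a depth-1 tree) chosen by weighted Gini impurity. The rows, target and antecedents are hypothetical, and this is not necessarily the procedure the answer has in mind; a full tree would repeat the same split search recursively, e.g. via scikit-learn's `DecisionTreeClassifier`.

```python
# Hypothetical binary predictors (one dict per row) and a binary target.
rows = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 1},
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 0, "C": 0},
]
target = [1, 0, 0, 1, 0, 0]

# Hypothetical antecedents kept from an earlier association-rule mining step.
antecedents = [frozenset({"A"}), frozenset({"C"}), frozenset({"A", "B"})]


def rule_features(row, antecedents):
    """Turn each antecedent into a 0/1 feature: 1 if every item in it equals 1 in this row."""
    return [int(all(row[item] == 1 for item in ant)) for ant in antecedents]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(F, target):
    """Return (feature index, weighted Gini) of the best single split, i.e. a depth-1 tree."""
    n = len(target)
    best = None
    for j in range(len(F[0])):
        left = [t for f, t in zip(F, target) if f[j] == 0]
        right = [t for f, t in zip(F, target) if f[j] == 1]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or impurity < best[1]:
            best = (j, impurity)
    return best


F = [rule_features(r, antecedents) for r in rows]
j, impurity = best_stump(F, target)
print(sorted(antecedents[j]), impurity)
```

Here the split on the antecedent {A, B} separates the target perfectly, because the toy target was constructed as A AND B.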
