I've been asked to use the Naive Bayes classifier to classify a couple of samples.
My dataset has categorical features, so I first had to encode them with a one-hot encoder, but then I was at a loss as to which statistical model to use (e.g. Gaussian NB or Multinomial NB).
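For concreteness, here is a minimal sketch of the encoding step, written from scratch with NumPy so it is self-contained. The toy data (an "outlook" and a "wind" feature) and the helper name `one_hot_encode` are hypothetical; in practice a library encoder such as scikit-learn's `OneHotEncoder` does the same job.

```python
import numpy as np

def one_hot_encode(rows):
    """Hypothetical helper: turn each categorical column into 0/1 indicator columns.

    Returns the binary matrix and a list of (original column, category) pairs
    labelling each output column.
    """
    n_cols = len(rows[0])
    # Sorted set of categories observed in each original column.
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    columns = [(j, c) for j in range(n_cols) for c in categories[j]]
    index = {col: k for k, col in enumerate(columns)}
    X = np.zeros((len(rows), len(columns)), dtype=int)
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            X[i, index[(j, value)]] = 1  # switch on the indicator for this category
    return X, columns

# Hypothetical toy samples: (outlook, wind).
samples = [
    ("sunny", "weak"),
    ("rainy", "strong"),
    ("overcast", "weak"),
    ("sunny", "strong"),
]
X, columns = one_hot_encode(samples)
print(columns)
print(X)
```

Every encoded row contains exactly one 1 per original feature, so all samples end up as strictly binary vectors with the same number of "on" entries.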
I ended up using the multinomial version because I read somewhere that it works well in NLP and IR tasks, where documents are represented as term-count vectors or TF-IDF weights.
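To make the multinomial model concrete, here is a minimal from-scratch sketch (NumPy only) of how it scores a count vector: log P(c) + sum_j x_j log theta[c, j], with Laplace-smoothed estimates theta[c, j] = (N_cj + alpha) / (N_c + alpha * n_features), the same smoothing form that scikit-learn documents for `MultinomialNB`. The function names, the binary matrix (a hypothetical one-hot encoding of two categorical features) and the labels are illustrative assumptions, not the setup from my actual dataset.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Hypothetical helper: estimate log class priors and smoothed log feature probabilities."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Total count of each feature within each class.
    counts = np.array([X[y == c].sum(axis=0) for c in classes], dtype=float)
    smoothed = counts + alpha
    log_theta = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return classes, log_prior, log_theta

def predict_multinomial_nb(model, X):
    """Hypothetical helper: return predicted classes and joint log-likelihood scores."""
    classes, log_prior, log_theta = model
    # log P(c) + sum_j x_j * log theta[c, j]; with 0/1 inputs only the active columns contribute.
    scores = log_prior + X @ log_theta.T
    return classes[np.argmax(scores, axis=1)], scores

# Hypothetical one-hot data: 2 categorical features -> 5 binary columns.
X = np.array([
    [0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0],
])
y = np.array([1, 0, 1, 0])

model = fit_multinomial_nb(X, y)
pred, scores = predict_multinomial_nb(model, X)
print(pred)  # -> [1 0 1 0]
```

With strictly binary inputs the multinomial score simply adds up the log probabilities of the columns that are switched on; absent features (zeros) contribute nothing to the score.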
I would like to know whether that was the right choice and, if possible, get a quick explanation of why.
P.S. There is a somewhat similar question, but I'm not sure whether its answer also applies to strictly binary (0 or 1) feature vectors.