Use different Naive Bayes classifiers to target different data

Question

I am practicing using the Naive Bayes classifier to predict whether people get a stroke or not, but I am confused by two classifiers: one is Categorical Naive Bayes, the other is Gaussian Naive Bayes.

For example, the dataset has several text attributes such as gender, ever_married, and ever_smoked, while other columns are numerical. To preprocess the data, I use dummy variables, e.g. `sex = pd.get_dummies(df['gender'], drop_first=True)`, to transform the text into binary columns, then standardize the dataset and train a Gaussian Naive Bayes classifier on it. Is this the correct way to do it?
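For reference, the pipeline described above might look roughly like this in scikit-learn. The tiny DataFrame is hypothetical (made-up values, with `age` standing in for the numeric columns), and `dtype=float` is added so the dummy columns come out as 0/1 floats.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column names follow the question, values are made up.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "ever_married": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "age": [67.0, 45.0, 80.0, 30.0, 22.0, 75.0],
    "stroke": [1, 0, 1, 0, 0, 1],
})

# One-hot encode the text columns, dropping the first level as in the question.
X = pd.get_dummies(df[["gender", "ever_married"]], drop_first=True, dtype=float)
X["age"] = df["age"]
y = df["stroke"]

# Standardize every column, then fit a Gaussian Naive Bayes model.
X_scaled = StandardScaler().fit_transform(X)
model = GaussianNB().fit(X_scaled, y)
predictions = model.predict(X_scaled)
print(predictions)
```

Because GaussianNB fits a separate mean and variance for each feature within each class, a per-feature rescaling such as StandardScaler should leave its predictions essentially unchanged (apart from the small `var_smoothing` term), so the standardization step is largely optional. The bigger concern is that a Gaussian is a crude model for a 0/1 dummy column.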

Or should I directly use Categorical Naive Bayes to train on the data? However, some columns are numerical, so isn't this classifier unsuitable for them?
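If you go the CategoricalNB route, one common way to handle the numeric columns is to discretize them first. Below is a minimal sketch on hypothetical data; the choice of 3 equal-width bins for `age` is arbitrary and only for illustration. CategoricalNB expects every feature as non-negative integer codes, which OrdinalEncoder and KBinsDiscretizer (with `encode="ordinal"`) provide.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer, OrdinalEncoder

# Hypothetical toy data: column names follow the question, values are made up.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "ever_married": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "age": [67.0, 45.0, 80.0, 30.0, 22.0, 75.0],
    "stroke": [1, 0, 1, 0, 0, 1],
})

# Map each category to an integer code 0..k-1, as CategoricalNB expects.
cat_codes = OrdinalEncoder().fit_transform(df[["gender", "ever_married"]])

# Discretize the numeric column into 3 equal-width bins (illustrative choice).
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
age_bins = binner.fit_transform(df[["age"]])

# Stack everything into one integer feature matrix.
X = np.hstack([cat_codes, age_bins]).astype(int)
y = df["stroke"].to_numpy()

model = CategoricalNB().fit(X, y)
predictions = model.predict(X)
print(predictions)
```

The bin count and strategy (`"uniform"`, `"quantile"`, `"kmeans"`) are tuning choices; coarse bins throw information away, fine bins leave few samples per category.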

Any help is highly appreciated.

score 2 · Accepted Answer · answered Aug 19 '20 at 11:14

First, the term 'Naive Bayes' refers to the assumption of conditional independence among the feature variables, given the class outcome (that is, 'stroke' or 'no-stroke'). Taking the variables gender and ever_smoked, conditional independence is written as $\text{Gender} \perp\!\!\!\perp \text{EverSmoked} \mid \text{Stroke}$. Conditional independence can also hold for numeric variables.
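Written out, the assumption means the class posterior factorizes into one term per feature. Here `Age` is only an illustrative numeric column (the question does not name its numeric columns):

$$
P(\text{Stroke} \mid \text{Gender}, \text{EverSmoked}, \text{Age}) \;\propto\; P(\text{Stroke})\, P(\text{Gender} \mid \text{Stroke})\, P(\text{EverSmoked} \mid \text{Stroke})\, p(\text{Age} \mid \text{Stroke})
$$

Each factor can then be estimated separately: category frequencies for Gender and EverSmoked, and a density (for example a Gaussian) for Age.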

Your two variables Gender and EverSmoked are categorical, so a discrete classifier is appropriate for your purpose. (You can try the off-the-shelf web service Insight Classifiers, which also copes with numeric variables, all in one go.)
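If you would rather stay within scikit-learn, one workaround (a swapped-in technique, not how Insight Classifiers works) is to fit CategoricalNB on the categorical columns and GaussianNB on the numeric ones, then combine them using the per-feature factorization that conditional independence implies. Each model's log-posterior already contains the log-prior once, so the sum counts it twice and one copy is subtracted. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: column names follow the question, values are made up.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "ever_married": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "age": [67.0, 45.0, 80.0, 30.0, 22.0, 75.0],
    "stroke": [1, 0, 1, 0, 0, 1],
})
y = df["stroke"].to_numpy()

# Categorical columns as integer codes; numeric columns left as-is.
X_cat = OrdinalEncoder().fit_transform(df[["gender", "ever_married"]]).astype(int)
X_num = df[["age"]].to_numpy()

cat_nb = CategoricalNB().fit(X_cat, y)
gauss_nb = GaussianNB().fit(X_num, y)

# Both models use the empirical class prior; remove the duplicated copy.
log_prior = np.log(gauss_nb.class_prior_)
joint = cat_nb.predict_log_proba(X_cat) + gauss_nb.predict_log_proba(X_num) - log_prior

# Renormalize each row so it is a proper log-probability distribution.
log_posterior = joint - np.logaddexp.reduce(joint, axis=1, keepdims=True)
proba = np.exp(log_posterior)
predictions = cat_nb.classes_[np.argmax(log_posterior, axis=1)]
print(predictions, proba.round(3))
```

The per-row normalizing constants from `predict_log_proba` do not depend on the class, so they cancel in the final renormalization and do not affect the argmax.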

In general, (deep) neural networks, support vector machines and decision trees (C4.5) easily combine discrete and continuous feature variables.
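As a small illustration of the decision-tree case: a tree can take 0/1 dummy columns and a raw numeric column side by side without any scaling, because each split only compares one feature against a threshold. Note that scikit-learn's DecisionTreeClassifier implements an optimized version of CART rather than C4.5, and it still needs the text columns encoded as numbers. The data below is hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: column names follow the question, values are made up.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "ever_married": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "age": [67.0, 45.0, 80.0, 30.0, 22.0, 75.0],
    "stroke": [1, 0, 1, 0, 0, 1],
})

# Dummy-encode the categorical columns and keep age unscaled.
X = pd.get_dummies(df[["gender", "ever_married"]], drop_first=True, dtype=float)
X["age"] = df["age"]
y = df["stroke"]

# A shallow tree; max_depth=2 is an arbitrary choice for this toy example.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
predictions = tree.predict(X)
print(predictions)
```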

Thank you for your explanation and suggestion. – Woden, Aug 19 '20 at 11:21
