Questions tagged [naive-bayes]

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".

Questions about using, optimizing, or interpreting a naive Bayes classifier should use this tag.

Wikipedia's introduction:

In simple terms, a naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the presence or absence of the other features.
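
Concretely, this conditional-independence assumption means the joint likelihood factorizes, so the classifier picks the class maximizing
$$\hat{C} = \arg\max_{C}\; P(C) \prod_{i=1}^{n} P(x_i \mid C),$$
since $P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$ under the assumption.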

For some types of probability models, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.
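
For instance, with binary features those maximum-likelihood estimates are just empirical frequencies. A minimal NumPy sketch (illustrative function and variable names, not any particular library's API):

```python
import numpy as np

def fit_bernoulli_nb(X, y):
    """Maximum-likelihood estimates for a Bernoulli naive Bayes model.

    X : (n_samples, n_features) array of 0/1 features
    y : (n_samples,) array of class labels
    """
    classes = np.unique(y)
    # MLE of the class priors P(C): empirical class frequencies.
    priors = np.array([np.mean(y == c) for c in classes])
    # MLE of P(x_i = 1 | C): per-class empirical feature frequencies.
    cond_probs = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, priors, cond_probs
```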

Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem showed that there are sound theoretical reasons for the apparently implausible efficacy of naive Bayes classifiers (Zhang, 2004). Still, a comprehensive comparison with other classification algorithms in 2006 showed that Bayes classification is outperformed by other approaches, such as boosted trees or random forests (Caruana & Niculescu-Mizil, 2006).

An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix.
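
To make that last point concrete: a Gaussian naive Bayes fit touches only one mean and one variance per feature and class, never an off-diagonal covariance term. Another minimal sketch under the same caveats as above:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class, per-feature means and variances. Conditional independence
    means the full covariance matrix is never estimated."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, means, variances
```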

We can visualize a naive Bayes model graphically as follows:

[Figure: generic naive Bayes model]

In this Bayesian network, predictive attributes $X_i$ are conditionally independent given the class $C$.

References:

  • Caruana, R., & Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. Proceedings of the 23rd International Conference on Machine Learning, 161–168. Available online, URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.5901.

  • Domingos, P. & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–137.

  • Metsis, V., Androutsopoulos, I., & Paliouras, G. (2006). Spam filtering with naive Bayes—which naive Bayes? Third Conference on Email and Anti-Spam (CEAS), 17.

  • Rennie, J., Shih, L., Teevan, J., & Karger, D. (2003). Tackling the poor assumptions of naive Bayes classifiers. Proceedings of the Twentieth International Conference on Machine Learning. Available online, URL: http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.

  • Rish, I. (2001). An empirical study of the naive Bayes classifier. IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. Available online, URL: http://www.research.ibm.com/people/r/rish/papers/RC22230.pdf.

  • Zhang, H. (2004). The optimality of naive Bayes. FLAIRS2004 conference. American Association for Artificial Intelligence. Available online, URL: http://www.cs.unb.ca/profs/hzhang/publications/FLAIRS04ZhangH.pdf.

586 questions

52 votes, 3 answers

Understanding Naive Bayes

From StatSoft, Inc. (2013), Electronic Statistics Textbook, "Naive Bayes Classifier": To demonstrate the concept of Naïve Bayes Classification, consider the example displayed in the illustration above. As indicated, the objects can be…

asked by G Gr

45 votes, 2 answers

Difference between naive Bayes & multinomial naive Bayes

I've dealt with Naive Bayes classifier before. I've been reading about Multinomial Naive Bayes lately. Also Posterior Probability = (Prior * Likelihood)/(Evidence). The only prime difference (while programming these classifiers) I found between…

asked by garak

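On the distinction raised there: in the multinomial variant, the likelihood of a document $d$ with word counts $f_{w,d}$ over vocabulary $V$ is the standard
$$P(d \mid C) \propto \prod_{w \in V} P(w \mid C)^{f_{w,d}},$$
so repeated words count multiple times, whereas the Bernoulli variant conditions only on each word's presence or absence.
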
40 votes, 3 answers

Why do naive Bayesian classifiers perform so well?

Naive Bayes classifiers are a popular choice for classification problems. There are many reasons for this, including: "Zeitgeist" (widespread awareness after the success of spam filters about ten years ago); easy to write; the classifier model is…

asked by winwaed

40 votes, 3 answers

How is Naive Bayes a Linear Classifier?

I've seen the other thread here but I don't think the answer satisfied the actual question. What I have continually read is that Naive Bayes is a linear classifier (ex: here) (such that it draws a linear decision boundary) using the log odds…

asked by Kevin Pei

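The short version of the linearity argument, for binary features with $p_{ic} = P(x_i = 1 \mid C_c)$:
$$\log \frac{P(C_1 \mid x)}{P(C_0 \mid x)} = \log \frac{P(C_1)}{P(C_0)} + \sum_{i} \left[ x_i \log \frac{p_{i1}}{p_{i0}} + (1 - x_i) \log \frac{1 - p_{i1}}{1 - p_{i0}} \right],$$
which is affine in $x$, so the decision boundary (log odds equal to zero) is a hyperplane.
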
35 votes, 3 answers

What algorithms need feature scaling, beside from SVM?

I am working with many algorithms: RandomForest, DecisionTrees, NaiveBayes, SVM (kernel=linear and rbf), KNN, LDA and XGBoost. All of them were pretty fast except for SVM. That is when I got to know that it needs feature scaling to work faster. Then…

asked by Aizzaac

35 votes, 8 answers

In Naive Bayes, why bother with Laplace smoothing when we have unknown words in the test set?

I was reading over Naive Bayes Classification today. I read, under the heading of Parameter Estimation with add-1 smoothing: Let $c$ refer to a class (such as Positive or Negative), and let $w$ refer to a token or word. The maximum likelihood…

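The add-1 (Laplace) estimate referred to there is the standard
$$\hat{P}(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{\sum_{w' \in V} \operatorname{count}(w', c) + |V|},$$
where $V$ is the vocabulary, so every word in $V$, even one unseen in class $c$, receives nonzero probability.
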
24 votes, 3 answers

Machine Learning to Predict Class Probabilities

I am looking for classifiers that output probabilities that examples belong to one of two classes. I know of logistic regression and naive Bayes, but can you tell me of others that work in a similar way? That is, classifiers that predict not the…

21 votes, 2 answers

How does Naive Bayes work with continuous variables?

To my (very basic) understanding, Naive Bayes estimates probabilities based on the class frequencies of each feature in the training data. But how does it calculate the frequency of continuous variables? And when doing prediction, how does it…

asked by xyy

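The usual answer in outline: Gaussian naive Bayes replaces the frequency count with a density fitted per feature and class,
$$P(x_i \mid C) = \frac{1}{\sqrt{2 \pi \sigma_{iC}^2}} \exp\!\left( -\frac{(x_i - \mu_{iC})^2}{2 \sigma_{iC}^2} \right),$$
where $\mu_{iC}$ and $\sigma_{iC}^2$ are the per-class sample mean and variance; discretizing the variable into bins is the common alternative.
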
18 votes, 5 answers

How to do one-class text classification?

I have to deal with a text classification problem. A web crawler crawls webpages of a certain domain and for each webpage I want to find out whether it belongs to only one specific class or not. That is, if I call this class Positive, each crawled…

asked by pemistahl

17 votes, 1 answer

When does Naive Bayes perform better than SVM?

In a small text classification problem I was looking at, Naive Bayes has been exhibiting performance similar to or greater than an SVM and I was very confused. I was wondering what factors decide the triumph of one algorithm over the other. Are…

asked by Legend

16 votes, 3 answers

In Kneser-Ney smoothing, how are unseen words handled?

From what I have seen, the (second-order) Kneser-Ney smoothing formula is in some way or another given as
$$P^2_{KN}(w_n \mid w_{n-1}) = \frac{\max\left\{ C(w_{n-1}, w_n) - D,\ 0 \right\}}{\sum_{w'} C(w_{n-1}, w')} + …$$

16 votes, 3 answers

Example of how the log-sum-exp trick works in Naive Bayes

I have read about the log-sum-exp trick in many places (e.g. here, and here) but have never seen an example of how it is applied specifically to the Naive Bayes classifier (e.g. with discrete features and two classes). How exactly would one avoid the…

asked by Josh

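Since the trick itself is short, here is a minimal NumPy sketch (illustrative code; the function name and setup are ours, not from the linked answers):

```python
import numpy as np

def nb_posteriors(log_priors, log_likelihoods):
    """Turn per-class log scores, log P(C) + sum_i log P(x_i | C),
    into posterior probabilities without floating-point underflow."""
    log_joint = log_priors + log_likelihoods
    m = np.max(log_joint)                 # shift by the max so exp() stays in range
    log_evidence = m + np.log(np.sum(np.exp(log_joint - m)))  # log-sum-exp
    return np.exp(log_joint - log_evidence)                   # normalized posteriors

# Example: two classes whose raw joint probabilities (~e^-1000) would underflow.
print(nb_posteriors(np.log([0.5, 0.5]), np.array([-1000.0, -1001.0])))
# -> approximately [0.731, 0.269]
```
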
15 votes, 3 answers

Why does nobody use the Bayesian multinomial Naive Bayes classifier?

So in (unsupervised) text modeling, Latent Dirichlet Allocation (LDA) is a Bayesian version of Probabilistic Latent Semantic Analysis (PLSA). Essentially, LDA = PLSA + Dirichlet prior over its parameters. My understanding is that LDA is now the…

15 votes, 1 answer

Why is the naive bayes classifier optimal for 0-1 loss?

The Naive Bayes classifier is the classifier which assigns items $x$ to a class $C$ based on maximizing the posterior $P(C|x)$ for class membership, and assumes that the features of the items are independent. The 0-1 loss is the loss which…

asked by user24544

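A step worth making explicit when reading answers to that question: under 0-1 loss, the expected loss of predicting class $c$ at a point $x$ is $1 - P(c \mid x)$, so risk is minimized by $\hat{c}(x) = \arg\max_c P(c \mid x)$; the substantive issue is then whether the naive posterior estimate still ranks the true class first when the independence assumption is wrong.
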
15 votes, 2 answers

Increasing number of features results in accuracy drop but prec/recall increase

I am new to Machine Learning. At the moment I am using a Naive Bayes (NB) classifier to classify small texts in 3 classes as positive, negative or neutral, using NLTK and python. After conducting some tests, with a dataset composed of 300,000…