Questions tagged [laplace-smoothing]

Laplace smoothing (also known as additive smoothing) is a technique for regularising estimated probabilities. It ensures that improbable or unseen outcomes are still assigned a small, nonzero probability of occurrence.
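For concreteness, a minimal sketch of the estimator in Python (the function name and toy data are illustrative): each category's count is inflated by a pseudo-count $\alpha$, with the denominator adjusted so the probabilities still sum to 1.

```python
# Minimal sketch of additive (Laplace) smoothing for a categorical
# distribution; alpha = 1 gives classic add-one smoothing.
from collections import Counter

def smoothed_probs(observations, categories, alpha=1.0):
    counts = Counter(observations)
    n = len(observations)
    k = len(categories)
    return {c: (counts[c] + alpha) / (n + alpha * k) for c in categories}

# "c" was never observed but still gets a nonzero probability:
print(smoothed_probs(["a", "a", "b"], ["a", "b", "c"]))
# {'a': 0.5, 'b': 0.333..., 'c': 0.166...}
```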

43 questions
35 votes · 8 answers

In Naive Bayes, why bother with Laplace smoothing when we have unknown words in the test set?

I was reading over Naive Bayes classification today. I read, under the heading of Parameter Estimation with add-1 smoothing: Let $c$ refer to a class (such as Positive or Negative), and let $w$ refer to a token or word. The maximum likelihood…
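A hedged sketch of what add-1 smoothing buys you here (names and data are illustrative): words unseen for a class during training, including test-set unknowns, keep a small nonzero likelihood instead of zeroing out the whole product.

```python
# Add-1 smoothed class-conditional word likelihoods for Naive Bayes.
from collections import Counter

def word_likelihoods(docs_for_class, vocab):
    counts = Counter(w for doc in docs_for_class for w in doc)
    total = sum(counts.values())
    V = len(vocab) + 1  # one extra slot reserved for unknown words

    def p(word):
        return (counts[word] + 1) / (total + V)

    return p

p_pos = word_likelihoods([["great", "movie"], ["great", "plot"]],
                         vocab={"great", "movie", "plot"})
print(p_pos("great"))     # (2 + 1) / (4 + 4) = 0.375
print(p_pos("terrible"))  # (0 + 1) / (4 + 4) = 0.125, not zero
```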
12 votes · 2 answers

Laplace smoothing and Dirichlet prior

In the Wikipedia article on Laplace smoothing (or additive smoothing), it is said that, from a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter…
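A quick numeric check of that correspondence (the counts and $\alpha$ are illustrative): with counts $x_i$ over $k$ categories and a symmetric Dirichlet($\alpha$) prior, the posterior is Dirichlet($x + \alpha$), whose mean $(x_i + \alpha)/(N + \alpha k)$ is exactly the additively smoothed estimate.

```python
import numpy as np
from scipy.stats import dirichlet

x = np.array([2.0, 1.0, 0.0])  # observed counts
alpha = 1.0                    # symmetric Dirichlet parameter

posterior_mean = dirichlet.mean(x + alpha)           # mean of Dirichlet(x + alpha)
smoothed = (x + alpha) / (x.sum() + alpha * len(x))  # additive smoothing
print(np.allclose(posterior_mean, smoothed))         # True
```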
11 votes · 3 answers

Terminology for Bayesian Posterior Mean of Probability with Uniform Prior

If $p \sim \text{Uniform}(0,1)$ and $X \sim \text{Bin}(n, p)$, then the posterior mean of $p$ is given by $\frac{X+1}{n+2}$. Is there a common name for this estimator? I've found it solves lots of people's problems and I'd like to be able to point people to…
Cliff AB · 17,741
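The derivation behind the excerpt is Beta–Binomial conjugacy: Uniform$(0,1)$ is Beta$(1,1)$, so after $X$ successes in $n$ trials, $$ p \mid X \sim \text{Beta}(X+1,\; n-X+1), \qquad E[p \mid X] = \frac{X+1}{(X+1)+(n-X+1)} = \frac{X+1}{n+2}. $$ This estimator is widely known as Laplace's rule of succession.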
10 votes · 2 answers

Calculating Emission Probability values for Hidden Markov Model (HMM)

I'm new to HMMs and still learning. I'm currently using an HMM to tag parts of speech. To implement the Viterbi algorithm I need transition probabilities ($a_{i,j}$) and emission probabilities ($b_i(o)$). I'm…
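A hedged sketch of one common fix for unseen emissions, add-one smoothed estimates $b_i(o) = (\text{Count}(i, o) + 1)/(\text{Count}(i) + V)$, with illustrative tags and a toy vocabulary:

```python
from collections import Counter

tagged = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"), ("the", "DET")]
vocab = {"the", "dog", "barks", "cat"}

emit_counts = Counter(tagged)                  # (word, tag) pair counts
tag_counts = Counter(tag for _, tag in tagged)

def emission_prob(word, tag):
    # Add-one smoothing over the vocabulary keeps unseen emissions nonzero.
    return (emit_counts[(word, tag)] + 1) / (tag_counts[tag] + len(vocab))

print(emission_prob("the", "DET"))  # (2 + 1) / (2 + 4) = 0.5
print(emission_prob("cat", "DET"))  # (0 + 1) / (2 + 4) ≈ 0.167
```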
7 votes · 2 answers

What's a good approach to estimate the probability of word frequencies?

I have a document corpus and I want to estimate the probability of occurrence of a certain word $w$. Simply calculating the frequencies and using such a number as an estimate is not a good choice. Is there any work on this topic describing a better…
derekhh · 185
6 votes · 2 answers

What if a numerator term is zero in Naive Bayes?

I'm trying to predict the probability that a user will visit a particular website based on several factors (day of the week, time since last visit, etc). My question is what to do if one of the numerator terms goes to zero? For instance, suppose I…
Jeff · 3,525
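A hedged illustration of the failure mode and the usual repair (the feature counts below are illustrative): a single zero numerator wipes out the entire product, while a pseudo-count keeps every factor strictly positive; working in log space additionally avoids underflow.

```python
import math

counts = [3, 0, 4]    # times each feature value co-occurred with the class
class_total = 10      # times the class itself was observed
n_values = [7, 7, 7]  # possible values per feature

unsmoothed = 1.0
for c in counts:
    unsmoothed *= c / class_total
print(unsmoothed)  # 0.0; one zero factor kills everything

log_p = sum(math.log((c + 1) / (class_total + k))
            for c, k in zip(counts, n_values))
print(math.exp(log_p))  # small but strictly positive
```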
5 votes · 0 answers

Laplace smoothing parameter choice for Markov chain transitions

Let $Y_{t}$ be the state of the process at time $t$ and ${\bf P}$ be the transition matrix; then: $$ {\bf P}_{ij} = P(Y_{t} = j \mid Y_{t-1} = i) $$ Since this is a Markov chain, this probability depends only on $Y_{t-1}$, so it can be estimated by the…
HCAI · 737
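A hedged sketch of row-wise additive smoothing of the estimated transition matrix (the count matrix and $\alpha$ are illustrative; choosing $\alpha$ is precisely the open question here):

```python
import numpy as np

transition_counts = np.array([[8, 2, 0],
                              [1, 5, 0],
                              [0, 0, 0]])  # the third state was never observed
alpha = 1.0
k = transition_counts.shape[1]

row_totals = transition_counts.sum(axis=1, keepdims=True)
P = (transition_counts + alpha) / (row_totals + alpha * k)
print(P)              # every entry > 0; the unseen state gets a uniform row
print(P.sum(axis=1))  # each row still sums to 1
```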
5 votes · 1 answer

Smoothing/shrinking the predicted probability of a classifier to reduce live logloss

Let us assume we work on a 2-class classification problem. In my setting the sample is balanced. To be precise, it is a financial markets setting where up and down have approximately 50:50 chance. The classifier produces results $$p_i = P[class =…
Richi W · 3,216
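One simple shrinkage scheme, sketched under the assumption of a balanced base rate of 0.5 (the weight lam is hypothetical and would be tuned, e.g. on a validation set): pulling predictions toward the base rate caps the penalty for confident mistakes.

```python
import math

def shrink(p, lam=0.8, base_rate=0.5):
    return lam * p + (1 - lam) * base_rate

def logloss(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# An overconfident wrong prediction is punished far less after shrinkage:
print(logloss(0.99, 0))          # ≈ 4.61
print(logloss(shrink(0.99), 0))  # ≈ 2.23 (shrink(0.99) = 0.892)
```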
5 votes · 1 answer

How to handle unseen features in a Naive Bayes classifier?

I am writing a Naive Bayes classifier for a text classification problem. I have a bunch of words and an associated label: [short,snippet,text], label1 [slightly,different,snippet,text], label2 ... I am able to train the Naive Bayes classifier fine. However,…
applecider · 1,175
4 votes · 2 answers

Understanding Add-1/Laplace smoothing with bigrams

I am working through an example of add-1 smoothing in the context of NLP. Say that there is the following corpus (start and end tokens included): <s> I am sam </s>, <s> sam I am </s>, <s> I do not like green eggs and ham </s>. I want to check the probability that the…
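A hedged sketch of the add-1 bigram computation $P(w_2 \mid w_1) = (C(w_1, w_2) + 1)/(C(w_1) + V)$ on that toy corpus (whether $V$ counts the start/end tokens is a modeling choice; they are included below):

```python
from collections import Counter

sentences = [["<s>", "I", "am", "sam", "</s>"],
             ["<s>", "sam", "I", "am", "</s>"],
             ["<s>", "I", "do", "not", "like", "green", "eggs", "and", "ham", "</s>"]]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in sentences for i in range(len(s) - 1))
V = len(unigrams)  # vocabulary size, here 12 including <s> and </s>

def p(w2, w1):
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p("I", "<s>"))   # (2 + 1) / (3 + 12) = 0.2
print(p("sam", "am"))  # (1 + 1) / (2 + 12) ≈ 0.143
```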
3 votes · 1 answer

Smoothing a 2-by-2 contingency table

I am trying to implement a system for automatic document categorization, where each document of a corpus belongs to some class. I define the following contingency table for every class C and every word W: $\begin{array}{c|cc} & W &…
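A hedged sketch of the usual fix for zero cells in such a table: add a pseudo-count to every cell before computing proportions or odds ratios (0.5 is the common Haldane–Anscombe correction; 1 would be Laplace). The cell counts are illustrative.

```python
import numpy as np

table = np.array([[30, 0],   # rows: word W present / absent
                  [5, 65]])  # cols: document in class C / not in C
alpha = 0.5

smoothed = table + alpha
probs = smoothed / smoothed.sum()
print(probs)  # no zero cells, so log-odds and ratios stay finite
```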
3 votes · 2 answers

Laplace smoothing understanding implementation

Considering the data set given below: if we have to classify a new data point D15 (O=Overcast, T=Cool, H=High, W=Strong), then for P(No|Overcast, Cool, High, Strong) we have (5/14) * 0 * (1/5) * (4/5) * (3/5). This results in 0. So I read that…
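A worked version of the repaired factor, assuming add-1 smoothing and three possible Outlook values (Sunny, Overcast, Rain): $$ P(\text{Overcast} \mid \text{No}) = \frac{0 + 1}{5 + 3} = \frac{1}{8}, $$ and the other likelihood factors are smoothed the same way, so the product becomes small but no longer zero.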
3 votes · 1 answer

Is the Laplace/Lidstone smoothing parameter (talking about Multinomial/Bernoulli Naive Bayes) related to the particular structure of the dataset?

I'm working with the Multinomial and Bernoulli Naive Bayes implementations of scikit-learn (Python) for text classification. I'm using the 20_newsgroups dataset. From the scikit-learn documentation we have: class sklearn.naive_bayes.MultinomialNB(alpha=1.0,…
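The alpha parameter here is exactly the Laplace/Lidstone pseudo-count, and a small grid search over it is the standard way to see how dataset-dependent the best value is. A hedged sketch (the parameter grid is illustrative):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset="train")
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
grid = GridSearchCV(pipe, {"multinomialnb__alpha": [0.01, 0.1, 0.5, 1.0]}, cv=3)
grid.fit(data.data, data.target)
print(grid.best_params_)
```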
2 votes · 1 answer

In what conditions does naive Bayes classifier perform poorly?

When does naive Bayes perform poorly? Can you think of any specific examples of problems in which it wouldn't work? We can ignore the issue of previously unseen data points, as that can be corrected by Laplace smoothing.
Remertion · 21
2 votes · 1 answer

What is the proper way to estimate the probability (proportion of time) a rare event occurs?

Often, I need to estimate the probability (proportion of time) a rare event occurs. The standard MLE estimate often gives me extreme estimates since the denominator is usually 1, and the numerator is either 0 or 1, giving me either 100% or 0%. For…
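Two standard shrinkage estimates for exactly this situation, sketched with illustrative counts (Laplace's rule of succession and the Jeffreys-prior posterior mean):

```python
successes, trials = 0, 1

mle = successes / trials                     # 0.0, the extreme estimate
laplace = (successes + 1) / (trials + 2)     # rule of succession: 1/3
jeffreys = (successes + 0.5) / (trials + 1)  # Jeffreys posterior mean: 0.25
print(mle, laplace, jeffreys)
```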