Laplace smoothing (also known as additive smoothing) is a regularisation technique for probability estimates. It ensures that outcomes that are rare or unseen in the data are still assigned a small, non-zero probability.
Questions tagged [laplace-smoothing]
43 questions
35 votes, 8 answers
In Naive Bayes, why bother with Laplace smoothing when we have unknown words in the test set?
I was reading over Naive Bayes Classification today. I read, under the heading of Parameter Estimation with add 1 smoothing:
Let $c$ refer to a class (such as Positive or Negative), and let $w$ refer to a token or word.
The maximum likelihood…

tumultous_rooster (1,145 rep, badges 4/14/24)
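For reference, the add-1 (Laplace) estimate this question refers to replaces the maximum-likelihood estimate of a word's class-conditional probability with a smoothed version. A common form, assuming a fixed vocabulary $V$ of word types, is

$$\hat{P}(w \mid c) = \frac{\text{Count}(w, c) + 1}{\sum_{w' \in V} \text{Count}(w', c) + |V|},$$

so a word that never co-occurs with class $c$ in training still gets a small non-zero probability instead of zeroing out the whole product; how this interacts with words entirely absent from the training vocabulary is exactly what the question is asking.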
12 votes, 2 answers
Laplace smoothing and Dirichlet prior
In the Wikipedia article on Laplace smoothing (or additive smoothing), it is said that, from a Bayesian point of view,
this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter…

DanielX2010 (173 rep, badges 1/7)
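A short sketch of the correspondence being asked about: with counts $x_1, \dots, x_d$ over $d$ categories in $N$ trials and a symmetric Dirichlet prior with parameter $\alpha$, the posterior mean of the $i$-th category probability is

$$\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d},$$

which reduces to Laplace (add-1) smoothing when $\alpha = 1$.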
11 votes, 3 answers
Terminology for Bayesian Posterior Mean of Probability with Uniform Prior
If $p \sim$ Uniform$(0,1)$, and $X \sim$ Bin$(n, p)$, then the posterior mean of $p$ is given by $\frac{X+1}{n+2}$.
Is there a common name for this estimator? I've found it solves lots of people's problems and I'd like to be able to point people to…

Cliff AB (17,741 rep, badges 1/39/84)
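A one-line derivation of the estimator in the question: a Uniform$(0,1)$ prior is Beta$(1,1)$, so after observing $X$ successes in $n$ trials the posterior is Beta$(X+1,\, n-X+1)$, whose mean is

$$E[p \mid X] = \frac{X+1}{(X+1) + (n-X+1)} = \frac{X+1}{n+2}.$$

This estimator is commonly known as Laplace's rule of succession.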
10 votes, 2 answers
Calculating Emission Probability values for Hidden Markov Model (HMM)
I'm new to HMMs and still learning. I'm currently using an HMM for part-of-speech tagging. To implement the Viterbi algorithm I need the transition probabilities ($a_{i,j}$) and the emission probabilities ($b_i(o)$).
I'm…

Ramesh-X (341 rep, badges 4/14)
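As a rough illustration of how emission probabilities $b_i(o)$ can be estimated with add-1 smoothing from word/tag counts, here is a minimal Python sketch; the toy corpus, tag names, and the `emission_prob` helper are hypothetical, not taken from the question:

```python
from collections import Counter, defaultdict

# Toy tagged corpus: (word, tag) pairs.
tagged_corpus = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
                 ("the", "DET"), ("cat", "NOUN")]

emission_counts = defaultdict(Counter)   # tag -> word counts
tag_counts = Counter()                   # tag -> total occurrences
for word, tag in tagged_corpus:
    emission_counts[tag][word] += 1
    tag_counts[tag] += 1

vocab = {word for word, _ in tagged_corpus}

def emission_prob(tag, word, alpha=1.0):
    """Add-alpha smoothed P(word | tag); word/tag pairs never seen in
    training still receive a small non-zero probability."""
    return (emission_counts[tag][word] + alpha) / (tag_counts[tag] + alpha * len(vocab))

print(emission_prob("NOUN", "dog"))    # seen with NOUN
print(emission_prob("NOUN", "barks"))  # never seen with NOUN, still > 0
```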
7 votes, 2 answers
What's a good approach to estimate the probability of word frequencies?
I have a document corpus and I want to estimate the probability of occurrence of a certain word $w$. Simply calculating the frequency and using that number as an estimate is not a good choice. Is there any work on this topic describing a better…

derekhh (185 rep, badges 1/5)
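One standard improvement over the raw relative frequency, in the spirit of this tag, is an add-$\alpha$ estimate: with $c(w)$ the count of word $w$ in the corpus, $N$ the total number of tokens, and $V$ the number of word types,

$$\hat{P}(w) = \frac{c(w) + \alpha}{N + \alpha V},$$

which shrinks small counts toward the uniform distribution and leaves non-zero mass for unseen words; this is only one option among the better estimators the question is asking about.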
6 votes, 2 answers
What if a numerator term is zero in Naive Bayes?
I'm trying to predict the probability that a user will visit a particular website based on several factors (day of the week, time since last visit, etc). My question is what to do if one of the numerator terms goes to zero?
For instance, suppose I…

Jeff (3,525 rep, badges 5/27/38)
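To make the failure mode concrete with made-up numbers: if the user has never visited on, say, a Tuesday, the estimate $\hat{P}(\text{Tuesday} \mid \text{visit}) = 0/n$ drives the entire naive Bayes product to zero; an add-1 smoothed estimate such as

$$\hat{P}(\text{Tuesday} \mid \text{visit}) = \frac{0 + 1}{n + 7}$$

(adding one pseudo-count for each of the 7 possible days of the week) keeps the product non-zero.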
5 votes, 0 answers
Laplace smoothing parameter choice for Markov chain transitions
Let $Y_{t}$ be the state of the process at time $t$ and ${\bf P}$ be the transition matrix; then:
$$ {\bf P}_{ij} = P(Y_{t} = j | Y_{t-1} = i) $$
Since this is a Markov chain, this probability depends only on $Y_{t-1}$, so it can be estimated by the…

HCAI (737 rep, badges 2/7/23)
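Continuing the estimate the question sets up: with $n_{ij}$ the observed number of $i \to j$ transitions and $K$ states, the add-$\alpha$ estimate (Laplace smoothing for $\alpha = 1$) is

$$\hat{{\bf P}}_{ij} = \frac{n_{ij} + \alpha}{\sum_{k=1}^{K} n_{ik} + \alpha K},$$

which is the posterior mean under an independent symmetric Dirichlet$(\alpha)$ prior on each row of ${\bf P}$; the open question here is how to choose $\alpha$.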
5 votes, 1 answer
Smoothing/shrinking the predicted probability of a classifier to reduce live logloss
Let us assume we work on a 2-class classification problem. In my setting the sample is balanced; to be precise, it is a financial markets setting where up and down each occur with roughly 50% probability.
The classifier produces results $$p_i = P[class =…

Richi W (3,216 rep, badges 3/30/53)
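One simple form of the shrinkage being described, offered as a hedged sketch rather than the asker's intended method, is to pull each prediction toward the balanced base rate of $0.5$:

$$\tilde{p}_i = \lambda\, p_i + (1 - \lambda) \cdot 0.5, \qquad 0 \le \lambda \le 1,$$

where smaller $\lambda$ shrinks harder; this limits how much log loss a single overconfident prediction can incur.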
5 votes, 1 answer
How to handle unseen features in a Naive Bayes classifier?
I am writing a naive Bayes classifier for a text classification problem. I have a bunch of words and an associated label:
[short,snippet,text], label1
[slightly,different,snippet,text], label2
...
I am able to train the naive Bayes model fine. However,…

applecider (1,175 rep, badges 2/11/13)
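A minimal sketch of the two usual options at prediction time (hypothetical counts and helper names, not the asker's code): smooth in-vocabulary words with add-1 so a word unseen under a given label still contributes, and simply skip words that are entirely out of vocabulary:

```python
import math
from collections import Counter

# Hypothetical training counts: label -> word counts, plus documents per label.
word_counts = {
    "label1": Counter({"short": 3, "snippet": 2, "text": 4}),
    "label2": Counter({"slightly": 1, "different": 2, "snippet": 1, "text": 3}),
}
label_counts = Counter({"label1": 5, "label2": 4})
vocab = set().union(*word_counts.values())

def log_posterior(words, label, alpha=1.0):
    """Log prior plus add-1 smoothed log likelihoods; words that never
    appeared under *any* label are skipped entirely."""
    total = sum(word_counts[label].values())
    lp = math.log(label_counts[label] / sum(label_counts.values()))
    for w in words:
        if w not in vocab:            # entirely unseen word: ignore it
            continue
        lp += math.log((word_counts[label][w] + alpha)
                       / (total + alpha * len(vocab)))  # smoothed P(w | label)
    return lp

print(log_posterior(["short", "snippet", "brand_new_word"], "label1"))
```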
4 votes, 2 answers
Understanding Add-1/Laplace smoothing with bigrams
I am working through an example of Add-1 smoothing in the context of NLP.
Say that there is the following corpus (start and end tokens included):
+ I am sam -
+ sam I am -
+ I do not like green eggs and ham -
I want to check the probability that the…

basil (163 rep, badges 2/5)
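Filling in the arithmetic this question is building toward, under the assumption that '+' and '-' are the start and end tokens and both are counted as vocabulary types (which gives $V = 12$ here): the add-1 bigram estimate is

$$\hat{P}(\text{am} \mid \text{I}) = \frac{\text{Count}(\text{I am}) + 1}{\text{Count}(\text{I}) + V} = \frac{2 + 1}{3 + 12} = 0.2,$$

versus the unsmoothed $2/3$; the exact value depends on how the boundary tokens are treated.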
3 votes, 1 answer
Smoothing a 2-by-2 contingency table
I am trying to implement a system for automatic document categorization, where each document of a corpus belongs to some class. I define the following contingency table for every class C and every word W:
$\begin{array}{c|cc}
& W &…

DevelBD (31 rep, badges 2)
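One common approach for a table like this, noted here as a general option rather than the asker's eventual choice, is to add a small pseudo-count $\alpha$ to each of the four cells before computing any derived statistics, e.g. $\alpha = 0.5$ (the Haldane-Anscombe correction) or $\alpha = 1$ (Laplace); this keeps conditional probabilities and odds ratios finite when a cell count is zero.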
3 votes, 2 answers
Laplace smoothing understanding implementation
Considering the data set given below
Here, if we have to classify a new data point:
D15 (O=Overcast, T=Cool, H=High, W=Strong)
then for P(No|Overcast, Cool, High, Strong) we have
(5/14) * 0 * (1/5) * (4/5) * (3/5)
This results in 0.
So I read that…

Cybercop (151 rep, badges 1/4)
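A hedged completion of the arithmetic, assuming the familiar 14-row play-tennis data this appears to be, in which Outlook takes 3 possible values: with add-1 smoothing the zero factor $P(\text{Overcast} \mid \text{No}) = 0/5$ becomes

$$\frac{0 + 1}{5 + 3} = \frac{1}{8},$$

and the other conditionals are adjusted the same way (e.g. $P(\text{Cool} \mid \text{No}) = (1+1)/(5+3)$, since Temperature also takes 3 values), so the product is no longer forced to zero.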
3 votes, 1 answer
Is the Laplace/Lidstone smoothing parameter (talking about Multinomial/Bernoulli Naive Bayes) related to the particular structure of the dataset?
I'm working with the Multinomial and Bernoulli Naive Bayes implementations in scikit-learn (Python) for text classification. I'm using the 20_newsgroups dataset.
From the scikit documentation we have:
class sklearn.naive_bayes.MultinomialNB(alpha=1.0,…

Trevor (31 rep, badges 1/4)
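Whether the best alpha depends on the particular data set is ultimately an empirical question; a common way to probe it, sketched below rather than quoted from the scikit-learn docs, is to cross-validate alpha on the corpus in question:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Two categories only, to keep the example small (downloads on first use).
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"],
                          remove=("headers", "footers", "quotes"))

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("nb", MultinomialNB()),
])

# Cross-validate the Lidstone/Laplace smoothing parameter alpha.
grid = GridSearchCV(pipeline,
                    param_grid={"nb__alpha": [0.001, 0.01, 0.1, 0.5, 1.0]},
                    cv=5)
grid.fit(data.data, data.target)
print(grid.best_params_, grid.best_score_)
```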
2 votes, 1 answer
In what conditions does naive Bayes classifier perform poorly?
When does naive Bayes perform poorly? Can you think of any specific examples of problems for which it wouldn't work? We can set aside the issue of unseen data points, since that can be corrected by Laplace smoothing.

Remertion (21 rep, badges 1/2)
2 votes, 1 answer
What is the proper way to estimate the probability (proportion of time) a rare event occurs?
Often, I need to estimate the probability (proportion of time) that a rare event occurs. The standard MLE often gives me extreme estimates, since the denominator is usually 1 and the numerator is either 0 or 1, giving me either 0% or 100%.
For…

learner (21 rep, badges 1)
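For context, the additive-smoothing answer here coincides with the estimator from the "posterior mean with a uniform prior" question above: with $X$ occurrences in $n$ trials,

$$\hat{p} = \frac{X + 1}{n + 2}$$

(Laplace's rule of succession), or more generally $(X + \alpha)/(n + 2\alpha)$ under a Beta$(\alpha, \alpha)$ prior; a smaller $\alpha$, such as the Jeffreys value $\alpha = 1/2$, shrinks less strongly toward $1/2$.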