I am using `scipy.stats.gaussian_kde` to estimate a pdf for some data. The problem is that the resulting pdf takes values larger than 1. As far as I understand, this should not happen. Am I mistaken? If so, why?

whuber

Björn Pollex
-
(+1 to the possible duplicate) Just to convey this quickly: probability is defined as an area under a curve. The probability associated with the value of a PDF at a single point is that value multiplied by 0 (i.e., the width of a line), so the probability of any single point is 0. The linked thread gives excellent further elaboration on this. – usεr11852 May 29 '16 at 20:20
1 Answer
You are mistaken. The CDF cannot exceed 1, but the PDF may. Think, for example, of the PDF of a Gaussian random variable with mean zero and standard deviation $\sigma$: $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$ At $x = 0$ this equals $1/(\sigma\sqrt{2\pi})$, which exceeds 1 whenever $\sigma < 1/\sqrt{2\pi} \approx 0.4$ and grows without bound as $\sigma \to 0$.
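To see this with `scipy.stats.gaussian_kde`, here is a minimal sketch. The data are an assumption made up for illustration (1000 draws from a normal distribution with standard deviation 0.05), not the asker's data. It checks that the estimated density peaks well above 1 while the area under it stays close to 1, and compares against the exact Gaussian peak $1/(\sigma\sqrt{2\pi})$.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Hypothetical data: tightly clustered samples (standard deviation 0.05).
rng = np.random.default_rng(0)
sigma = 0.05
data = rng.normal(loc=0.0, scale=sigma, size=1000)

kde = gaussian_kde(data)

# Evaluate the estimated density on a grid that covers essentially all the mass.
grid = np.linspace(-1.0, 1.0, 4001)
density = kde(grid)

peak = density.max()                    # height of the estimated pdf
area = kde.integrate_box_1d(-1.0, 1.0)  # probability mass on [-1, 1]

# Exact Gaussian peak for comparison: f(0) = 1 / (sigma * sqrt(2 * pi)).
exact_peak = norm.pdf(0.0, loc=0.0, scale=sigma)

print(f"KDE peak height:     {peak:.3f}")
print(f"KDE area on [-1, 1]: {area:.6f}")
print(f"Exact Gaussian peak: {exact_peak:.3f}")
```

The peak height is a density (probability per unit of $x$), so it scales with the units of the data; only the integral over an interval is a probability and is bounded by 1.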

shabbychef
-
Another possible source of confusion is that the pdf of a _discrete_ random variable (also called pmf - probability mass function) indeed cannot exceed 1. – Aniko Dec 29 '10 at 20:40
-
@Aniko: This is indeed a source of confusion. I think I understand now. – Björn Pollex Dec 29 '10 at 20:48
-
This question is a duplicate of http://stats.stackexchange.com/q/4220/919 . – whuber Dec 30 '10 at 15:28