
I am using scipy.stats.gaussian_kde to estimate a pdf for some data. The problem is that the resulting pdf takes values larger than 1. As far as I understand, this should not happen. Am I mistaken? If so, why?

whuber
Björn Pollex
  • (+1 to the possible duplicate) Just to convey this quickly: Probability is defined as an area under a curve. To get the probability of a single point, the value of the PDF there is multiplied by 0 (i.e. the width of a line), so if anything the probability itself is 0. The linked thread gives excellent further elaboration on this. – usεr11852 May 29 '16 at 20:20

1 Answer


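You are mistaken. The CDF cannot exceed 1, but the PDF may. Think, for example, of the PDF of a Gaussian random variable with mean zero and standard deviation $\sigma$: $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$ At $x = 0$ this gives $f(0) = \frac{1}{\sigma\sqrt{2\pi}}$, which exceeds 1 whenever $\sigma < 1/\sqrt{2\pi} \approx 0.399$ and becomes arbitrarily large as $\sigma \to 0$. The area under the curve is still 1, because the peak gets narrower as it gets taller.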
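To see this with scipy.stats.gaussian_kde itself, here is a minimal sketch. The sample is synthetic and the value sigma = 0.1, the seed, and the grid are assumptions chosen only for illustration; they are not the asker's data. It compares the peak of the estimated density with the area under it.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Assumed standard deviation, small enough that the true peak exceeds 1.
sigma = 0.1

# Peak of the exact Gaussian density: 1 / (sigma * sqrt(2 * pi)).
peak = norm.pdf(0.0, loc=0.0, scale=sigma)

# Synthetic, tightly clustered sample (fixed seed for repeatability).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=sigma, size=1000)

# Kernel density estimate, evaluated on a grid covering the data.
kde = gaussian_kde(sample)
grid = np.linspace(-1.0, 1.0, 2001)
density = kde(grid)
max_density = density.max()

# Area under the estimated density over [-1, 1], i.e. +/- 10 sigma.
total_area = kde.integrate_box_1d(-1.0, 1.0)

print("exact peak:", peak)
print("largest KDE value:", max_density)
print("area under KDE:", total_area)
```

The density values are well above 1 near zero, while the integral stays close to 1. Only the integral is a probability.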

shabbychef
  • Another possible source of confusion is that the pdf of a _discrete_ random variable (also called pmf - probability mass function) indeed cannot exceed 1. – Aniko Dec 29 '10 at 20:40
  • @Aniko: This is indeed a source of confusion. I think I understand now. – Björn Pollex Dec 29 '10 at 20:48
  • This question is a duplicate of http://stats.stackexchange.com/q/4220/919 . – whuber Dec 30 '10 at 15:28