I am unsure of the definition and purpose of a PDF.

I have heard it described as essentially a smoothed histogram. I've also heard that its main advantage over a histogram is that bin sizes no longer affect what the distribution looks like. Are these things essentially true?

In terms of purpose, does it serve any purpose beyond what a histogram is used for?

Finally, I'm confused about the PDF's relation to the Gaussian/Normal distribution.

The PDF looks somewhat like a bell curve, which is confusing. Does a PDF always look like a bell curve?

edward84

1 Answer

You are confusing several different concepts.

  • A probability density function (pdf) is a mathematical function that tells us the "probability per foot", that is, the probability per unit length, of a continuous random variable. A probability density function $f$ satisfies $f(x) \ge 0$ for all $x$ and $\int_{-\infty}^{\infty} f(x) \,dx = 1$. We can also use it to calculate probabilities over intervals: $\Pr(a \le X \le b) = \int_a^b f(x)\, dx$.
  • Probability density functions can take many different shapes; the "bell curve", i.e. the Gaussian, also known as the normal distribution, is just one of the possibilities. As a counterexample, a uniform random variable has a probability density function shaped like a rectangle; there is nothing "bell-curved" about it.
  • A histogram is an estimator: it approximates the probability density function based on some data.
  • What you seem to be describing as a density that is a "smoother histogram" is another estimator: the kernel density estimator. While a histogram learns a binned distribution, a kernel density estimator uses a smooth function to approximate the probability density function, estimating it from the data. A kernel density estimator is defined in terms of kernels, and one of the popular kernels is the Gaussian function.
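
The two defining properties above (total area 1, interval probabilities as areas) can be checked numerically. The following is only a minimal sketch using the Python standard library. The helper names `uniform_pdf` and `integrate` are made up for this illustration, and the midpoint rule is just one simple way to approximate the integrals.

```python
from statistics import NormalDist


def uniform_pdf(x, low=0.0, high=1.0):
    """Density of a uniform random variable on [low, high]: a rectangle."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


std_normal = NormalDist(mu=0.0, sigma=1.0)

# The total area under each density is (approximately) 1.
# The normal density is negligible outside [-10, 10], so that range suffices.
normal_area = integrate(std_normal.pdf, -10.0, 10.0)
uniform_area = integrate(uniform_pdf, -1.0, 2.0)

# Pr(a <= X <= b) is the area under the density between a and b.
prob_normal = integrate(std_normal.pdf, -1.0, 1.0)
prob_uniform = integrate(uniform_pdf, 0.2, 0.5)

print(normal_area, uniform_area)
print(prob_normal, prob_uniform)
```

Both densities have total area 1 even though one is bell-shaped and the other is a rectangle. For the normal case, the interval probability can be cross-checked against `std_normal.cdf(1.0) - std_normal.cdf(-1.0)`.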
Tim
  • That does clarify things a bit. But when you perform a pdf on a dataset, what is it doing compared to using KDE? You say KDE uses a smooth function to approximate the PDF based on the data. What is happening with just a PDF? – edward84 Aug 04 '21 at 17:36
  • @edward84 A KDE is estimated from the data. A PDF is just a mathematical function that doesn’t have to have anything in common with any data. If you take a uniform PDF, there’s no way to force it to look like a bell curve. A KDE will take the shape of the distribution of your data almost no matter what that shape is. – Tim Aug 04 '21 at 17:52
  • That makes sense. So, what are some situations where you would be interested in using a PDF? Would it be to test whether a sample is in line with some theoretical distribution (like the normal)? – edward84 Aug 04 '21 at 17:54
  • @edward84 there’s no short answer because there are many uses. You can check any statistics handbook for many examples of using PDFs. – Tim Aug 04 '21 at 18:01
  • A histogram is, first and foremost, a *descriptor,* not an estimator. What it importantly has in common with a PDF is that both represent probability with *area* rather than *height* of the graph. – whuber Aug 04 '21 at 18:36
  • @edward84 "what are some situations where you would be interested in using a PDF?" A PDF is a theoretical idealization. Estimating it from empirical data is just one possibility. In other situations it can be computed from symmetry considerations (Fermi-Dirac distribution), or as the solution of an integro-differential equation (Boltzmann equation for the particle phase-space PDF), or from the solution of a differential equation (the absolute square of the solution of the Schrödinger equation yields the space PDF, and the absolute square of its Fourier transform the momentum PDF). – cdalitz Aug 04 '21 at 20:09
  • So is the PDF of a normal distribution then simply the particular formula for a normal distribution? – edward84 Aug 05 '21 at 16:35
  • @edward84 The normal distribution is defined in terms of this pdf. – Tim Aug 05 '21 at 19:24
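
The distinction drawn in the comments, between a PDF as a fixed mathematical function and a histogram or KDE as something computed from data, can be sketched as follows. This is a rough illustration in plain Python, not a reference implementation. The function names, the two-cluster sample, the bin count, and the bandwidth of 0.5 are all assumptions chosen for the example.

```python
import math
import random


def histogram_density(data, low, high, bins):
    """Histogram scaled so that the bar areas sum to 1 (a density estimate)."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        if low <= x < high:
            counts[int((x - low) / width)] += 1
    return [c / (len(data) * width) for c in counts], width


def gaussian_kde(data, bandwidth):
    """Return a function x -> kernel density estimate with a Gaussian kernel."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def estimate(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return estimate


random.seed(0)
# Hypothetical sample with two clusters, so the data are not bell-shaped.
data = [random.gauss(-3.0, 1.0) for _ in range(500)]
data += [random.gauss(3.0, 1.0) for _ in range(500)]

heights, width = histogram_density(data, -8.0, 8.0, bins=32)
kde = gaussian_kde(data, bandwidth=0.5)

# Both estimates represent probability as area, so each area is close to 1.
hist_area = sum(h * width for h in heights)
step = 0.05
kde_area = sum(kde(-10.0 + (i + 0.5) * step) * step for i in range(400))

print(hist_area, kde_area)
# The KDE follows the data: high near the clusters, low in the gap between.
print(kde(-3.0), kde(0.0), kde(3.0))
```

Even though each kernel is a Gaussian bump, the resulting estimate has two peaks because the data do. The histogram still depends on the chosen bins, and the KDE depends on the chosen bandwidth instead.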