The distribution function is a 'function to describe the distribution'.
But several functions can be used to describe a distribution, so the 'distribution function' may refer to different things. See for instance: Are the terms probability density function and probability distribution (or just "distribution") interchangeable?
Most commonly it refers to the cumulative distribution function (CDF), because the CDF uniquely defines a distribution and may be considered 'the' distribution function. (I believe that the characteristic function and the cumulant generating function are also sometimes referred to with the term 'the distribution function'.)
According to this list of earliest uses of statistical terms, the term 'distribution function' first occurred in 1919 in the German-language literature (R. von Mises' "Grundlagen der Wahrscheinlichkeitsrechnung") and in 1935 in the English-language literature (J. L. Doob's "The Limiting Distributions of Certain Statistics"). There are also some works in English from 1933 by Aurel Wintner, for instance "On the Stable Distribution Laws".
In those works by von Mises, Doob and Wintner, the 'distribution function' is defined as what we now more commonly know as the cumulative distribution function. But around that time there are other uses of 'distribution function'. For instance, in Nordic literature (more precisely, the Scandinavian Actuarial Journal) the term 'Verteilungsfunktion' occurs in 1919 meaning the probability density or frequency distribution; see Hongström and Hagström. We also see Wishart and Bartlett use the term 'distribution function' in 1933 to refer to the probability density function, in "The generalised product moment distribution in a normal system", and Wilks does the same in 1932.
Empirical distribution
So the 'empirical distribution' refers to an empirical estimate of the cumulative distribution function: for a sample $x_1, \ldots, x_n$ it is $\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}[x_i \leq x]$, the fraction of observations that are less than or equal to $x$.
Below you see an example with a sample from a standard normal distribution.

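Such an empirical CDF can be computed along the following lines (a minimal Python sketch; the sample size, seed, and plotting details are my own choices, not those of the original figure):

```python
# Empirical CDF of a sample from a standard normal distribution,
# compared with the true CDF.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.sort(rng.standard_normal(100))        # sample of size 100 (assumed)
ecdf = np.arange(1, len(x) + 1) / len(x)     # F_hat(x_(i)) = i / n

plt.step(x, ecdf, where="post", label="empirical CDF")
grid = np.linspace(-3, 3, 300)
plt.plot(grid, norm.cdf(grid), label="standard normal CDF")
plt.xlabel("x"); plt.ylabel("P(X <= x)"); plt.legend()
plt.show()
```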
Empirical frequency distribution
If the observations are discrete, then instead of the probabilities $P(X \leq x)$ we can also describe the probabilities $P(X = x)$. This is also called the probability mass function (PMF).
Below is an example with the data from 'illustration I' in Pearson's article on the chi-squared statistic to test the goodness of fit for frequency curves.
The following data are due to Professor W.F.R. Weldon, F.R.S., and give the observed frequency of dice with 5 or 6 points when a cast of twelve dice was made 26 306 times:

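Since Weldon's observed counts are not reproduced here, the sketch below instead simulates 26 306 casts from a Binomial(12, 1/3) model (each cast counts how many of the twelve dice show 5 or 6 points) and plots the empirical frequencies $P(X = x)$ next to the model PMF. It illustrates the idea of an empirical frequency distribution, not Pearson's actual data:

```python
# Empirical frequency distribution (PMF estimate) for a *simulated*
# version of the dice experiment, compared with the Binomial(12, 1/3) PMF.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

rng = np.random.default_rng(0)
n_casts = 26306
counts = rng.binomial(n=12, p=1/3, size=n_casts)   # one count per cast of twelve dice

values, freq = np.unique(counts, return_counts=True)
plt.bar(values, freq / n_casts, width=0.8, alpha=0.5, label="empirical P(X = x)")
k = np.arange(0, 13)
plt.plot(k, binom.pmf(k, 12, 1/3), "o", label="Binomial(12, 1/3) PMF")
plt.xlabel("number of dice showing 5 or 6"); plt.ylabel("P(X = x)"); plt.legend()
plt.show()
```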
Empirical density distribution
Why doesn't the empirical distribution look like a "bell curve"?
The bell curve is a probability density function (PDF): it describes the density of the probability mass. The density function does not express probabilities like the $P(X \leq x)$ and $P(X = x)$ above, so we cannot estimate the density function empirically by observing relative frequencies in a sample.
However, what is sometimes done is to bin the data and create a histogram, analogous to the PMF case above. Another approach is to estimate the density by some smoothing of the observed data.
Below is an example of estimating the normal distribution PDF with a kernel smoother. The sampled points are illustrated in the image as points at the top.

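A kernel density estimate of this kind can be produced along the following lines (a sketch; the Gaussian kernel via scipy's gaussian_kde, the sample size, and the default bandwidth are assumptions, not the exact settings behind the figure):

```python
# Kernel density estimate of the standard normal PDF, with the sample
# shown as a rug of points at the top of the plot.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
x = rng.standard_normal(100)               # sample of size 100 (assumed)

grid = np.linspace(-4, 4, 400)
kde = gaussian_kde(x)                      # Gaussian kernel smoother, default bandwidth
plt.plot(grid, kde(grid), label="kernel density estimate")
plt.plot(grid, norm.pdf(grid), "--", label="standard normal PDF")
plt.plot(x, np.full_like(x, 1.05 * kde(grid).max()), "|", label="sample points")
plt.xlabel("x"); plt.ylabel("density"); plt.legend()
plt.show()
```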