Questions tagged [empirical-cumulative-distr-fn]

Empirical cumulative distribution function: a step function that increases by $1/n$ at each distinct $X$-value observed in the sample (by $k/n$ where $k$ observations are tied).

Consider a numeric, or at least ordinal, random variable $X$ and a random sample $x_{1}, x_{2}, \dots, x_{n}$ of size $n$ from the distribution of $X$. When there are no ties, the ECDF $F_{n}(x)$ is a step function that increases by $\frac{1}{n}$ at each value observed in the data. When $k$ observations are tied at one value of $x$, the increment there is $\frac{k}{n}$. Formally, $F_{n}(x) = \frac{1}{n}\sum_{i=1}^{n}I(x_{i} \leq x)$, where $I(\cdot)$ is the indicator function. (For further explanation, see Integrating an empirical CDF. For a modified estimator of the CDF, see PIT on a sample with m bins, and KS test used to estimate a good value for m.)

Because constructing the ECDF requires no binning, it is uniquely determined by the data and is often a good replacement for a histogram.
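
The formal definition above can be illustrated with a minimal sketch. It assumes a one-dimensional, non-empty sample; the names `ecdf` and `F_n` are chosen here for illustration and do not refer to any particular library. Counting the sorted values that are at most $x$ and dividing by $n$ yields the $\frac{k}{n}$ jump at tied values automatically.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a callable F_n(x)."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x):
        # bisect_right gives the number of sorted values <= x,
        # i.e. the sum of the indicators I(x_i <= x).
        return bisect_right(xs, x) / n

    return F_n


# Illustrative data: the value 2 occurs twice, so F_n jumps by 2/5 there.
F = ecdf([3, 1, 2, 2, 5])
print(F(0))    # below the smallest observation
print(F(1))    # one of five values is <= 1
print(F(2))    # three of five values are <= 2 (tie at 2)
print(F(5))    # all values are <= 5
```

No bin width or bin origin appears anywhere in the computation, which is why the result depends only on the data.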

132 questions
37
votes
4 answers

Intuitive explanation of Kolmogorov Smirnov Test

What is the cleanest, easiest way to explain the concept of the Kolmogorov-Smirnov test to someone? What does it intuitively mean? It's a concept that I have difficulty articulating, especially when explaining it to someone. Can someone please explain it…
28
votes
7 answers

Distribution hypothesis testing - what is the point of doing it if you can't "accept" your null hypothesis?

Various hypothesis tests, such as the $\chi^{2}$ GOF test, Kolmogorov-Smirnov, Anderson-Darling, etc., follow this basic format: $H_0$: The data follow the given distribution. $H_1$: The data do not follow the given distribution. Typically, one…
24
votes
5 answers

Empirical CDF vs CDF

I'm learning about the empirical cumulative distribution function, but I still don't understand why it is called 'empirical'. Is there any difference between the empirical CDF and the CDF?
13
votes
2 answers

Empirical distribution alternative

BOUNTY: The full bounty will be awarded to someone who provides a reference to any published paper which uses or mentions the estimator $\tilde{F}$ below. Motivation: This section is probably not important to you and I suspect it won't help you get…
13
votes
1 answer

Why does ecdf uses a step function and not a linear interpolation?

Empirical CDFs are usually estimated by a step function. Is there a reason why this is done in such a way and not by using linear interpolation? Does the step function have any interesting theoretical properties which make us prefer…
Tal Galili
13
votes
1 answer

Integrating an empirical CDF

I have an empirical distribution $G(x)$. I calculate it as follows: x <- seq(0, 1000, 0.1); g <- ecdf(var1); G <- g(x). I denote $h(x) = dG/dx$, i.e., $h$ is the pdf while $G$ is the cdf. I now want to solve an equation for the upper limit…
user46768
11
votes
2 answers

Algorithms for computing multivariate Empirical distribution function (ECDF)?

One dimensional ECDF is fairly easy to compute. When it comes to two dimensions and up, however, online resources become sparse and hard to reach. Can anyone suggest, define and/or present efficient algorithms (not ready made implementation) for…
11
votes
1 answer

Why can't one generalize the Kolmogorov-Smirnov test to 2 or more dimensions?

The question says it all. I've read both that one can't generalize KS to a dimension equal to or larger than two, and that famous implementations like that in Numerical Recipes are simply wrong. Could you please explain why this is so?
9
votes
2 answers

Comparing two ECDFs using Kolmogorov-Smirnov test (alternative hypothesis)

I am taking measurements of a computer system's performance over time and I'd like to understand whether the performance is degrading or improving as time passes. After doing some research, I picked the KS test for this comparison, and I'd like to confirm…
8
votes
3 answers

Confidence Interval of CDF

I am trying to determine if there is a statistically meaningful distinction between the cumulative probability density curves shown in the figure below. It's simple enough to do a $t$-test on the means of these distributions. But I am also…
gregmacfarlane
8
votes
1 answer

What's the proper y-axis label for an empirical cumulative distribution plot in a publication?

Examples online typically write "F(x)", but that seems confusing to readers.
7
votes
2 answers

What inferential method produces the empirical CDF?

The empirical cdf is an estimate of the cdf. What kind of estimation method (such as method of moments, MLE, ...) constructs the empirical cdf? Is the empirical cdf a nonparametric estimate? Do nonparametric estimates have construction methods…
Tim
7
votes
1 answer

What are the simplest examples of nonlinear statistical functionals?

I am reading Wasserman's book "All of Statistics" in which he defines a statistical functional as any function $T(F)$ of the cumulative distribution function $F(x)$ that outputs a real number. Then he goes on to define a 'linear statistical…
7
votes
4 answers

What is the proper way to estimate the CDF for a distribution from samples taken from that distribution?

Given $n$ samples from a (continuous) distribution X, the obvious thing to do is sort them, and distribute them equally across $[0,1]$ by taking $(x_{(k)}, (k-1/2)/n)$ as estimates of particular points on the CDF, and doing some sort of…
7
votes
0 answers

Empirical distribution function of overlapping time series data

If we model asset return volatility for periods of more than one (say more than one day) there is the square-root rule which holds true under some assumptions. On the other hand practitioners sometimes use rolling, overlapping data. Treating them as…