When using a Q-Q plot, why do we choose the empirical distribution $F_n(x) = \frac{\#\{y \in S \mid y \le x\}}{n}$, where $S$ is the sample, for comparison with the normal distribution?

Let $S$ be our sample of size $n$. Then we form the empirical distribution $F_n$ as defined above. We then use a Q-Q plot to compare the quantiles of $F_n$ with those of $N(0,1)$ and check whether they are approximately linearly related.

  1. Why do we choose $F_n$ as the empirical distribution for our sample?
  2. Would we get different results if we did not choose $F_n$ as the empirical distribution?
  3. For the fractile of $p \in (0,1)$ we choose the midpoint $x$ of the interval corresponding to $p$. Why do we choose the midpoint?
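
To make the construction concrete, here is a minimal Python sketch of what is described above. It is only an illustration under stated assumptions: the function names `ecdf`, `fractile`, and `qq_points` are hypothetical, the midpoint rule in `fractile` is one reading of point 3, and the plotting position $p_i = (i - 0.5)/n$ used in `qq_points` is one common convention among several, not necessarily the one any particular textbook or package uses.

```python
import math
from statistics import NormalDist


def ecdf(sample, x):
    """Empirical distribution F_n(x) = #{y in S : y <= x} / n."""
    return sum(1 for y in sample if y <= x) / len(sample)


def fractile(sample, p):
    """p-fractile of the sample, 0 < p < 1, using a midpoint convention.

    Assumption: if n*p is an integer k, every x in [x_(k), x_(k+1))
    satisfies F_n(x) = p, and we return the midpoint of that interval.
    Otherwise the fractile is the order statistic x_(ceil(n*p)).
    """
    xs = sorted(sample)
    n = len(xs)
    k = round(n * p)
    if math.isclose(n * p, k) and 1 <= k <= n - 1:
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(n * p) - 1]


def qq_points(sample):
    """Pair each order statistic x_(i) with a standard normal quantile.

    Assumption: x_(i) is assigned probability p_i = (i - 0.5) / n, the
    midpoint of the jump of F_n from (i - 1)/n to i/n at x_(i).
    """
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # N(0, 1)
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


sample = [3.0, 1.0, 2.0, 4.0]
print(ecdf(sample, 2.0))      # 0.5
print(fractile(sample, 0.5))  # 2.5
for z, x in qq_points(sample):
    print(z, x)               # (normal quantile, sample quantile) pairs
```

If the sample is roughly normal, the printed pairs should lie close to a straight line. Replacing $(i - 0.5)/n$ with another plotting position such as $i/(n+1)$ shifts the points slightly, mostly in the tails, which is the kind of difference point 2 asks about.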
Sycorax
Shuzheng
  • What would you propose using to represent the distribution of the data instead of $F_n$? – whuber Jan 23 '14 at 20:29
  • I don't know. But what justifies the choice? Only that we have no other choice? – Shuzheng Jan 23 '14 at 20:30
  • As it stands, your question sounds like "why do we use our data in order to analyze our data?" Do you see why that is confusing? – whuber Jan 23 '14 at 20:31
  • See my answer on math stack exchange. –  Jan 23 '14 at 20:34
  • The Q's stand for "quantile." We would expect the quantiles of normally distributed data to approximately match those of the reference distribution (albeit with some error due to finite sample size). The indicator function gives us the empirical quantiles for each element of the sample. – Sycorax Jan 23 '14 at 20:47
  • Are the empirical quantiles always determined like this? – Shuzheng Jan 23 '14 at 21:00
  • There are many methods for computing empirical quantiles; I will redirect this thread to one in which that question is answered (by Jeromy Anglim). – whuber Jan 24 '14 at 16:45
