When using a Q-Q plot, why do we choose the empirical distribution $F_n(x) = \frac{\#\{y \in S \mid y \le x\}}{n}$, where $S$ is the sample, for comparison with the normal distribution?
Let $S$ be our sample of size $n$, and form the empirical distribution $F_n$ as defined above. We then use a Q-Q plot to compare the quantiles of $F_n$ with those of $N(0,1)$ and check whether the relationship between them is approximately linear.
- Why do we choose $F_n$ as the empirical distribution for our sample?
- Would we get different results if we chose something other than $F_n$ as the empirical distribution?
- For the fractile of $p \in (0,1)$ we choose the midpoint $x$ of the interval corresponding to $p$. Why do we choose the midpoint?
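
To make the setup concrete, here is a minimal Python sketch of the construction as I understand it. It rests on one reading of the "midpoint" rule, which is my assumption: $F_n$ jumps from $(i-1)/n$ to $i/n$ at the $i$-th smallest observation, so that observation is paired with the probability $p_i = (i - 0.5)/n$, the midpoint of that interval. The function names `ecdf` and `qq_points` and the sample values are made up for illustration.

```python
from statistics import NormalDist


def ecdf(sample, x):
    """Empirical distribution F_n(x): fraction of sample values <= x."""
    return sum(1 for y in sample if y <= x) / len(sample)


def qq_points(sample):
    """Pair each sorted observation with a standard normal quantile.

    Assumption: the i-th smallest value is matched with the probability
    p_i = (i - 0.5) / n, the midpoint of the jump of F_n at that value.
    """
    n = len(sample)
    xs = sorted(sample)
    std_normal = NormalDist(0, 1)
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(xs, start=1)
    ]


# Made-up sample, for illustration only.
sample = [2.1, -0.4, 0.9, 1.5, 0.2]
points = qq_points(sample)

for z, x in points:
    print(f"normal quantile {z:+.3f}  sample value {x:+.3f}")
```

With this choice every $p_i$ lies strictly inside $(0,1)$, so the normal quantile is always finite. Using $p_i = i/n$ instead would give $p_n = 1$, whose normal quantile is infinite, which is part of what I am asking about.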