I am reading a textbook on statistics by Freedman, Pisani, and Purves. In one of the chapters about correlation between two variables, it is stated that the vertical distance of a typical point from the standard deviation line (say $s_v$) on a scatter plot is $s_v=\sqrt{2(1-|r|)} \times \sigma_v$, where $r$ is the correlation coefficient and $\sigma_v$ is the vertical standard deviation.

How can this formula be derived mathematically? I thought about working it out from the slope of the SD line or from the formula for $r$, but I can't seem to work it out or find a hint or solution online. It's not a homework problem; it's a sort of technical footnote.

It is also mentioned that there are similar formulas for the horizontal direction. Is $s_h=\sqrt{2(1-r)} \times \sigma_h$ the formula for the horizontal distance? If not, what is it?

  • One such demonstration, using only elementary (high school) geometry, is at https://stats.stackexchange.com/a/71303/919. – whuber Apr 05 '21 at 14:54

1 Answer

The wording is a little confusing. The "vertical distance of a typical point" refers to the root-mean-square (r.m.s.) of the vertical distances of all data points from the SD line. "Typical" here means average in the r.m.s. sense.

Given data $(x_i,y_i)_{i=1}^n$, let $d_i$ be the signed vertical distance from the SD line to $(x_i,y_i)$. Then we have $$d_i = y_i - \left(\frac{\sigma_y}{\sigma_x}(x_i-\bar{x})+\bar{y}\right).$$

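Because the SD line passes through $(\bar{x},\bar{y})$, the $d_i$ have mean zero, so their r.m.s. equals their SD: \begin{align} \sqrt{\frac{1}{n}\sum d_i^2}=\operatorname{SD}[d_i]&=\operatorname{SD}\left[y_i - \left(\frac{\sigma_y}{\sigma_x}(x_i-\bar{x})+\bar{y}\right)\right]\\ &=\operatorname{SD}\left[y_i - \frac{\sigma_y}{\sigma_x}x_i + \text{const}\right]. \end{align}

The SD is unchanged by a constant shift, so the constant can be dropped. Applying $$\operatorname{SD}(X+Y)=\sqrt{\operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y)}$$ with $X=y_i$ and $Y=-\frac{\sigma_y}{\sigma_x}x_i$, and using $\operatorname{Cov}(x,y)=r\sigma_x\sigma_y$: \begin{align} \operatorname{SD}\left[y_i - \frac{\sigma_y}{\sigma_x}x_i\right]&=\sqrt{\sigma_y^2+\frac{\sigma_y^2}{\sigma_x^2}\sigma_x^2 -2\frac{\sigma_y}{\sigma_x}\operatorname{Cov}(x,y)}\\ &=\sqrt{2\sigma_y^2 -2\frac{\sigma_y}{\sigma_x}r\sigma_y\sigma_x}\\ &=\sqrt{2(1-r)}\,\sigma_y. \end{align}

This uses the slope $\sigma_y/\sigma_x$, which is the slope of the SD line when $r\ge 0$. When $r<0$ the SD line has slope $-\sigma_y/\sigma_x$, the cross term changes sign, and the result is $\sqrt{2(1+r)}\,\sigma_y$. The two cases combine into $\sqrt{2(1-|r|)}\,\sigma_y$.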
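As a rough numerical sanity check of the derivation, here is a minimal sketch using NumPy. The function name `rms_distances_from_sd_line` and the simulated data are illustrative assumptions, not anything from the textbook. It assumes population SDs (dividing by $n$) and takes the SD line's slope to have the sign of $r$. It also computes the horizontal r.m.s. distance, which by swapping the roles of $x$ and $y$ in the same argument should come out as $\sqrt{2(1-|r|)}\,\sigma_x$.

```python
import numpy as np


def rms_distances_from_sd_line(x, y):
    """Return (r, sd_x, sd_y, rms_vertical, rms_horizontal) for the SD line.

    Assumptions: population SDs (ddof=0), and the SD line passes through
    the point of averages with slope sign(r) * sd_y / sd_x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sd_x, sd_y = x.std(), y.std()          # ddof=0 by default
    r = np.corrcoef(x, y)[0, 1]
    sign = 1.0 if r >= 0 else -1.0

    # Signed vertical and horizontal distances of each point from the SD line.
    d_vert = (y - y.mean()) - sign * (sd_y / sd_x) * (x - x.mean())
    d_horiz = (x - x.mean()) - sign * (sd_x / sd_y) * (y - y.mean())

    rms_v = np.sqrt(np.mean(d_vert ** 2))
    rms_h = np.sqrt(np.mean(d_horiz ** 2))
    return r, sd_x, sd_y, rms_v, rms_h


# Simulated data (hypothetical): one positively and one negatively correlated set.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
noise = rng.normal(size=1000)

for label, y in [("positive r", 0.6 * x + noise), ("negative r", -0.6 * x + noise)]:
    r, sd_x, sd_y, rms_v, rms_h = rms_distances_from_sd_line(x, y)
    factor = np.sqrt(2 * (1 - abs(r)))
    print(label, "r =", round(r, 3))
    print("  vertical:  ", rms_v, "vs", factor * sd_y)
    print("  horizontal:", rms_h, "vs", factor * sd_x)
```

In each printed pair the two numbers should agree up to floating-point error, since the formula is an algebraic identity for the sample quantities rather than an approximation.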