I am trying to understand how to arrive at $r = \dfrac{Cov(X,Y)}{\sigma_X\sigma_Y}$ through a logical narrative. This is, in fact, a continuation of my earlier unanswered question.
I see that by standardizing $X$ and $Y$, the resulting regression line has $r$ as its slope. But I still need to reason out why I should standardize in the first place. This is my current narrative.
My narrative:
- Covariance is given by the equation below, whose form already shows its symmetric nature.
$$ Cov(X,Y) = \sum_x\sum_y(x-\overline{x})(y - \overline{y})p(x,y) = Cov(Y,X) \tag{1} $$
So, by this measure, $X$ covaries with $Y$ exactly as much as $Y$ covaries with $X$.
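Here is a quick numerical check of (1) with a small made-up joint pmf (the values are purely illustrative):

```python
import numpy as np

# Made-up 3x3 joint pmf p(x, y); rows index x, columns index y, entries sum to 1
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([10.0, 20.0, 30.0])
p = np.array([[0.10, 0.05, 0.05],
              [0.05, 0.30, 0.10],
              [0.05, 0.10, 0.20]])

ex = np.sum(x_vals * p.sum(axis=1))   # E[X] from the marginal of X
ey = np.sum(y_vals * p.sum(axis=0))   # E[Y] from the marginal of Y

# The double sum in (1), written in both orders
cov_xy = sum(p[i, j] * (x_vals[i] - ex) * (y_vals[j] - ey)
             for i in range(3) for j in range(3))
cov_yx = sum(p[i, j] * (y_vals[j] - ey) * (x_vals[i] - ex)
             for i in range(3) for j in range(3))

print(cov_xy, cov_yx)  # identical: the summand is a plain product of scalars
```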
- However, simple regression lines are not symmetric.
$$ \hat{Y}|x = \hat{\beta_0} + \hat{\beta_1}x \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_1} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(x_i - \overline{x})^2} \ \ , \ \ \hat{\beta_0} = \overline{y} - \hat{\beta_1}\overline{x} \\ \hat{X}|y = \hat{\beta_2} + \hat{\beta_3}y \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_3} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(y_i - \overline{y})^2} \ \ , \ \ \hat{\beta_2} = \overline{x} - \hat{\beta_3}\overline{y} \tag{2} $$
Thus, in general, $\hat{\beta_1} \neq \hat{\beta_3}$.
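A quick numerical illustration of this asymmetry, on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # made-up linear-plus-noise sample

# Slope of Y-on-X and slope of X-on-Y, per (2)
b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b3 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((y - y.mean()) ** 2)

print(b1, b3)   # clearly different numbers
```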
Given that covariance depends critically on the units of measurement, which makes it unsuitable for comparing different pairs of random variables (or events), we seek a standardized measure like covariance but unitless.
Now, by standardizing $X$ and $Y$, we get new regression lines $\hat{Y_s}|x_s$ and $\hat{X_s}|y_s$ whose intercepts are zero and whose slopes are equal and unitless. That is,
if I fully standardize the sample set,
$$ X_s = \dfrac{X - \overline{X}}{s_X} \ \ , \ \ Y_s = \dfrac{Y - \overline{Y}}{s_Y} $$
we get, on the standardized sample (where $x_s, y_s$ now denote the standardized values),
$$ \hat{Y_s}|x_s = 0 + \hat{\beta_{1s}}x_s \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_{1s}} = \dfrac{\sum_i(y_{is} - \overline{y_s})(x_{is} - \overline{x_s}) }{\sum_i(x_{is} - \overline{x_s})^2} \ \ \ \ \\ \hat{X_s}|y_s = 0 + \hat{\beta_{3s}}y_s \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_{3s}} = \dfrac{\sum_i(y_{is} - \overline{y_s})(x_{is} - \overline{x_s}) }{\sum_i(y_{is} - \overline{y_s})^2} \ \ \ \ \tag{3} $$
which results in
$$ r = \hat{\beta_{1s}} = \hat{\beta_{3s}} \tag{4} $$
that is, the regression lines are now symmetric to each other.
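Here is a numerical check of (3)-(4) on made-up data (just a sanity check, not part of the argument):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # made-up sample

# Standardize with the sample standard deviations s_X, s_Y
xs = (x - x.mean()) / x.std(ddof=1)
ys = (y - y.mean()) / y.std(ddof=1)

# Slopes of the two standardized regressions, per (3)
b1s = np.sum((ys - ys.mean()) * (xs - xs.mean())) / np.sum((xs - xs.mean()) ** 2)
b3s = np.sum((ys - ys.mean()) * (xs - xs.mean())) / np.sum((ys - ys.mean()) ** 2)

print(b1s, b3s, np.corrcoef(x, y)[0, 1])   # all three agree, matching (4)
```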
- In fact, reversing this procedure back to the non-standardized raw $X$ and $Y$, we can relate their regression slopes to the correlation as below:
$$ r = \hat{\beta_1}\dfrac{s_X}{s_Y} = \hat{\beta_3}\dfrac{s_Y}{s_X} \tag{5} $$
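And a check of (5) by rescaling the raw slopes with the sample standard deviations (same made-up data as above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # made-up sample

b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b3 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((y - y.mean()) ** 2)
sx, sy = x.std(ddof=1), y.std(ddof=1)

print(b1 * sx / sy, b3 * sy / sx, np.corrcoef(x, y)[0, 1])   # all equal r, matching (5)
```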
My questions:
1. Is my narrative above correct and minimally complete? What went wrong? What could be added? How could I improve it?
2. I see that Galton discovered regression via the bivariate normal distribution (link). How was it then generalized to arbitrary distributions?
3. Also, would perfect linearity imply that the underlying distribution is bivariate normal?
4. After this narrative, how could I show that the sample $r$ carries over to the population $\rho$ as well?
5. I also hope to see the final $r$ come out as the cosine given by the dot product of the centered vectors. That is,
$$ r = \cos\theta = \dfrac{(x - \overline{x})\bullet(y - \overline{y})}{\lvert x - \overline{x} \rvert \lvert y - \overline{y} \rvert} \tag{6} $$
Then, what would the unstandardized dot product refer to, and how does it relate to the non-standardized equation set (2)? That is, $$ \cos\theta = \dfrac{x\bullet y}{\lvert x \rvert \lvert y \rvert} = \ ? \tag{7} $$
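For reference, (6) does check out numerically on made-up data (what I am after is the reasoning behind it):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # made-up sample

xc, yc = x - x.mean(), y - y.mean()  # centered data vectors
cos_theta = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(cos_theta, np.corrcoef(x, y)[0, 1])   # the two agree, matching (6)
```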