Is my Correlation reasoning correct?

Question

I am trying to understand how to arrive at $r = \dfrac{Cov(X,Y)}{\sigma_X\sigma_Y}$ through a logical narrative. This is in fact a kind of continuation of an earlier, still unanswered question of mine.

I see that after standardizing X and Y, the resulting regression line has $r$ as its slope. But I need a reason for standardizing in the first place. This is my current narrative.

My narrative:

Covariance is given by the equation below, whose form makes its symmetry explicit.

$$ Cov(X,Y) = \sum_x\sum_y(x-\overline{x})(y - \overline{y})p(x,y) = Cov(Y,X) \tag{1} $$

So, by this measure, X covaries with Y exactly as much as Y covaries with X.
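A minimal numerical sketch of equation (1), assuming a small made-up joint pmf (the dictionary `pmf` and the helper `covariance` are hypothetical names, not part of the question). Here $\overline{x}$ and $\overline{y}$ are taken to be the expectations under $p(x,y)$:

```python
# Hypothetical joint pmf p(x, y) on a 2x2 grid; the probabilities sum to 1.
pmf = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}


def covariance(pmf, swap=False):
    """Covariance from a joint pmf, as in equation (1).

    With swap=True the roles of X and Y are exchanged, giving Cov(Y, X).
    """
    pairs = {((y, x) if swap else (x, y)): p for (x, y), p in pmf.items()}
    mean_a = sum(a * p for (a, _), p in pairs.items())
    mean_b = sum(b * p for (_, b), p in pairs.items())
    return sum((a - mean_a) * (b - mean_b) * p for (a, b), p in pairs.items())


cov_xy = covariance(pmf)             # Cov(X, Y)
cov_yx = covariance(pmf, swap=True)  # Cov(Y, X)
print(cov_xy, cov_yx)
```

Swapping the two variables only reorders the factors inside each product, which is why the two values agree.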

However, simple regression lines are not symmetric.

$$ \hat{Y}|x = \hat{\beta_0} + \hat{\beta_1}x \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_1} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(x_i - \overline{x})^2} \ \ , \ \ \hat{\beta_0} = \overline{y} - \hat{\beta_1}\overline{x} \\ \hat{X}|y = \hat{\beta_2} + \hat{\beta_3}y \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_3} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(y_i - \overline{y})^2} \ \ , \ \ \hat{\beta_2} = \overline{x} - \hat{\beta_3}\overline{y} \tag{2} $$

Thus, in general, the two slopes differ: $\hat{\beta_1} \neq \hat{\beta_3}$.
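The asymmetry can be checked numerically. The following is only a sketch on a made-up sample (the lists `x`, `y` and the helper `slope` are hypothetical), using the slope formulas from equation (2):

```python
from statistics import mean


def slope(u, v):
    """Least-squares slope for regressing v on u: S_uv / S_uu."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    return s_uv / s_uu


# Hypothetical sample, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 7.0]

slope_y_on_x = slope(x, y)  # slope of the Y-on-X line in equation (2)
slope_x_on_y = slope(y, x)  # slope of the X-on-Y line in equation (2)
print(slope_y_on_x, slope_x_on_y)
```

Both slopes share the same numerator; they differ only because the denominators $\sum_i(x_i - \overline{x})^2$ and $\sum_i(y_i - \overline{y})^2$ differ.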

Because covariance depends critically on units, it is unsuitable for comparing different pairs of RVs (or events), so we seek a standard measure that is like covariance but unitless.
Now by standardizing X and Y, we get new regression lines $\hat{Y}|x, \hat{X}|y$ whose intercepts are zero and whose slopes are equal and unitless. That is,

if I do a full standardization on the sample set,

$$ X_s = \dfrac{X - \overline{X}}{s_X} \ \ , \ \ Y_s = \dfrac{Y - \overline{Y}}{s_Y} $$

we get, for the new standardized sample set (i.e. $x_s, y_s$ now denote the standardized observations),

$$ \hat{Y_s}|x_s = 0 + \hat{\beta_{1s}}x_s \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_{1s}} = \dfrac{\sum_i(y_{is} - \overline{y_s})(x_{is} - \overline{x_s}) }{\sum_i(x_{is} - \overline{x_s})^2} \ \ \ \ \\ \hat{X_s}|y_s = 0 + \hat{\beta_{3s}}y_s \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_{3s}} = \dfrac{\sum_i(y_{is} - \overline{y_s})(x_{is} - \overline{x_s}) }{\sum_i(y_{is} - \overline{y_s})^2} \ \ \ \ \tag{3} $$

which results in

$$ r = \hat{\beta_{1s}} = \hat{\beta_{3s}} \tag{4} $$

that is, the regression lines are symmetric to each other.
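Equations (3) and (4) can be checked on a small example. This is a sketch under assumptions: the sample is made up, the helper names are hypothetical, and standardization uses the sample standard deviation with the $n-1$ denominator (Python's `statistics.stdev`).

```python
from math import sqrt
from statistics import mean, stdev


def standardize(values):
    """Subtract the sample mean and divide by the sample SD (n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def slope(u, v):
    """Least-squares slope for regressing v on u: S_uv / S_uu."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    return s_uv / sum((ui - u_bar) ** 2 for ui in u)


def pearson_r(u, v):
    """Sample correlation: S_uv / sqrt(S_uu * S_vv)."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_uv / sqrt(s_uu * s_vv)


# Hypothetical sample, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 7.0]

xs, ys = standardize(x), standardize(y)
slope_ys_on_xs = slope(xs, ys)  # standardized Y on standardized X
slope_xs_on_ys = slope(ys, xs)  # standardized X on standardized Y
r = pearson_r(x, y)
print(slope_ys_on_xs, slope_xs_on_ys, r)
```

After standardization both denominators equal $n-1$, so the two slopes collapse to the same number, which is $r$.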

In fact, reversing this procedure back to the non-standardized raw X and Y, we could say that their regression slopes are related to the correlation as follows:

$$ r = \hat{\beta_1}\dfrac{s_X}{s_Y} = \hat{\beta_3}\dfrac{s_Y}{s_X} \tag{5} $$
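One way to see why (5) holds, sketched under the assumption that $s_{XY}$ (a symbol not used above) denotes the sample covariance with the same $n-1$ denominator as $s_X^2$ and $s_Y^2$, so that $r = \dfrac{s_{XY}}{s_X s_Y}$:

$$ \hat{\beta_1} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x})}{\sum_i(x_i - \overline{x})^2} = \dfrac{(n-1)\,s_{XY}}{(n-1)\,s_X^2} = \dfrac{s_{XY}}{s_X s_Y}\cdot\dfrac{s_Y}{s_X} = r\,\dfrac{s_Y}{s_X} \ \ , \ \ \hat{\beta_3} = \dfrac{(n-1)\,s_{XY}}{(n-1)\,s_Y^2} = r\,\dfrac{s_X}{s_Y} $$

Multiplying the two slopes gives $\hat{\beta_1}\hat{\beta_3} = r^2$, so $|r|$ is the geometric mean of the two raw regression slopes.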

My questions:
1. Is the above narrative correct and minimally complete? What went wrong? What could be added? How could I improve it?
2. I see that Galton discovered regression via a bivariate normal distribution. How was it then generalized to arbitrary distributions?
3. Also, would perfect linearity mean that the underlying distribution is bivariate normal?
4. After this narrative, how could I show that this sample $r$ also applies to the population $\rho$?
5. I also hope to see the final $r$ equal the cosine of the angle between the centered data vectors, i.e. their normalized dot product. That is,

$$ r = \cos\theta = \dfrac{(x - \overline{x})\bullet(y - \overline{y})}{\lvert x - \overline{x} \rvert \lvert y - \overline{y} \rvert} \tag{6} $$

Then, what would the uncentered (non-standardized) dot product refer to, and how is it related to the non-standardized equation set (2)? That is, $$ \cos\theta = \dfrac{x\bullet y}{\lvert x \rvert \lvert y \rvert} = ? \tag{7} $$
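Equations (6) and (7) can be compared numerically. This is only an exploratory sketch on a made-up sample (the helpers `cosine` and `center` are hypothetical names). It shows that the centered cosine is unchanged when a constant is added to $x$, while the raw cosine depends on where the origin is:

```python
from math import sqrt
from statistics import mean


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def center(values):
    """Subtract the sample mean from every observation."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical sample, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 7.0]
x_shifted = [v + 100.0 for v in x]  # same data, different origin

r_centered = cosine(center(x), center(y))                  # equation (6)
r_centered_shifted = cosine(center(x_shifted), center(y))  # unchanged by the shift
cos_raw = cosine(x, y)                                      # equation (7)
cos_raw_shifted = cosine(x_shifted, y)                      # changes with the shift
print(r_centered, r_centered_shifted, cos_raw, cos_raw_shifted)
```

One possible reading, offered as a suggestion rather than an established answer: the least-squares slopes of lines forced through the origin are $\dfrac{x\bullet y}{x\bullet x}$ and $\dfrac{x\bullet y}{y\bullet y}$, and their product is the square of the raw cosine in (7), mirroring how the slopes in (2) multiply to $r^2$.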

At first, you need to understand the difference between the parameters and the estimate of them, then make the equations meaningful. One example, $E(Y|x) = \hat{\beta_0} + \hat{\beta_1}x$. It does not follow the popular rules. $E(Y|x) = {\beta_0} + {\beta_1}x$ and $(\hat Y|x) = \hat{\beta_0} + \hat{\beta_1}x$ are correct, but they have different meaning. — user158565, Nov 11 '18 at 15:46
oh thanks for pointing that out. I just updated. Is it now ok? — Parthiban Rajendran, Nov 11 '18 at 16:15
Not OK. I just gave you an example, still have problems. I have no answer to your question, because I do not know what you asked for. — user158565, Nov 11 '18 at 16:28
ok can you at least list out all problems so I could update? I am not able to see any or that that hinders understanding my question at least. Is it only with the equations? — Parthiban Rajendran, Nov 11 '18 at 16:31
To give one more example: in $X = \dfrac{X - \overline{X}}{\sigma_X}$, $\bar X$ is a statistic (an estimate of the mean) while $\sigma_X$ is a parameter. You put them into the same equation, so I do not know what it is. X appears twice; how can that "=" be true? — user158565, Nov 11 '18 at 16:34
Done that now. I was a little liberal there to keep it simple; I have now added extra notation to denote the new standardized data set. Is it all OK now? — Parthiban Rajendran, Nov 11 '18 at 16:45
and thank you for sharpening my question with better standard notations. :) — Parthiban Rajendran, Nov 11 '18 at 16:54
I'm not sure if it is helpful to "invent" a narrative. From my (statistical data analysis) point of view, I've never found a real use of the correlation coefficient. It is misleading, at least. The only shortcoming of covariance is the "funny" unit. For the diagonal elements of the covariance matrix, the interpretation is rather simple: just take the square root, and it has the same unit as the sampled variable. So you can get rid of the units altogether and restrict the range to [-1,+1] at the same time by "simply normalizing" it. As far as I'm concerned this is already the complete narrative. — cherub, Nov 12 '18 at 14:52
You do not define p in your formula for Cov. In all probability you are having a reference to linear correlation which is quite different from usual r denoting sample correlation computed by Karl Pearson formula . — , Nov 12 '18 at 15:06
@SubhashC.Davar can you kindly elaborate? correlation gives a measure of linearity, so what do you mean by linear correlation? Also does the final formula depend on $p$ to be a particular distribution? uniform or bivariate normal etc? — Parthiban Rajendran, Nov 12 '18 at 17:55
@cherub I am already empirically convinced, and I also see how the SDs cancel out the units; that much is obvious. But why the SD? Why not any other value $K$ with the same unit? There is a reason why the SD sits in the denominator beyond just cancelling the unit. Those who derived it did not simply say "oh, we have an SD with the same unit, so we could just divide to cancel out the units". There is some further reason, which I want to know. Otherwise I just know the formula; I have not "understood" it. — Parthiban Rajendran, Nov 12 '18 at 17:59
No, it's actually not there to cancel the unit; that part is accidental. It serves as normalization. Think about what covariance actually means; I mean in terms of statistical "relation" (it's hard to find a term which does not have a special meaning). You will find, that it's a qualitative indicator. To create a quantitative indicator, you need to rescale -- hence the denominator. — cherub, Nov 12 '18 at 18:04
Let me put it another way: [this](https://stats.stackexchange.com/questions/18058/how-would-you-explain-covariance-to-someone-who-understands-only-the-mean) was my motivation to visualize covariance instead of just accepting it as a formula for the variance between two RVs. The total area of the rectangles (assuming a uniform joint pdf) gives a qualitative measure. The _net_ or _expected_ area turns out to be the covariance. Now, how do we visualize variance, given that there are no rectangles? Because of this gap in visualizing variance along the same lines as covariance, I cannot yet accept its use for normalization. — Parthiban Rajendran, Nov 12 '18 at 18:09
Well, yes and no. I don't think that this kind of visualization is actually helpful. So, yes it's an area by its definition. But that does not carry any real information -- the size of the area is totally arbitrary. The most useful interpretation imo is something like: "for values larger/smaller than the average in x, it is more likely to find a value of y that is larger/smaller than the average in y". It doesn't tell you how much more likely, unless you add a lot more information. But the covariance encodes all information about the interdependency of the pair of variables. — cherub, Nov 12 '18 at 20:38
@PaariVendhan Linear correlation is a measure of validity generalization of different tests. Simple association between two variables can be established using chi square or the Karl Pearson measure etc. To establish causal association you may go for regression etc. You can read basic statistics to understand better. — , Nov 12 '18 at 23:50
@SubhashC.Davar Ironically, I have been studying basic statistics for the past few months precisely because of these gaps in understanding :( For now I would be content to fully understand Karl Pearson's $\rho = \dfrac{Cov(X,Y)}{\sigma_X \sigma_Y }$ for the population and $r = \dfrac{\sum_i (x_i - \overline{x})(y_i - \overline{y}) }{\sqrt{ \sum_i (x_i - \overline{x})^2 \sum_i (y_i - \overline{y})^2 }}$ for the sample via the concepts of regression and covariance. I am empirically convinced that their values lie within $\pm 1$, but I want to understand how they were constructed in the first place. — Parthiban Rajendran, Nov 13 '18 at 04:45
@cherub IMHO, area does carry the qualitative measure. [Here](https://www.scribd.com/document/392834666/30-Correlation-DRAFT) is my draft, where I have finished Covariance and moved into Correlation. I have explained in detail, step by step, how the area of the rectangles quantifies the measure, and later the volume. Of course, it was inspired by the [link](https://stats.stackexchange.com/questions/18058/how-would-you-explain-covariance-to-someone-who-understands-only-the-mean) I shared earlier. Please let me know if there is any flaw in it, because I am basing my further studies on these understandings. — Parthiban Rajendran, Nov 13 '18 at 04:51
@cherub Your "more likely" information comes from the pmf we multiply by. Of course that makes it a volume, but since we initially assume a uniform joint distribution, it is just a scaling effect, so the area carries the information. Once we have a non-uniform joint pmf, volume comes into the picture (I have also illustrated that in Example 1.8 at the end of the Covariance chapter). — Parthiban Rajendran, Nov 13 '18 at 04:54
@SubhashC.Davar To my surprise, I could not find any online material, book, blog, article or paper that "derives" the correlation formula. Almost all of them introduce it by definition and only prove, mathematically or empirically, that its range is $\pm 1$. I had to get into Galton's experiments to find the root, but that is becoming too complicated because the bivariate joint distribution comes into the picture, another big topic I am not yet familiar with. — Parthiban Rajendran, Nov 13 '18 at 04:59
You may consult Mathai and Rathie book- probability and Statistics Mc Millan publisher before you raise any further questions to me. Be serious. — , Nov 13 '18 at 11:09
@SubhashC.Davar With all due respect sir, I was and am fully serious about my question and was just honestly sharing my situation only to give context of what I was up to. — Parthiban Rajendran, Nov 13 '18 at 12:11
@SubhashC.Davar And [the book](https://amzn.to/2OD3GoM) costs ₹8000; I cannot afford that much to get a single doubt clarified. Kindly check [here](https://bit.ly/2zg5jnF), where I have written about 30 pages in detail just to understand a single covariance formula and am now moving into correlation. I have spent the last week trying to understand and connect the dots on a single formula, so I wonder why I sounded any less than serious in any way. And I have been using books [1](https://bit.ly/2Fh348H), [2](https://bit.ly/2RSu14d), which weren't helpful for this doubt. — Parthiban Rajendran, Nov 13 '18 at 12:34
$r = Cov(X,Y)/(\sigma_X \sigma_Y)$: do you want to understand this, or the population correlation? — , Nov 14 '18 at 01:24
I want to understand both, the derivation of both sample correlation and population correlation. — Parthiban Rajendran, Nov 14 '18 at 04:56
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/85727/discussion-between-subhash-c-davar-and-paari-vendhan). — , Nov 14 '18 at 09:43
