I just finished learning MLE, Regression, and Covariance, and am now moving on to Correlation. I want to get from Regression to Correlation logically, via Covariance.
Regression:
A simple regression model tells me that, given a sample of paired observations of the RVs $X$ and $Y$,
$$ E(Y|x) = \hat{\beta_0} + \hat{\beta_1}x \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_1} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(x_i - \overline{x})^2} \ \ , \ \ \hat{\beta_0} = \overline{y} - \hat{\beta_1}\overline{x} $$
$$ E(X|y) = \hat{\beta_2} + \hat{\beta_3}y \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_3} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(y_i - \overline{y})^2} \ \ , \ \ \hat{\beta_2} = \overline{x} - \hat{\beta_3}\overline{y} $$
where, sample correlation coefficient,
$$ r = \hat{\beta_1}\dfrac{\sigma_X}{\sigma_Y} = \hat{\beta_3}\dfrac{\sigma_Y}{\sigma_X} \tag{1} $$
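To convince myself of identity $(1)$, here is a minimal numerical check (a sketch assuming NumPy; the data are made up, not the MWE's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.5, size=200)   # any linearly related sample

sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / np.sum((x - x.mean()) ** 2)          # slope of the Y-on-X line
b3 = sxy / np.sum((y - y.mean()) ** 2)          # slope of the X-on-Y line

r = np.corrcoef(x, y)[0, 1]
print(np.allclose([b1 * x.std() / y.std(),      # the 1/n vs 1/(n-1) factor
                   b3 * y.std() / x.std()], r)) # cancels in the ratio of SDs
```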
Partial Standardization:
If I center the sample set at $(\overline{x},\overline{y})$, that is, subtract from each $X$ and $Y$ its mean value,
$$ X = X - \overline{X} \ \ , \ \ Y = Y - \overline{Y} $$
we get,
$$ E(Y|x) = 0 + \hat{\beta_1}x \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_1} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(x_i - \overline{x})^2} \\ E(X|y) = 0 + \hat{\beta_3}y \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_3} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(y_i - \overline{y})^2} $$
which results in (centering changes neither the slopes nor the SDs, so $r$ is untouched)
$$ r = \hat{\beta_1}\dfrac{\sigma_X}{\sigma_Y} = \hat{\beta_3}\dfrac{\sigma_Y}{\sigma_X} \tag{2} $$
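A quick sketch of this (again assuming NumPy; `np.polyfit` is just a convenient least-squares fitter) confirming that centering kills the intercepts but leaves the slopes and $r$ alone:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.5, size=200)   # made-up linearly related data

b1, b0 = np.polyfit(x, y, 1)                          # fit on the raw data
b1c, b0c = np.polyfit(x - x.mean(), y - y.mean(), 1)  # fit on centered data
print(np.isclose(b1, b1c))    # True: the slope is unchanged
print(np.isclose(b0c, 0.0))   # True: the intercept collapses to 0
print(np.isclose(np.corrcoef(x, y)[0, 1],
                 np.corrcoef(x - x.mean(), y - y.mean())[0, 1]))  # r unchanged
```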
Full Standardization (Centering and Scaling by Standard Deviation):
If I do a full standardization on the sample set,
$$ X = \dfrac{X - \overline{X}}{\sigma_X} \ \ , \ \ Y = \dfrac{Y - \overline{Y}}{\sigma_Y} $$
we get,
$$ E(Y|x) = 0 + \hat{\beta_1}x \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_1} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(x_i - \overline{x})^2} \\ E(X|y) = 0 + \hat{\beta_3}y \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_3} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(y_i - \overline{y})^2} $$
which, since both SDs are now $1$, results in
$$ r = \hat{\beta_1} = \hat{\beta_3} $$
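And a sketch (same assumed setup as above) checking that after full standardization both slopes collapse onto $r$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.5, size=200)   # made-up linearly related data

xs = (x - x.mean()) / x.std()            # fully standardized copies
ys = (y - y.mean()) / y.std()
b1 = np.sum(xs * ys) / np.sum(xs ** 2)   # slope of the Y-on-X line
b3 = np.sum(xs * ys) / np.sum(ys ** 2)   # slope of the X-on-Y line
r = np.corrcoef(x, y)[0, 1]
print(np.allclose([b1, b3], r))          # True: r = beta1_hat = beta3_hat
```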
Experiment:
I then generated sample datasets for various given correlations and observed the resulting regression lines, as below.
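In outline (the full MWE is linked at the end), the experiment looks like this sketch, assuming NumPy and Matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)
for ax, rho in zip(axes, (0.2, 0.6, 0.95)):
    # Bivariate-normal sample with the prescribed population correlation.
    x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=500).T
    xs, ys = (x - x.mean()) / x.std(), (y - y.mean()) / y.std()
    r = np.corrcoef(xs, ys)[0, 1]
    ax.scatter(xs, ys, s=5, alpha=0.3)
    t = np.linspace(-3, 3, 2)
    ax.plot(t, r * t, label="$E(Y|x) = rx$")         # Y-on-X line, slope r
    ax.plot(r * t, t, "--", label="$E(X|y) = ry$")   # X-on-Y line
    ax.set_title(f"target rho = {rho}, sample r = {r:.2f}")
    ax.legend()
plt.show()
```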
Questions:
- The experiment clearly shows, empirically, that $r$ is a good measure of correlation. But how do I arrive at that mathematically, especially from the regression lines? How did the SD come in, seemingly out of nowhere? What is the thought process that brings it in? (The answer should not be "so that $r$ pops out at the end" :).) Why not some other parameter? To rephrase: how did Pearson end up with the definition $r = \dfrac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$, in particular the decision to put the product of the SDs in the denominator? Is there a geometric intuition, like the one here for covariance, that makes it clear the denominator of course has to be the product of the SDs? This is a very important gap in my understanding that I need to fill.
- Is there any advantage to doing just the partial standardization at all? Wikipedia refers to this as computing $r$ from centered data.
- Where and how do I connect cosine similarity here? (A quick numerical check is sketched right after these questions.) That is,
$$ \cos\theta = \dfrac{\vec{a}\cdot\vec{b}}{\lvert \vec{a} \rvert \, \lvert \vec{b} \rvert} $$
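The numerical check mentioned in that last question: on centered data, $r$ comes out exactly as the cosine of the angle between the sample vectors (a sketch assuming NumPy, with made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 0.7 * a + rng.normal(scale=0.5, size=200)   # made-up linearly related data

ac, bc = a - a.mean(), b - b.mean()   # centered samples, viewed as vectors
cos_theta = ac @ bc / (np.linalg.norm(ac) * np.linalg.norm(bc))
print(np.isclose(cos_theta, np.corrcoef(a, b)[0, 1]))   # True: r = cos(theta)
```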
The full MWE with code is here.