
I came across the statement that: $$r^2 = R^2$$

I am now wondering how and why?

What I have thought myself

The coefficient of determination, $R^2$, is (with $SSE$ denoting the explained sum of squares):

$$ \begin{align*} R^2 &= \frac{SSE}{SST} \\ \end{align*} $$

We also know that: $$ \begin{align*} \hat{\beta}_1 &= \frac{Cov(X,Y)}{Var(X)} = \frac{Cov(X,Y)}{\sigma_x^2} \\ \rho_{x,y} &= \frac{Cov(X,Y)}{\sigma_x \sigma_y} \quad \Rightarrow \quad Cov(X,Y) = \rho_{x,y} \, \sigma_x \sigma_y \\ \hat{\beta}_1 &= \frac{\rho_{x,y} \, \sigma_x \sigma_y}{\sigma_x^2} = \frac{\rho_{x,y} \, \sigma_y}{\sigma_x} \\ \Rightarrow \quad \rho_{x,y} &= \hat{\beta}_1 \frac{\sigma_x}{\sigma_y} \end{align*} $$

Somehow we now need to conclude: $$\rho_{x,y}^2 = R^2$$

Intuitively (and from what I learned at university), regression is a kind of correlation analysis.

Maybe with pure mathematics we get the result.

$$ \begin{align*} R^2 &= \frac{SSE}{SST} \\ &= \frac{\sum^{n}_{i=1} (\hat{Y}-\bar{Y})^2}{\sum^{n}_{i=1} (Y-\bar{Y})^2} \end{align*} $$

$$ \begin{align*} \rho_{x,y} &= \frac{Cov(X,Y)}{\sigma_x \sigma_y}\\ &= \frac{\sum^{n}_{i=1} (X-\bar{X}) (Y - \bar{Y})}{(\sum^{n}_{i=1}(X-\bar{X})^2)^{0.5} (\sum^{n}_{i=1}(Y-\bar{Y})^2)^{0.5}} \end{align*} $$
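Before attempting the algebra, the claim can at least be checked numerically. The following is only a minimal sketch, assuming NumPy and made-up simulated data (the seed, sample size and slope of 1.5 are arbitrary choices, not from any real data set). It fits a simple regression by hand, computes $R^2$ as the explained sum of squares over the total sum of squares (squared deviations in both sums), and compares it with the squared sample correlation.

```python
import numpy as np

# Simulated data; all numbers here are arbitrary assumptions for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(size=200)

# OLS estimates for simple linear regression.
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

# R^2 as explained sum of squares over total sum of squares.
sse = np.sum((y_hat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = sse / sst

# Sample correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

# The two values should agree up to floating-point error.
print(r_squared, r ** 2)
```

The same check also confirms the relation $\rho_{x,y} = \hat{\beta}_1 \frac{\sigma_x}{\sigma_y}$ from above: `beta1 * x.std() / y.std()` should match `r`.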

For one, I am stuck on how to deal with sums of random variables (i.e. how to bring the exponent 0.5 inside the summation sign). If someone could point me to some rules I could study, that would be great.

On the other hand, I do not know how to introduce $\hat{Y}$ into the equation in order to arrive at $\frac{SSE}{SST}$.

Could someone give me some hints so that I can understand this relation better?

Maybe what I am missing here is completely obvious... I think of calculating $R^2$ as something completely different from calculating the correlation $r$. One requires me to estimate betas, while the other only needs the covariance and the variances (or rather their square roots). I think I am missing something...

Billy
  • Highly relevant: https://stats.stackexchange.com/questions/437919. This generalizes the relationship to multiple regression. It also (immediately) gives an answer once you observe that the standardized simple regression coefficient always equals the correlation coefficient. Since this is a property of regression, *there are no random variables involved.* It's pure algebraic manipulation of the formulas. – whuber Apr 27 '21 at 14:04
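
Following the comment's point that this is pure algebra, here is one way the manipulation might go. It is a sketch, not a full answer. The shorthands $S_{xx}$, $S_{yy}$ and $S_{xy}$ are introduced here for convenience and do not appear in the question; the intercept is assumed to be estimated by OLS, so that $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$.

$$ \begin{align*} S_{xx} &= \sum^{n}_{i=1} (X_i-\bar{X})^2, \quad S_{yy} = \sum^{n}_{i=1} (Y_i-\bar{Y})^2, \quad S_{xy} = \sum^{n}_{i=1} (X_i-\bar{X})(Y_i-\bar{Y}) \\ \hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}, \qquad r = \frac{S_{xy}}{S_{xx}^{0.5} \, S_{yy}^{0.5}} \\ \hat{Y}_i - \bar{Y} &= \hat{\beta}_0 + \hat{\beta}_1 X_i - \bar{Y} = \hat{\beta}_1 (X_i - \bar{X}) \\ SSE &= \sum^{n}_{i=1} (\hat{Y}_i-\bar{Y})^2 = \hat{\beta}_1^2 \, S_{xx} \\ R^2 &= \frac{SSE}{SST} = \frac{\hat{\beta}_1^2 \, S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}^2} \cdot \frac{S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} \, S_{yy}} = r^2 \end{align*} $$

Two things are worth noting. $\hat{Y}$ enters through the fitted line, which is what links $SSE$ to $\hat{\beta}_1$. And the exponent 0.5 never has to be moved inside a sum: squaring $r$ removes both square roots at once.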
