
Firstly, I appreciate that discussions about $r^2$ generally provoke explanations about $R^2$ (i.e., the coefficient of determination in regression). The problem I'm trying to answer is how to generalize that to any instance of correlation between two variables.

So, I've been puzzled about shared variance for quite a while. I've had a few explanations offered but they all seem problematic:

  1. It's just another term for covariance. This can't be the case, as the factor-analysis literature differentiates PCA from EFA by stating that the latter accounts for shared variance and the former does not (PCA obviously accounts for covariance, since it operates on a covariance matrix, so shared variance must be a distinct concept).

  2. It is the correlation coefficient squared ($r^2$). See:

This makes slightly more sense. The trouble here is interpreting how that implies it is shared variance. For example, one interpretation of 'sharing variance' would be ${\rm cov}(A,B)/[{\rm var}(A)+{\rm var}(B)]$. But $r^2$ doesn't reduce to that, or indeed to any readily intuitive concept [${\rm cov}(A,B)^2/({\rm var}(A)\times{\rm var}(B))$, which is a four-dimensional object].

The links above both attempt to explain it via a Ballentine (Venn) diagram. They don't help. Firstly, the circles are equally sized (which seems to be important to the illustration for some reason), which doesn't account for unequal variances. One could assume the diagrams are drawn for the standardized variables, hence the equal variances, but in that case the overlapping segment would represent the covariance between the two standardized variables, i.e. the correlation. So $r$, not $r^2$.
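For concreteness, here is a quick numerical sketch of those candidate quantities (Python/NumPy with arbitrary simulated data), which shows that $r^2$ coincides with the second form but not the first:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated variables with deliberately unequal variances
A = rng.normal(size=10_000)
B = 3.0 * A + rng.normal(scale=4.0, size=10_000)

cov = np.cov(A, B)[0, 1]
var_A, var_B = A.var(ddof=1), B.var(ddof=1)
r = np.corrcoef(A, B)[0, 1]

print(cov / (var_A + var_B))       # candidate 1: cov / (var + var)
print(cov**2 / (var_A * var_B))    # candidate 2: cov^2 / (var * var)
print(r**2)                        # r^2 equals candidate 2, not candidate 1
```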

TL;DR: Explanations of shared variance say this:

By squaring the coefficient, you know how much variance, in percentage terms, the two variables share.

Why would that be the case?

  • Both points ("covariance" and "r-squared") are correct interpretations. I recommend [this answer of mine](http://stats.stackexchange.com/a/83370/3277): $r^2$ is the product of two relative magnitudes of the covariance, and is a quasi joint probability. – ttnphns Jun 21 '14 at 07:45
  • Within EFA, they usually say "common variance", not "shared variance". Common variance is the realm of total collinearity. On the other hand, the term "shared variance" is not quite defined (your question is about how to define it). – ttnphns Jun 21 '14 at 09:15
  • Venn (Ballentine) diagrams fail to properly convey the concept of $r^2$ because the magnitude of the covariance is not the intersection area of the two circles (variances). Covariance depends on both variances, and it can be larger than the smaller of the two variances (which is certainly impossible to show as an intersection on a Venn diagram). – ttnphns Jun 21 '14 at 11:27
  • So suppose I had $r^2 = 0.6$ for $A, B$; would one way to interpret it be that if I were to hold $B$ constant, we should only expect to see 40% of the original variance remaining in $A$ (and conversely, if I held $A$ constant, only 40% of the original variance in $B$)? – Sue Doh Nimh Jun 21 '14 at 13:27
  • That brings us back to the regressional definition of $r^2$ as $1 - SS_{resid}/SS_{tot}$. So if the situation is homoscedastic you can easily see it yourself... – ttnphns Jun 21 '14 at 14:56
  • I'm sorry, I'm trying to teach myself a lot of this from scratch, and I don't follow. I tried reading your other answer but found it a little tricky (!). Could you please explain what it means for two variables to "share" variance (as distinct from covariance) and how $\operatorname{cov}^2$ over the product of their variances would capture this? I have tried reading quite a few online resources and I've just ended up more confused, hence coming here! :S – Sue Doh Nimh Jun 21 '14 at 17:40
  • Covariance _is_ "shared variance", the raw magnitude of it. Normalized to a relative magnitude, it comes in two versions, $r$ and $r^2$; $r^2$ can be interpreted as the percentage of shared variance in the combined variance. – ttnphns Jun 22 '14 at 10:45
  • So would this be accurate given homoskedasticity: $\mathrm{E}[\operatorname{Var}(Y \mid X)] = (1 - r(X,Y)^2)\operatorname{Var}(Y)$? – Sue Doh Nimh Jun 23 '14 at 07:18
  • Yes, the variability of the residuals is the variability of $Y$ conditional on $X$. – ttnphns Jun 23 '14 at 07:31
  • To understand better what covariance and "shared variance" are, observe that $\operatorname{cov}^2 = \sigma_{y'}^2 \sigma_x^2 = \sigma_{x'}^2 \sigma_y^2$, where $y'$ is $Y$ predicted by $X$ and $x'$ is $X$ predicted by $Y$. – ttnphns Jun 23 '14 at 17:37
  • Thank you very much! I think I understand... it will just take some time to sink in! – Sue Doh Nimh Jul 02 '14 at 17:54
  • See [this answer](http://stats.stackexchange.com/a/28143/6633), which might help a bit. – Dilip Sarwate Dec 06 '14 at 02:49
  • I think this is a good explanation of covariance that might help you conceptualize things more clearly: http://www.snappyeducation.com/#!covariance/cfz0 – user62190 Dec 05 '14 at 23:48
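The identities in the last few comments are easy to check numerically. Here is a small simulation (a rough sketch in Python/NumPy with arbitrary simulated data) verifying that the residual variance of $Y$ regressed on $X$ equals $(1-r^2)\operatorname{Var}(Y)$, and that $\operatorname{cov}^2 = \sigma_{y'}^2\sigma_x^2 = \sigma_{x'}^2\sigma_y^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=50_000)
Y = 2.0 + 0.8 * X + rng.normal(scale=1.5, size=50_000)

r = np.corrcoef(X, Y)[0, 1]
cov = np.cov(X, Y)[0, 1]

# Ordinary least-squares fits of Y on X and of X on Y
b_yx = cov / X.var(ddof=1)
y_hat = Y.mean() + b_yx * (X - X.mean())    # Y predicted from X  (y' in the comment)
b_xy = cov / Y.var(ddof=1)
x_hat = X.mean() + b_xy * (Y - Y.mean())    # X predicted from Y  (x' in the comment)

resid = Y - y_hat
print(resid.var(ddof=1), (1 - r**2) * Y.var(ddof=1))    # residual variance = (1 - r^2) Var(Y)
print(cov**2, y_hat.var(ddof=1) * X.var(ddof=1))        # cov^2 = Var(y') Var(X)
print(cov**2, x_hat.var(ddof=1) * Y.var(ddof=1))        # cov^2 = Var(x') Var(Y)
```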

1 Answer


One can only guess what one particular author might mean by "shared variance." We might hope to circumscribe the possibilities by considering what properties this concept ought (intuitively) to have. We know that "variances add": the variance of a sum $X+\varepsilon$ is the sum of the variances of $X$ and $\varepsilon$ when $X$ and $\varepsilon$ have zero covariance. It is natural to define the "shared variance" of $X$ with the sum to be the fraction of the variance of the sum represented by the variance of $X$. This is enough to imply that the shared variance of any two random variables $X$ and $Y$ must be the square of their correlation coefficient.

This result gives meaning to the interpretation of a squared correlation coefficient as a "shared variance": in a suitable sense, it really is a fraction of a total variance that can be assigned to one variable in the sum.

The details follow.

Principles and their implications

Of course if $Y=X$, their "shared variance" (let's call it "SV" from now on) ought to be 100%. But what if $Y$ and $X$ are just scaled or shifted versions of one another? For instance, what if $Y$ represents the temperature of a city in degrees F and $X$ represents the temperature in degrees C? I would like to suggest that in such cases $X$ and $Y$ should still have 100% SV, so that this concept will remain meaningful regardless of how $X$ and $Y$ might be measured:

$$\operatorname{SV}(\alpha + \beta X, \gamma + \delta Y) = \operatorname{SV}(X,Y)\tag{1}$$

for any numbers $\alpha, \gamma$ and nonzero numbers $\beta, \delta$.
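Since the end result will identify SV with $\rho^2$, here is a minimal sanity check of property $(1)$ (a sketch in Python/NumPy with simulated temperatures and an arbitrary second variable): the squared correlation is unchanged by a change of units.

```python
import numpy as np

rng = np.random.default_rng(2)
temp_C = rng.normal(loc=20, scale=8, size=1_000)             # city temperature in degrees C
Y = 5.0 + 1.5 * temp_C + rng.normal(scale=10, size=1_000)    # some other (arbitrary) variable

temp_F = 32 + 9 / 5 * temp_C                                  # the same temperatures in degrees F

print(np.corrcoef(temp_C, Y)[0, 1] ** 2)        # SV of (C, Y)
print(np.corrcoef(temp_F, Y)[0, 1] ** 2)        # identical: unaffected by the affine rescaling
print(np.corrcoef(temp_C, temp_F)[0, 1] ** 2)   # 1.0: C and F "share" 100% of their variance
```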

Another principle might be that when $\varepsilon$ is a random variable independent of $X$, then the variance of $X+\varepsilon$ can be uniquely decomposed into two non-negative parts,

$$\operatorname{Var}(X+\varepsilon) = \operatorname{Var}(X) + \operatorname{Var}(\varepsilon),$$

suggesting we attempt to define SV in this special case as

$$\operatorname{SV}(X, X+\varepsilon) = \frac{\operatorname{Var}(X)}{\operatorname{Var}(X) + \operatorname{Var}(\varepsilon)}.\tag{2}$$

Since all these criteria are only up to second order--they only involve the first and second moments of the variables in the forms of expectations and variances--let's relax the requirement that $X$ and $\varepsilon$ be independent and only demand that they be uncorrelated. This will make the analysis much more general than it otherwise might be.
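A small simulation (a sketch, assuming only NumPy) illustrates definition $(2)$: the variance of the sum splits cleanly, and the resulting fraction already previews the main result, because it matches the squared correlation of $X$ with the sum.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(scale=2.0, size=100_000)
eps = rng.normal(scale=3.0, size=100_000)   # generated independently of X
S = X + eps

# Variances add (up to sampling noise) because X and eps are uncorrelated
print(S.var(ddof=1), X.var(ddof=1) + eps.var(ddof=1))

# Definition (2): the fraction of Var(S) attributable to X ...
sv = X.var(ddof=1) / (X.var(ddof=1) + eps.var(ddof=1))
# ... which anticipates the main result: it matches corr(X, S)^2
print(sv, np.corrcoef(X, S)[0, 1] ** 2)
```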

The results

These principles--if you accept them--lead to a unique, familiar, interpretable concept. The trick will be to reduce the general case to the special case of a sum, where we can apply definition $(2)$.

Given $(X,Y)$, we simply attempt to decompose $Y$ into a scaled, shifted version of $X$ plus a variable that is uncorrelated with $X$: that is, let's find (if it's possible) constants $\alpha$ and $\beta$ and a random variable $\varepsilon$ for which

$$Y = \alpha + \beta X + \varepsilon\tag{3}$$

with $\operatorname{Cov}(X, \varepsilon)=0$. For the decomposition to have any chance of being unique, we should demand

$$\mathbb{E}[\varepsilon]=0$$

so that once $\beta $ is found, $\alpha$ is determined by

$$\alpha = \mathbb{E}[Y] - \beta\, \mathbb{E}[X].$$

This looks an awful lot like linear regression and indeed it is. The first principle says we may rescale $X$ and $Y$ to have unit variance (assuming they each have nonzero variance) and that when it is done, standard regression results assert the value of $\beta$ in $(3)$ is the correlation of $X$ and $Y$:

$$\beta = \rho(X,Y)\tag{4}.$$
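The claim in $(4)$ that the slope equals the correlation after standardization is easy to verify numerically (a sketch using ordinary least squares via `np.polyfit` on simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=20_000)
Y = 1.0 - 0.6 * X + rng.normal(scale=2.0, size=20_000)

# Standardize both variables to zero mean and unit variance
Zx = (X - X.mean()) / X.std(ddof=1)
Zy = (Y - Y.mean()) / Y.std(ddof=1)

beta = np.polyfit(Zx, Zy, deg=1)[0]     # least-squares slope of Zy on Zx
rho = np.corrcoef(X, Y)[0, 1]
print(beta, rho)                        # equal up to floating-point error
```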

Moreover, taking the variance of both sides of $(3)$ gives

$$1 = \operatorname{Var}(Y) = \beta^2 \operatorname{Var}(X) + \operatorname{Var}(\varepsilon) = \beta^2 + \operatorname{Var}(\varepsilon),$$

implying

$$\operatorname{Var}(\varepsilon) = 1-\beta^2 = 1-\rho^2.\tag{5}$$

Consequently

$$\eqalign{ \operatorname{SV}(X,Y) &= \operatorname{SV}(X, \alpha+\beta X + \varepsilon) &\text{(Model 3)}\\ &= \operatorname{SV}(\beta X, \beta X + \varepsilon) &\text{(Property 1)}\\ &= \frac{\operatorname{Var}(\beta X)}{\operatorname{Var}(\beta X) + \operatorname{Var}(\varepsilon)} & \text{(Definition 2)}\\ &= \frac{\beta^2}{\beta^2 + (1-\beta^2)} = \beta^2 &\text{(Result 5)}\\ & = \rho^2 &\text{(Relation 4)}. }$$


Note that because the coefficient obtained by regressing $X$ on $Y$ (when both are standardized to unit variance) is $\rho(Y,X)=\rho(X,Y)$, the "shared variance" itself is symmetric, justifying a terminology that suggests the order of $X$ and $Y$ does not matter:

$$\operatorname{SV}(X,Y) = \rho(X,Y)^2 = \rho(Y,X)^2 = \operatorname{SV}(Y,X).$$
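Putting the pieces together, here is an end-to-end numerical check (a sketch with simulated data) of decomposition $(3)$, result $(5)$, and the final identity, including its symmetry in $X$ and $Y$:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=100_000)
Y = -3.0 + 1.7 * X + rng.normal(scale=2.5, size=100_000)

def shared_variance(a, b):
    """SV via the decomposition b = alpha + beta*a + eps with Cov(a, eps) = 0."""
    a_std = (a - a.mean()) / a.std(ddof=1)        # rescale to unit variance (Property 1)
    b_std = (b - b.mean()) / b.std(ddof=1)
    beta = np.cov(a_std, b_std)[0, 1]             # slope = correlation after standardizing
    eps = b_std - beta * a_std                    # residual; alpha = 0 after centering
    assert abs(np.cov(a_std, eps)[0, 1]) < 1e-10  # Cov(X, eps) = 0, as model (3) requires
    var_eps = eps.var(ddof=1)                     # equals 1 - beta^2, i.e. result (5)
    return beta**2 / (beta**2 + var_eps)          # Definition (2) applied to beta*X + eps

rho2 = np.corrcoef(X, Y)[0, 1] ** 2
print(shared_variance(X, Y), shared_variance(Y, X), rho2)   # all three agree
```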

whuber