
My goal is to determine whether, for linear regression, some predictors uniquely improve the fit beyond what is already explained by all other predictors combined. I originally tried multi-way ANOVA and partial correlation for this purpose. I have recently learned that multi-way ANOVA performs poorly under high multicollinearity: the reported significances and explained variances of individual predictors are not robust, and may therefore misrepresent the true relations within the data.

Here is a solution that came to my mind:

  1. Fit the full model to the data and find the coefficient of determination $r^2_{full}$
  2. Exclude one of the predictors (e.g. $X$) from the model, refit using the remaining predictors, and find $r^2_{/X}$
  3. Then the gain in explained variance uniquely due to the excluded predictor $X$ is

$$G(X) = r^2_{full} - r^2_{/X}$$
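The three steps can be sketched in Python with plain NumPy (a minimal illustration; the helper names `r_squared` and `unique_gain` are mine, not standard):

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least-squares fit
    resid = y - Xd @ beta
    tss = (y - y.mean()) @ (y - y.mean())           # total sum of squares
    return 1 - (resid @ resid) / tss

def unique_gain(X, y, j):
    """G(X_j) = r2_full - r2 of the model with predictor j excluded."""
    r2_full = r_squared(X, y)
    r2_reduced = r_squared(np.delete(X, j, axis=1), y)
    return r2_full - r2_reduced
```

As noted in the comments, $G$ computed this way is the squared semipartial (part) correlation of the excluded predictor with the response.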

Naively, this looks like a robust solution for detecting partial effects. My simulations show that it works better than partial correlation on some simple noisy model data; the latter is known to fail to discriminate correctly between a true partial effect and multicollinearity in the presence of noise.

Questions:

  • Does this approach have a name?
  • Does it work in practice?
  • Is there a nice procedure to test $G(X)$ for significance (against null hypothesis that $X$ is random and can only explain variance by chance)? Permutation-testing seems to work for me, I'm just wondering if there is something similar to an F-test.
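One reasonably standard permutation scheme for testing an added predictor is Freedman–Lane: fit the reduced model, then permute its residuals to build null responses, so the dependence structure among the predictors is preserved. A sketch under that scheme (function names are illustrative):

```python
import numpy as np

def r2(X, y):
    """R^2 of an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    e = y - Xd @ beta
    return 1 - (e @ e) / np.sum((y - y.mean()) ** 2)

def perm_test_gain(X, y, j, n_perm=2000, seed=0):
    """Freedman-Lane permutation p-value for G(X_j) = r2_full - r2_reduced."""
    rng = np.random.default_rng(seed)
    X_red = np.delete(X, j, axis=1)
    g_obs = r2(X, y) - r2(X_red, y)
    # Fit the reduced model; its permuted residuals generate the null.
    Xd = np.column_stack([np.ones(len(y)), X_red])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    fitted, resid = Xd @ beta, y - Xd @ beta
    g_null = np.empty(n_perm)
    for b in range(n_perm):
        y_star = fitted + rng.permutation(resid)    # null response
        g_null[b] = r2(X, y_star) - r2(X_red, y_star)
    p = (1 + np.sum(g_null >= g_obs)) / (1 + n_perm)
    return g_obs, p
```

Naively permuting the column $X_j$ itself would also destroy its collinearity with the other predictors, which changes the null hypothesis; permuting reduced-model residuals avoids that.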

Note: I am only interested in applying the method for low total number of predictors such as 2 or 3. I am aware of the kitchen sink regression effect, so I want to make it clear that I do not intend to stretch this design to the extreme.

Aleksejs Fomins
  • What you are describing seems to be (semi-) partial $R^2$. But your original goal will be impossible to achieve when there are many predictors or collinearity, as the sample size is unlikely to be sufficient to disentangle the predictors. – Frank Harrell Oct 19 '21 at 12:38
  • @FrankHarrell Thanks! I'll look it up. – Aleksejs Fomins Oct 19 '21 at 16:09

1 Answer


If you are willing to assume normally distributed errors, you could simply apply a likelihood ratio test, since the models you want to compare are nested.

Edit: Assuming normal errors $\epsilon$ (with variance $1$), the likelihood function of the parameters in the model $$y_i = \theta_1x_{1i} + \theta_2x_{2i} + \theta_3x_{3i} + \epsilon_i$$ is given by $$\mathcal L(\theta) = (2\pi)^{-\frac n2}\exp\left(-\frac 12\Vert y - X\theta\Vert_2^2\right),$$ where $y$ is the $n\times 1$ vector of $y_i$'s, $X$ is the $n\times 3$ matrix with rows $(x_{1i}, x_{2i}, x_{3i})$ and $\theta = (\theta_1,\theta_2,\theta_3)'$. The likelihood ratio for testing, say, $\theta_3 = 0$ is given by $$\text{LRT} = \frac{\sup_{\theta\in\mathbb R^2\times\{0\}}\mathcal L(\theta)}{\sup_{\theta\in\mathbb R^3}\mathcal L(\theta)},$$ and $-2\log\text{LRT}$ is approximately $\chi^2$-distributed with $1$ degree of freedom.
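A sketch of this test in Python (assuming, as above, known error variance $\sigma^2 = 1$; with unknown variance, the usual F-test for nested OLS models is the analogous exact procedure):

```python
import numpy as np
from scipy.stats import chi2

def lrt_nested(X_full, X_red, y):
    """-2 log likelihood ratio for nested linear models with N(0, 1) errors.

    With sigma^2 = 1, -2 log LRT reduces to the difference in residual sums
    of squares, approximately chi^2 with df = difference in parameter counts.
    """
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r

    stat = sse(X_red) - sse(X_full)
    df = X_full.shape[1] - X_red.shape[1]
    return stat, chi2.sf(stat, df)   # test statistic and p-value
```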

Strictly speaking, this test does not assess whether the improvement in goodness of fit in terms of $R^2$ is significant; rather, it tests whether the improvement in terms of the likelihood, which can itself be seen as a measure of goodness of fit (see, for instance, the Nagelkerke $R^2$, McFadden $R^2$, and Cox and Snell's $R^2$), is statistically significant.

The $R^2$, in the above example, has a Beta distribution with shape parameters $1$ and $(n - 3)/2$. However, the difference of two dependent (!) Beta-distributed random variables does not, as far as I know, follow any standard distribution. I don't think that following this track will help you achieve what you want, which is why I suggested testing the difference in likelihoods with the likelihood ratio test.

lmaosome
  • With all due respect, this neither answers the question I have asked, nor is sufficiently detailed for me to understand what you are suggesting. Consider making a comment instead or rewriting into a full answer – Aleksejs Fomins Oct 19 '21 at 11:36
  • I am sorry for that. I edited the answer – lmaosome Oct 19 '21 at 13:31
  • Thanks for writing up the post. Now I understand your suggestion – Aleksejs Fomins Oct 19 '21 at 18:35
  • The problem with the likelihood ratio is that it requires an assumption about the noise model. I do not know whether the residual distribution is Gaussian, and even if it happens to be Gaussian, I have no prior knowledge of the variance of the residual terms. I work in biology, where the effects in question are frequently a combination of known effects and unknown effects that are impossible to control experimentally. The goal is to find metrics that can guide in the right direction without having to make too many prior assumptions – Aleksejs Fomins Oct 19 '21 at 18:42
  • Actually, it is easy to show that the log-likelihood ratio for Gaussian noise is, up to a constant factor, just the difference between the $R^2$ of the two models, which is exactly the metric that I propose. – Aleksejs Fomins Oct 19 '21 at 18:53
  • This actually answers my question: I can use $\chi^2_1$ test if I can guarantee residuals to be gaussian, and I (probably) have to resort to permutation-testing if residuals cannot be guaranteed to be normal – Aleksejs Fomins Oct 19 '21 at 18:59
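For completeness, the identity alluded to in the last comments can be written out (a sketch, assuming Gaussian errors with known variance $\sigma^2 = 1$ as in the answer, and an $r^2$ defined via the total sum of squares $\text{TSS}$ appearing in its denominator):

$$-2\log\text{LRT} = \text{SSE}_{/X} - \text{SSE}_{full} = \text{TSS}\left(r^2_{full} - r^2_{/X}\right) = \text{TSS}\cdot G(X),$$

where $\text{SSE}$ denotes the residual sum of squares of each fit, using $\text{SSE} = \text{TSS}\,(1 - r^2)$. So $G(X)$ equals the log-likelihood ratio statistic up to the constant factor $\text{TSS}$, consistent with the $\chi^2_1$ approximation suggested in the answer, but only if the unit-variance assumption actually holds.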