I am analyzing a dataset of responses to two conditions using multiple regression. I dummy-code the two conditions to estimate the contribution of each of them to the data. Up to here, no problem: I know what I am doing.
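To be concrete, here is a minimal sketch of the setup I have in mind, in Python (the data are simulated, and the effect sizes and variable names are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated design: 50 trials per condition, dummy-coded.
n = 100
co1 = np.r_[np.ones(50), np.zeros(50)]  # 1 for condition-1 trials
co2 = 1 - co1                           # 1 for condition-2 trials

# Made-up data: condition 1 contributes 2, condition 2 contributes 5.
y = 2 * co1 + 5 * co2 + rng.normal(size=n)

# No intercept, so each dummy estimates its own condition's contribution.
X = np.column_stack([co1, co2])
fit = sm.OLS(y, X).fit()
print(fit.params)  # roughly [2, 5]
```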
Next, I want to check that the difference between the two conditions (Co1 and Co2) is not confounded by another variable of no interest (NoI). I know that in a multiple regression model, the estimate for each predictor corresponds to the expected change in the data for a one-unit change in that predictor, with all other predictors held constant. I thus add NoI to the analysis as a covariate and look at the estimates. Given that NoI shares some variance with Co1 and Co2, should I conclude that NoI has absorbed all of the variability in the data that it can account for?
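To make the question concrete, here is a sketch where NoI is correlated with the condition dummies (again simulated, with made-up numbers); comparing the fits with and without NoI shows how the condition estimates change:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

n = 100
co1 = np.r_[np.ones(50), np.zeros(50)]
co2 = 1 - co1

# NoI is correlated with the conditions: higher on average under Co2.
noi = 0.8 * co2 + rng.normal(size=n)

y = 2 * co1 + 5 * co2 + 1.5 * noi + rng.normal(size=n)

X_without = np.column_stack([co1, co2])
X_with = np.column_stack([co1, co2, noi])

print(sm.OLS(y, X_without).fit().params)  # Co2's estimate absorbs part of NoI's effect
print(sm.OLS(y, X_with).fit().params)     # Co1 and Co2 estimates adjusted for NoI
```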
If yes, then imagine I were looking at it the other way around: I want to know the impact of NoI on my data, but I also have two confounding conditions, Co1 and Co2. I'd run the same model and observe that the estimate for NoI is smaller when Co1 and Co2 are accounted for. By the same reasoning, I would say that Co1 and Co2 have absorbed all of the variability in the data that they can account for... If NoI shares some variance with Co1 and Co2, this contradicts the previous paragraph.
So I guess that NoI and each Co take some share (half each?) of the variability they can jointly account for in the data? But how is that share determined?
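While writing this I tried to check numerically what each estimate "gets": if I residualize one regressor against the others and regress the data on that residual alone, I recover the same coefficient as in the full model (I believe this is the Frisch-Waugh-Lovell theorem, so maybe each regressor's "share" is just its unique part?). A sketch, continuing the simulated example above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
co1 = np.r_[np.ones(50), np.zeros(50)]
co2 = 1 - co1
noi = 0.8 * co2 + rng.normal(size=n)
y = 2 * co1 + 5 * co2 + 1.5 * noi + rng.normal(size=n)

X = np.column_stack([co1, co2, noi])
full = sm.OLS(y, X).fit()

# Residualize NoI against the condition dummies...
noi_resid = sm.OLS(noi, np.column_stack([co1, co2])).fit().resid
# ...and regress the data on that residual alone.
partial = sm.OLS(y, noi_resid).fit()

print(full.params[2], partial.params[0])  # same estimate for NoI
```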
I've read this question and its answer and thought I understood everything: that it depends on how the sum of squares is split. Then I read this comment and found out that sums of squares only come into play when testing effects, not when estimating them. This confuses me a lot. I don't understand why the problem of sharing variance (or sums of squares) is different in ANOVA versus multiple regression. I get all the more confused when I read elsewhere that ANOVA and multiple regression are one and the same thing.
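Here is the kind of check that confuses me (using statsmodels' formula interface and its Type I sums of squares; for simplicity I collapse the two conditions into a single dummy co, and the data are again simulated): the estimates are identical in both fits, but the sequential sums of squares depend on the order in which the regressors are entered:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
n = 100
co = np.repeat([0.0, 1.0], 50)          # single dummy for the two conditions
noi = 0.8 * co + rng.normal(size=n)
y = 3 * co + 1.5 * noi + rng.normal(size=n)
df = pd.DataFrame({"y": y, "co": co, "noi": noi})

fit1 = smf.ols("y ~ co + noi", data=df).fit()
fit2 = smf.ols("y ~ noi + co", data=df).fit()

print(fit1.params, fit2.params)  # same estimates; order does not matter
print(anova_lm(fit1, typ=1))     # Type I SS: co gets the shared variance
print(anova_lm(fit2, typ=1))     # Type I SS: noi gets the shared variance
```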
I also came across the idea that regressors could be orthogonalized sequentially, as described here and implemented in this code, so that they are completely independent and each accounts only for its own share of the variance. This is appealing, but it does not seem to be a widespread practice.
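From what I understood, the idea is a Gram-Schmidt-like procedure: each regressor is residualized against the ones entered before it. A minimal sketch of how I imagine it (continuing the simulated example, with NoI orthogonalized against the condition dummy):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100
co = np.repeat([0.0, 1.0], 50)
noi = 0.8 * co + rng.normal(size=n)
y = 3 * co + 1.5 * noi + rng.normal(size=n)

# Sequential orthogonalization: keep co as-is, then remove from noi
# everything it shares with the intercept and co.
base = sm.add_constant(co)
noi_orth = sm.OLS(noi, base).fit().resid

X = np.column_stack([np.ones(n), co, noi_orth])
fit = sm.OLS(y, X).fit()
print(fit.params)  # co's estimate now includes the variance it shares with noi;
                   # noi_orth's estimate matches the full (non-orthogonalized) fit
```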
In sum, I need help with:
- Understanding how variability in the data is shared/split/not between two correlated regressors.
- Understanding which aspects of ANOVA and multiple regression are one and the same, and which are different.
- Knowing whether it is best practice to sequentially orthogonalize regressors, as mentioned above.
I hope this question makes sense. I'm really struggling to understand what I'm doing here...