Say I have two variables X and Y, each a data set with corresponding data points 1 through n. These two variables have some casual, small but significant relationship (low r value). Then I am unsure of how to most accurately calculate the adjusted data points of variable Y after controlling for variable X, since the low r-value seems to point towards two different answers. I'll explain.
Here is a link to a website which describes the general formula for calculating an adjusted value.
Each "group" j in this case is simply a data point within the X and Y variables, with j=1,...,n.
The part I want to focus on now is 'b', the common regression coefficient. You can calculate this value by doing a simple linear regression between X and Y, finding a relationship y=mx+c, and then using the slope m as 'b'. The problem is, as someone eloquently explains in another thread, that the outcome matters based on which variable you use as X, and which as Y:
Algebraically (... in a world with "perfect" data) the slope of one would simply be the inverse of the other, but actually the r^2 value comes into play, which makes statistical sense. But this has huge ramifications for small relationships between variables. Now, depending on which way I calculate the linear regression to get the regression coefficient, I end up with different answers. If I use X as X and Y as Y, I use the slope as my regression coefficient. If I use X as Y and Y as X, I use the inverse of the slope as my regression coefficient. But since these are not equal when r is not 1, I end up with very different adjusted mean values depending on which regression coefficient I use.
So I thought of a way to sort of account for this in calculating an adjusted mean when the relationship is small. Since the two regression coefficients (b1 and b2) relate by the following equation:
b1 = (r^2)*(1/b2)
Then we can say:
b1/r = r/b2
Then I could use the value of b1/r (or r/b2) as my new regression coefficient, to be plugged into the adjusted mean equation and calculated my adjusted Y values most accurately. Is it statistically valid to use this result as my regression coefficient, which seems sort of like a happy medium between b1 and 1/b2? Am I making this too complicated and there is some other way to do this, such as simply using b1? Would love some input on this.