Adjusted Mean of Variable given Single Covariate with Weak/Moderate Relationship

Question

Say I have two variables X and Y, each a data set with corresponding data points 1 through n. These two variables have some causal, small but significant relationship (low r value). I am unsure how to most accurately calculate the adjusted data points of variable Y after controlling for variable X, since the low r value seems to point towards two different answers. I'll explain.

Here is a link to a website which describes the general formula for calculating an adjusted value.

Each "group" j in this case is simply a data point within the X and Y variables, with j=1,...,n.
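A minimal sketch of the adjustment, assuming the linked formula is the usual ANCOVA form Y_adj,j = Y_j - b(X_j - mean(X)), with each "group" j being a single data point. The function name `adjusted_values`, the sample data, and the value b = 0.5 are hypothetical and chosen only for illustration.

```python
from statistics import fmean

def adjusted_values(xs, ys, b):
    """Adjust each Y_j to what it would be if X_j sat at the mean of X.

    Assumes the ANCOVA-style adjustment Y_adj_j = Y_j - b * (X_j - mean(X)),
    where b is the common regression coefficient.
    """
    x_bar = fmean(xs)
    return [y - b * (x - x_bar) for x, y in zip(xs, ys)]

# Hypothetical data; b = 0.5 is an arbitrary illustrative coefficient
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 1.0, 7.0, 5.0, 9.0]
print(adjusted_values(xs, ys, b=0.5))  # [4.0, 1.5, 7.0, 4.5, 8.0]
```

Because the deviations X_j - mean(X) sum to zero, the mean of the adjusted values equals the mean of Y whatever b is; the choice of b only changes the individual adjusted points.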

The part I want to focus on now is 'b', the common regression coefficient. You can calculate this value by doing a simple linear regression between X and Y, finding a relationship y=mx+c, and then using the slope m as 'b'. The problem, as someone eloquently explains in another thread, is that the result depends on which variable you treat as X and which as Y:

Algebraically (... in a world with "perfect" data) the slope of one would simply be the inverse of the other, but actually the r^2 value comes into play, which makes statistical sense. But this has huge ramifications for small relationships between variables. Now, depending on which way I calculate the linear regression to get the regression coefficient, I end up with different answers. If I use X as X and Y as Y, I use the slope as my regression coefficient. If I use X as Y and Y as X, I use the inverse of the slope as my regression coefficient. But since these are not equal when r is not 1, I end up with very different adjusted mean values depending on which regression coefficient I use.
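A small sketch of the asymmetry described above, using hypothetical data and plain least-squares formulas (the helper names `ols_slope` and `pearson_r` are illustrative, not from the post). It fits Y on X and X on Y and checks that the two slopes multiply to r^2 rather than to 1.

```python
from math import isclose, sqrt
from statistics import fmean

def ols_slope(xs, ys):
    """Least-squares slope when ys is regressed on xs: S_xy / S_xx."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

def pearson_r(xs, ys):
    """Pearson correlation coefficient of xs and ys."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data with a positive but imperfect relationship
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 1.0, 7.0, 5.0, 9.0]

b1 = ols_slope(xs, ys)  # regress Y on X
b2 = ols_slope(ys, xs)  # regress X on Y
r = pearson_r(xs, ys)

print(b1, 1 / b2, r)             # b1 differs from 1/b2 because |r| < 1
print(isclose(b1 * b2, r ** 2))  # True: the two slopes multiply to r^2
```

With this data b1 is about 1.6 while 1/b2 is about 2.5 (r = 0.8), so the two directions give noticeably different coefficients.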

So I thought of a way to account for this when calculating an adjusted mean for a weak relationship. The two regression coefficients, b1 (slope of Y regressed on X) and b2 (slope of X regressed on Y), are related by the following equation:

b1 = (r^2)*(1/b2)

Dividing both sides by r gives:

b1/r = r/b2
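Writing each least-squares slope in terms of r and the sample standard deviations s_x and s_y (standard identities, not stated in the original post) shows what this proposed coefficient works out to:

```latex
b_1 = r\,\frac{s_y}{s_x}, \qquad b_2 = r\,\frac{s_x}{s_y}
\quad\Longrightarrow\quad
b_1 b_2 = r^2, \qquad
\frac{b_1}{r} = \frac{r}{b_2} = \frac{s_y}{s_x} = \sqrt{b_1 \cdot \frac{1}{b_2}}
```

So b1/r is the ratio of the standard deviations, which is the geometric mean of b1 and 1/b2. Note that s_y/s_x is always positive, so this coefficient does not carry the sign of r.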

I could then use b1/r (equivalently r/b2) as my new regression coefficient, plug it into the adjusted mean equation, and calculate my adjusted Y values as accurately as possible. Is it statistically valid to use this result as my regression coefficient, given that it seems like a happy medium between b1 and 1/b2? Or am I overcomplicating this, and is there a simpler way, such as just using b1? I would love some input on this.
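A sketch comparing the three candidate coefficients (b1, the proposed b1/r, and 1/b2) on hypothetical data, assuming the adjustment Y_adj,j = Y_j - b(X_j - mean(X)) for each data point. It prints the coefficient and the resulting adjusted Y values for each choice.

```python
from math import sqrt
from statistics import fmean

# Hypothetical sample data (not from the original post)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 1.0, 7.0, 5.0, 9.0]

x_bar, y_bar = fmean(xs), fmean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)
b1 = s_xy / s_xx  # slope of Y regressed on X
b2 = s_xy / s_yy  # slope of X regressed on Y

candidates = {
    "b1 (Y on X)": b1,
    "b1/r (proposed)": b1 / r,
    "1/b2 (X on Y, inverted)": 1 / b2,
}

for name, b in candidates.items():
    # Shift each Y_j to where it would sit if X_j were at the mean of X
    adjusted = [y - b * (x - x_bar) for x, y in zip(xs, ys)]
    print(f"{name:>24}: b = {b:.3f}, adjusted Y = {[round(v, 3) for v in adjusted]}")
```

Here b1/r lands between b1 and 1/b2 (about 1.6, 2.0 and 2.5 respectively) and equals sqrt(s_yy / s_xx), the ratio of standard deviations.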

score 1 · Answer 1 · answered Mar 13 '19 at 23:31

Alright, I am going to close this before anyone wastes their time explaining why I was wrong. I just debunked my own theory and realized, after all, how much simpler the problem is than I made it out to be.

There is a reason linear regression outputs the slopes that it does, in whichever direction you run it. The resulting slope already takes the r^2 value into account, since r^2 measures the proportion of the variance in each variable that is explained by the other. With my "happy medium" technique, I am basically saying that r, instead of r^2, is this proportion of variance explained.

In a situation where we want to predict the adjusted value of a point in Y as if the corresponding point in X were at the average instead of its current value, we must take the proportion of variance explained (r^2) fully into account. Only if we wanted to assume that each variable explains the other completely (for whatever conceptual reason) would we divide the slope of the linear regression by r^2 before using the result as our regression coefficient in the adjusted mean equation.
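A quick numerical check of the algebra behind that last step, on hypothetical data: because b1 * b2 = r^2, dividing the Y-on-X slope b1 by r^2 gives exactly 1/b2, the inverted slope from running the regression the other way round. The helper name `slopes_and_r` is illustrative only.

```python
from statistics import fmean
from math import sqrt

def slopes_and_r(xs, ys):
    """Return (b1, b2, r): slope of Y on X, slope of X on Y, and Pearson r."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / s_yy, s_xy / sqrt(s_xx * s_yy)

# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 1.0, 7.0, 5.0, 9.0]
b1, b2, r = slopes_and_r(xs, ys)

# b1 / r^2 reproduces the inverse of the reverse-regression slope
print(b1 / r ** 2, 1 / b2)
```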

