
I am playing with simple linear regression ($y = ax + b$, where both $x$ and $y$ are scalars). I notice that after I swap $x$ and $y$, the coefficient changes (a nice explanation of this can be found here), but the t-statistic and F-statistic do not change. Can anyone explain this intuitively?

Tian
  • Hi: Could you check that? They should change. – mlofton Aug 17 '19 at 03:56
  • @mlofton No. You can do some toy experiments. – Tian Aug 17 '19 at 05:42
  • Please add to your question that you are referring to the single predictor case - this makes the question more specific and easier to answer. In a multiple regression, everything is much more complicated (and your statement does not hold as indicated by @mlofton). – StoryTeller0815 Aug 17 '19 at 06:31
  • Please read https://stats.stackexchange.com/questions/22718/what-is-the-difference-between-linear-regression-on-y-with-x-and-x-with-y, which has many posts explaining what's going on. – whuber Aug 17 '19 at 14:21
  • @Tian In simple regression, the t and the F statistic are both just functions of the correlation and the sample size. – Glen_b Aug 18 '19 at 02:09

1 Answer


In general, it is impossible to distinguish between the causal directions in linear regression by means of statistical criteria. That the model fit does not change when you swap x and y is one symptom of this fact.

I assume that by "simple" you mean "only one predictor". In that case, you can see that regression is symmetric with respect to which variable is x and which is y, for example by standardizing both x and y: after standardization, the regression weight equals the Pearson correlation, which is known to be symmetric (i.e., $r_{xy} = r_{yx}$).
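Here is a minimal sketch of this point (separate from the example further below; the seed and the simulated values x0 and y0 are only for illustration): after standardizing both variables, the fitted slope is the same in both directions and equals $r_{xy}$.

# Sketch: after standardizing, the slope is identical in both
# directions and equals the Pearson correlation
set.seed(42)                        # seed only for reproducibility of this sketch
x0 = rnorm(100, mean = 100, sd = 4)
y0 = 50 + 0.5 * x0 + rnorm(100, sd = 4)

zx = as.numeric(scale(x0))          # standardize: mean 0, sd 1
zy = as.numeric(scale(y0))

coef(lm(zy ~ zx))[2]                # slope of standardized y on standardized x
coef(lm(zx ~ zy))[2]                # slope of standardized x on standardized y
cor(x0, y0)                         # both equal r_xy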

More generally, you can describe the regression weight of y on x, $b_{yx}$, with the following formula:

$b_{yx} = r_{xy} \cdot \frac{s_y}{s_x}$

Hence, your slope is just a rescaled version of the correlation. When you swap x and y, all you are doing is rescaling the slope to a new metric, specifically:

$b_{yx} = r_{xy} \cdot \frac{s_y}{s_x} \Leftrightarrow r_{xy} = b_{yx}\cdot \frac{s_x}{s_y}$

$b_{xy} = r_{xy} \cdot \frac{s_x}{s_y} = b_{yx}\cdot \frac{s_x}{s_y}\cdot \frac{s_x}{s_y} = b_{yx}\cdot \frac{s^2_x}{s^2_y}$

You can clearly see that your "new" regression coefficient is just a rescaled version of the old one.

Here is a numerical example:

# Simulate Data
n = 100
beta_0 = 50
beta_1 = 0.5
var_x = 15
var_res = 15
x = rnorm(n = n, mean = 100, sd = sqrt(var_x))
epsilon = rnorm(n = n, mean = 0, sd = sqrt(var_res))
y = beta_0 + beta_1 * x + epsilon

# fit regression model
fit1 = lm(y ~ x)
fit2 = lm(x ~ y)

b11 = coef(fit1)[2]
b12 = coef(fit2)[2]

# relation to the correlation:
b11 * sd(x)/sd(y)
        x 
0.6693829 
cor(x,y)
[1] 0.6693829

# relation to the swapped directions regression
b11 * var(x)/var(y)
       x 
0.550191 
b12
       y 
0.550191 

The implication of this relation is that the actual fit, as indicated by $R^2 = r_{xy}^2$, cannot change when you swap directions. I hope you find this convincing. :-)

So far, so good. But what does this mean for your test statistics?

You can look at this from two different points of view. The first, stated without proof: the t-test for $b_{yx}$ is equivalent to the t-test of Pearson's correlation against zero. This intuitively leads to the conclusion that the t-statistic will not change, because the correlation does not change.
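As a quick check (a small sketch reusing fit1, x, and y from the code above; cor.test is base R's test of the correlation against zero):

# The t value for the slope equals the t statistic of cor.test
summary(fit1)$coefficients["x", "t value"]   # t for the slope in y ~ x
cor.test(x, y)$statistic                     # t for H0: rho = 0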

How can we prove this? A proof is easier from the perspective of the F-test.

For this, we know: $F = \frac{R^2}{1-R^2} \cdot \frac{n-2}{1}$

As we established above that $R^2$ does not change, we can now see that $F$ also does not change. So what about $t$?

The F-test is a model comparison between the intercept-only model and the model containing our predictor. This comparison is equivalent to the t-test for this predictor, and it holds that $t = \sqrt{F}$ (up to the sign of the slope). Therefore, the argument for $F$ also applies to the t-test, and we have shown everything you wanted to know.
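You can also verify this identity directly on fit1 from above (a small sketch; fstatistic is part of the summary.lm output):

# |t| for the slope equals the square root of the model's F statistic
sqrt(summary(fit1)$fstatistic["value"])          # sqrt(F) from the F-test
abs(summary(fit1)$coefficients["x", "t value"])  # |t| for the slope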

Numerical example for this:

# Model summaries
summary(fit1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.6232 -2.8308 -0.2586  3.0665  8.3884 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 49.89140    9.10394   5.480 3.30e-07 ***
x            0.50353    0.09086   5.542 2.53e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.032 on 98 degrees of freedom
Multiple R-squared:  0.2386,    Adjusted R-squared:  0.2308 
F-statistic: 30.71 on 1 and 98 DF,  p-value: 2.526e-07

summary(fit2)

Call:
lm(formula = x ~ y)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.1542  -2.4443  -0.3467   2.8890  11.3503 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 52.57188    8.58468   6.124 1.90e-08 ***
y            0.47385    0.08551   5.542 2.53e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.911 on 98 degrees of freedom
Multiple R-squared:  0.2386,    Adjusted R-squared:  0.2308 
F-statistic: 30.71 on 1 and 98 DF,  p-value: 2.526e-07


# Calculate F from R-squared
cor(x,y)^2 / (1-cor(x,y)^2) * (100-2)
[1] 30.71059

# Calculate t from R-squared via F
sqrt(cor(x,y)^2 / (1-cor(x,y)^2) * (100-2))
[1] 5.541714

Well, as usual, it took quite some work to prove something that textbooks refer to as "well-known" without any further proof. ;)

StoryTeller0815