
I want to determine whether I get the same regression line when I regress $y$ on $x$ as when I regress $x$ on $y$.

Using R's built-in lm function, I get the following results.

##
## Call:
## lm(formula = y ~ x, data = df1)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.92127 -0.45577 -0.04136  0.70941  1.83882
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   3.0001     1.1247   2.667  0.02573 *
## x             0.5001     0.1179   4.241  0.00217 **

And

##
## Call:
## lm(formula = x ~ y, data = df1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.6522 -1.5117 -0.2657  1.2341  3.8946
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.9975     2.4344  -0.410  0.69156
## y             1.3328     0.3142   4.241  0.00217 **

I figured that if the regression lines are the same, then

$$y_1 = \alpha_1 + \beta_1 x_1 \Longleftrightarrow x_1 = \frac{y_1 - \alpha_1}{\beta_1}$$

from lm(y ~ x, data = df1) and

$$x_2 = \alpha_2 + \beta_2 y_2$$

from lm(x ~ y, data = df1) should match up. (Is this correct?)

In my case that would give (for $y_1 = y_2 = 1$)

$$\begin{align*}
x_1 &= \frac{y_1 - \alpha_1}{\beta_1} = \frac{1 - 3.0001}{0.5001} \approx -3.9994 \\
x_2 &= \alpha_2 + \beta_2 y_2 = -0.9975 + 1.3328 \cdot 1 = 0.3353
\end{align*}$$
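
As a quick sanity check, the same arithmetic in R, plugging in the coefficients copied from the two summaries above:

    a1 <- 3.0001;  b1 <- 0.5001    # from lm(y ~ x, data = df1)
    a2 <- -0.9975; b2 <- 1.3328    # from lm(x ~ y, data = df1)
    (1 - a1) / b1    # x1: invert the first line at y = 1, about -3.9994
    a2 + b2 * 1      # x2: evaluate the second line at y = 1, about 0.3353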

So $x_1 \neq x_2$, and thus there is a difference between the linear regression of $y$ on $x$ and that of $x$ on $y$.

Is this correct?

Thanks in advance.

Mevve
    You are forgetting the error term in your equations. There are other issues as well, but that should be a start. Please also see [this](https://stats.stackexchange.com/questions/22718/what-is-the-difference-between-linear-regression-on-y-with-x-and-x-with-y). – Dayne Oct 30 '20 at 17:33

2 Answers


In the case of a simple linear regression:

$$y = \alpha + \beta x + \epsilon$$

The slope can be estimated as $\hat{\beta} = \frac{\text{Cov}(x,y)}{\text{Var}(x)}$. If we flip $x$ and $y$, the covariance in the numerator stays the same; only the variance in the denominator changes, from $\text{Var}(x)$ to $\text{Var}(y)$. So from there I imagine you can work out when the two slopes will (or will not) be equal!
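
A quick numeric illustration of this (a minimal sketch with simulated data, since the df1 from the question isn't shown):

    # Simulated stand-in for the question's df1
    set.seed(42)
    df1 <- data.frame(x = rnorm(10, mean = 9, sd = 3))
    df1$y <- 3 + 0.5 * df1$x + rnorm(10)

    # Slope of y ~ x is Cov(x, y) / Var(x); flipping swaps the denominator
    b_yx <- cov(df1$x, df1$y) / var(df1$x)
    b_xy <- cov(df1$x, df1$y) / var(df1$y)
    all.equal(b_yx, unname(coef(lm(y ~ x, data = df1))["x"]))  # TRUE
    all.equal(b_xy, unname(coef(lm(x ~ y, data = df1))["y"]))  # TRUE

    # Their product is the squared correlation, so the two slopes are
    # reciprocals of each other only when |cor(x, y)| = 1
    b_yx * b_xy
    cor(df1$x, df1$y)^2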

Andy W

It depends on your loss function. A common choice is to minimize the residual sum of squares (for the case $y \sim x$):

$$\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 \rightarrow \min$$

This is what lm in R does. It takes into account only the vertical distances (when $y$ is on the vertical axis).

By flipping $x$ and $y$, it is the original horizontal distances that get minimized (after summation, of course).

So the two fits are not the same, but other methods exist. As a loss function you can take the Euclidean (perpendicular) distance of each point from the regression line and minimize the sum of their squares; this is known as orthogonal or total least squares regression. In that case your solution should work, because the fitted line is the same whichever variable you flip.
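
For illustration, a minimal sketch of that orthogonal fit via the first principal component, with simulated data standing in for the question's df1; flipping $x$ and $y$ leaves the fitted line unchanged:

    # Simulated stand-in for the question's df1
    set.seed(42)
    df1 <- data.frame(x = rnorm(10, mean = 9, sd = 3))
    df1$y <- 3 + 0.5 * df1$x + rnorm(10)

    # Orthogonal fit: the line through the centroid along the
    # first principal component of (x, y)
    pc <- prcomp(df1[, c("x", "y")])
    v  <- pc$rotation[, 1]
    slope     <- unname(v["y"] / v["x"])
    intercept <- mean(df1$y) - slope * mean(df1$x)

    # With the roles of x and y swapped, the slope simply inverts,
    # i.e. both fits describe the same line in the plane
    slope_flipped <- unname(v["x"] / v["y"])
    all.equal(slope, 1 / slope_flipped)  # TRUE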

jumpini