What is the difference between a linear regression with a dummy variable and two separate regressions for each group?

Question

I am interested in the connection between a multiple linear regression that includes a dummy variable (0/1) and two separate regressions split by this dummy variable, i.e. one regression for each category (0 and 1). The question is: How are these numbers related? Can they be translated into each other? Here is a numeric example in R:

lm(mpg ~ disp + vs, data=mtcars)
#Coefficients:
#  (Intercept)         disp           vs  
#      27.9493      -0.0369       1.4950 

lm(mpg ~ disp, data=mtcars[mtcars$vs==0,])

#Coefficients:
#  (Intercept)         disp  
#     25.63755     -0.02937  

lm(mpg ~ disp, data=mtcars[mtcars$vs==1,])

#Coefficients:
#  (Intercept)         disp  
#     34.03526     -0.07156

See https://stats.stackexchange.com/questions/17110, https://stats.stackexchange.com/questions/13112, and https://stats.stackexchange.com/questions/12797, *inter alia.* — whuber, May 26 '20 at 16:01

Richard Hardy · Accepted Answer · 2020-05-26T15:48:24.537

Let me denote mpg by $y$, disp by $x$ and vs by $d$. Then you have two models:

Model 1:

$$ y=\beta_0+\beta_1 x+\beta_2 d+\varepsilon $$

Model 2:

$$ \begin{aligned} y&=\gamma_0+\gamma_1 x+u &&\text{for} \quad d=0 \quad \text{and}\\ y&=\delta_0+\delta_1 x+v &&\text{for} \quad d=1. \end{aligned} $$

Model 1 assumes a common slope ($\beta_1$) of $x$ for both $d=0$ and $d=1$, while Model 2 does not assume that (the slopes are $\gamma_1$ and $\delta_1$).
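To make the slope difference concrete, here is a minimal sketch in R. It uses a third model that is not part of the answer above and is introduced only for illustration: a single regression with an interaction term, $y=\beta_0+\beta_1 x+\beta_2 d+\beta_3 xd+\varepsilon$. Under that assumption, the point estimates of Model 2 should be recoverable as $\gamma_0=\beta_0$, $\gamma_1=\beta_1$, $\delta_0=\beta_0+\beta_2$ and $\delta_1=\beta_1+\beta_3$, while Model 1 is the restricted case $\beta_3=0$. The object names (`fit_int`, `fit0`, `fit1`, `group0`, `group1`) are my own.

```r
# Illustrative sketch: the interaction model below is an added assumption,
# not a model that appears in the question or the answer.
fit_int <- lm(mpg ~ disp * vs, data = mtcars)            # mpg ~ disp + vs + disp:vs
fit0    <- lm(mpg ~ disp, data = mtcars[mtcars$vs == 0, ])
fit1    <- lm(mpg ~ disp, data = mtcars[mtcars$vs == 1, ])

b <- coef(fit_int)

# Group vs == 0: intercept and slope are the baseline terms.
group0 <- c(b[["(Intercept)"]], b[["disp"]])

# Group vs == 1: add the dummy and interaction terms to the baseline.
group1 <- c(b[["(Intercept)"]] + b[["vs"]], b[["disp"]] + b[["disp:vs"]])

# Compare with the coefficients of the two separate regressions.
all.equal(group0, unname(coef(fit0)))
all.equal(group1, unname(coef(fit1)))
```

Because Model 1 forces the interaction coefficient to zero, its intercept and slope estimates generally coincide with neither subgroup fit, which is consistent with the numbers in the question.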

You may or may not specify the distributions of $\varepsilon$, $u$ and $v$ explicitly, but implicitly Model 1 assumes the error term has the same distribution for both $d=0$ and $d=1$, while Model 2 does not assume that. The implicit or explicit assumptions on the error distributions affect the finite-sample distribution of the estimators of intercepts and slopes. If the error distributions allow for asymptotic normality of the intercept and slope estimators, the difference shows up in the estimators' asymptotic covariance matrix (and hence in the standard errors). Without asymptotic normality, it shows up in the estimators' joint asymptotic distribution more generally.
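The effect of the error assumption on standard errors can be sketched in R. As an assumption made only for illustration, the comparison uses a single regression with an interaction term (`mpg ~ disp * vs`), which is not one of the two models above; it is chosen because it allows group-specific intercepts and slopes like Model 2 but, like Model 1, uses one pooled error variance. The object names are my own.

```r
# Illustrative sketch: the pooled interaction model is an added assumption.
fit_int <- lm(mpg ~ disp * vs, data = mtcars)
fit0    <- lm(mpg ~ disp, data = mtcars[mtcars$vs == 0, ])
fit1    <- lm(mpg ~ disp, data = mtcars[mtcars$vs == 1, ])

# The residual sums of squares add up across the two subgroup fits ...
rss_pooled <- deviance(fit_int)
rss_split  <- deviance(fit0) + deviance(fit1)
all.equal(rss_pooled, rss_split)

# ... but each fit turns its residuals into its own error-scale estimate.
c(pooled = sigma(fit_int), vs0 = sigma(fit0), vs1 = sigma(fit1))

# Standard errors for the vs == 0 intercept and slope under both approaches.
se_pooled <- coef(summary(fit_int))[c("(Intercept)", "disp"), "Std. Error"]
se_vs0    <- coef(summary(fit0))[, "Std. Error"]
rbind(pooled = se_pooled, separate = se_vs0)
```

The pooled fit uses one residual standard error for all observations, whereas each separate regression estimates its own, so the reported standard errors can differ even where the point estimates agree.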

Let me think about whether anything interesting happens with the intercepts (besides the effect of the potentially different distributions of errors on the finite-sample and asymptotic properties of the estimators of the intercepts). Unlike the slopes, I do not currently see a corresponding difference in assumptions about the intercepts between the two models. — Richard Hardy, May 26 '20 at 16:00
