
In the classical regression model, i.e. $E(y\mid x)=\alpha +\beta x$ and $\operatorname{Var}(y\mid x)=\sigma^2$, with only two coefficients, an intercept $\alpha$ and the slope $\beta$ of a dummy variable $x$, we can interpret $\alpha$ as the mean of the values for which $x=0$ and $\beta$ as the difference between the means of the data where $x=1$ and $x=0$, respectively. This makes intuitive sense, but how can I formally show that these special expressions can be deduced from the standard definitions $$\hat{\alpha}=\bar y -\hat{\beta} \bar x$$ and $$\hat{\beta}=\frac{\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y)}{\frac{1}{n}\sum (x_i-\bar x)^2}\,?$$

I cannot reach the formulation in terms of group means.

Christoph Hanck
Felix H
    Have you tried substituting the $x_i$s with dummy variables (taking value 0 or 1) in your formula above? Maybe starting from the formula for $\hat{\beta}$ is a way to move forward. – Matteo Fasiolo Jan 11 '14 at 17:31
  • This question appears to conflate two distinct concepts: the model and its estimates. The interpretation of $\alpha$ and $\beta$ depends on the *model.* The role of the formulas is solely to make reasonable guesses concerning the actual values of the parameters; they otherwise play no role in their interpretation. Therefore it is illogical--and in principle could be impossible--to derive the interpretation from an analysis of the formulas for the estimates. – whuber Jan 11 '14 at 18:21
  • Related: https://stats.stackexchange.com/questions/354803/fitted-values-of-a-simple-regression-with-intercept-and-dummy/354804#354804 – Christoph Hanck Jul 07 '18 at 13:37

1 Answer


The theoretical model is

$$E(Y\mid X)=\alpha +\beta X$$

Assuming that $X$ is a $0/1$ binary variable we notice that

$$E(Y\mid X=1) - E(Y\mid X=0)=\alpha +\beta -\alpha = \beta $$

I think the OP is asking: does the OLS estimator "mimic" this relationship, being perhaps its sample analogue?

Let's see: we have that

$$\hat{\beta}=\frac{\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y)}{\frac{1}{n}\sum (x_i-\bar x)^2} = \frac {\widehat{\operatorname{Cov}}(Y,X)}{\widehat{\operatorname{Var}}(X)} $$

Now, since $X$ is a binary variable, i.e. a Bernoulli random variable, we have $\operatorname{Var}(X) = p(1-p)$ where $p\equiv P(X=1)$. Under a stationarity assumption, the sample estimate of this probability is simply the sample mean of $X$, denoted $\bar x$, and one can verify that indeed $$\frac{1}{n}\sum (x_i-\bar x)^2 = \widehat{\operatorname{Var}}(X)=\bar x (1-\bar x) =\hat p(1-\hat p)$$
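As a quick numerical sanity check (a sketch in NumPy with simulated data; the variable names are mine), the $\frac{1}{n}$-version of the sample variance of a 0/1 vector equals $\bar x(1-\bar x)$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=200).astype(float)  # a 0/1 dummy variable

xbar = x.mean()                                 # this is also phat = estimated P(X=1)
var_mle = np.mean((x - xbar) ** 2)              # (1/n) * sum (x_i - xbar)^2

# For binary x the identity holds exactly, not just approximately
assert np.isclose(var_mle, xbar * (1 - xbar))
```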

Let's turn now to the covariance. We have

$$\widehat{\operatorname{Cov}}(Y,X)=\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y) = \frac{1}{n}\sum x_iy_i -\bar x \bar y$$

Denote by $n_1$ the number of observations for which $x_i=1$. We can write

$$\frac{1}{n}\sum x_iy_i = \frac{1}{n}\sum_{x_i=1} y_i = \frac{n_1}{n}\cdot \frac{1}{n_1}\sum_{x_i=1} y_i = \hat p\cdot (\bar y \mid X=1) = \hat p \cdot \hat E(Y\mid X=1)$$

Also, $\bar y = \hat E(Y)$, and using the law of total expectation we have

$$\hat E(Y) = \hat E(Y \mid X=1) \cdot \hat p + \hat E(Y \mid X=0)\cdot (1-\hat p)$$

Inserting all these results in the expression for the sample covariance we have

$$\widehat{\operatorname{Cov}}(Y,X)= \hat p \cdot \hat E(Y\mid X=1) - \hat p\cdot \left[\hat E(Y \mid X=1) \cdot \hat p + \hat E(Y \mid X=0)\cdot (1-\hat p)\right]$$

$$= \hat p(1-\hat p)\cdot \left[\hat E(Y \mid X=1) - \hat E(Y \mid X=0)\right]$$
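This covariance identity also holds exactly in any finite sample, which a short simulation (again a sketch with made-up data and my own variable names) can confirm:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(0, 2, size=300).astype(float)   # 0/1 dummy
y = 1.0 + 2.0 * x + rng.normal(size=300)

phat = x.mean()
cov_mle = np.mean((x - x.mean()) * (y - y.mean()))  # (1/n) sample covariance

# Difference of conditional sample means, Ehat(Y|X=1) - Ehat(Y|X=0)
diff = y[x == 1].mean() - y[x == 0].mean()

# Cov-hat(Y, X) = phat * (1 - phat) * [Ehat(Y|X=1) - Ehat(Y|X=0)]
assert np.isclose(cov_mle, phat * (1 - phat) * diff)
```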

Inserting all in the expression for $\hat \beta$ we have

$$\hat{\beta} = \frac {\widehat{\operatorname{Cov}}(Y,X)}{\widehat{\operatorname{Var}}(X)} = \frac {\hat p(1-\hat p)\cdot \left[\hat E(Y \mid X=1) - \hat E(Y \mid X=0)\right]}{\hat p(1-\hat p)} $$

$$\Rightarrow \hat{\beta} = \hat E(Y \mid X=1) - \hat E(Y \mid X=0)$$

which is the sample analogue (the feasible implementation) of the theoretical relationship. I leave the corresponding demonstration for $\hat \alpha$ to the OP.
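To see the whole result in action (and to check the $\hat\alpha$ part the OP is left to derive), here is a minimal simulation: with a 0/1 regressor, the OLS slope equals the difference of group means and the intercept equals the mean of the $x=0$ group, exactly. The data and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.integers(0, 2, size=n).astype(float)   # 0/1 dummy
y = 2.0 + 3.0 * x + rng.normal(size=n)         # true alpha = 2, beta = 3

xbar, ybar = x.mean(), y.mean()
beta_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
alpha_hat = ybar - beta_hat * xbar

mean1 = y[x == 1].mean()   # Ehat(Y | X=1)
mean0 = y[x == 0].mean()   # Ehat(Y | X=0)

assert np.isclose(beta_hat, mean1 - mean0)   # slope = difference in group means
assert np.isclose(alpha_hat, mean0)          # intercept = mean of the x=0 group
```

Both equalities are algebraic identities of the OLS formulas, so they hold for any sample, not just in expectation.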

Alecos Papadopoulos
  • Would the result be the same under a heteroskedastic model, for example where the variance depends on $X$? Would the estimator of beta be the same as above? – FantasticAI Dec 19 '20 at 11:32