
Much of the empirical literature seems to interpret the coefficient $b$ of a simple linear (OLS) regression $Y = a + bX + e$ as follows: a one-unit increase in $X$ will, ON AVERAGE, cause a $b$ unit increase in $Y$. My question is: where does the "on average" come from? Is this interpretation OK, given that we use the least-squares criterion?
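For concreteness, here is a minimal simulation sketch of how I read the claim (all numbers are made up): each individual $Y$ moves by $b$ plus mean-zero noise, so only the *average* change per unit of $X$ is $b$.

```python
# Toy simulation of Y = a + bX + e (illustrative values assumed).
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.0, 2.0
X = rng.uniform(0, 10, size=100_000)
Y = a + b * X + rng.normal(0, 1, size=X.size)  # e has mean zero

# OLS slope estimate: Cov(X, Y) / Var(X)
b_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(b_hat)  # ~2.0: Y differs by about b per unit of X, on average
```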

xaver
    The description is expressing the fact that b is an estimate of the slope of the regression line. I think the use of "on average" just expresses that there is a difference between a slope parameter and its estimator. I can see how this term "on average" is subject to misinterpretation. – Michael R. Chernick Dec 14 '16 at 12:31
  • It could also refer to the fact that there is an error term, but the error term is zero on average, so it doesn't matter to the interpretation. – kjetil b halvorsen Dec 14 '16 at 12:52
  • You might see this in your text: $E(Y|X) = a + bX$, and/or $E(e|X) = 0$. The model assumption is that, but for the random error term $e$, which is "on average" $0$ (or $0$ in expectation), a one-unit increase in $X$ will cause a $b$ unit change in $Y$. – lmo Dec 14 '16 at 12:57
  • Very little of the literature makes this interpretation, because careful writers make it very clear that the model makes no assertions about any *causal* relationship between $X$ and $Y$. I'm sure that the good writers are avoiding such language and using phrases like "the regression indicates that the (conditional) expectation of $Y$ changes by $b$ units for each one-unit difference in $X$." – whuber Jun 07 '17 at 16:31

1 Answer


This is a really interesting question, especially in the context of multiple regression.

Last year, I was the teaching assistant for an introductory statistics course (for business majors) whose instructor had been a student of John Tukey. The instructor pushed for the interpretation:

On average, after removing the effects of the other covariates, the response changes by $b$ [y units] per [x units].

The "on average" is appearing since the increment by $b$ is not deterministic--the increment is corrupted by mean zero noise.

I like this framing because it (a) keeps the interpretation sounding clearly non-causal and (b) avoids the collinearity problems in the standard interpretation posed in the original post. Indeed, when we change $x_1$ by one unit, if $x_1$ and $x_2$ are correlated, then $x_2$ will, on average, change too!

The above interpretation relies on a property of partial regression plots, which is nicely explained here: What does an Added Variable Plot (Partial Regression Plot) explain in a multiple regression? Briefly, let $H_{-j}$ denote the hat matrix of the regression on all covariates except $x_j$, and let $r$ be the residual of the full regression, so that $y = X\hat\beta + r$. The reason is just that the regression of $(I-H_{-j})y$ against $(I-H_{-j})x_j$ has slope
\begin{align*}
[x_{j}^T (I - H_{-j})x_j]^{-1}[x_j^T (I - H_{-j})y] &= [x_{j}^T (I - H_{-j})x_j]^{-1}[x_j^T (I - H_{-j})(x_j \hat\beta_j + r)] \\
&= \hat\beta_j + [x_{j}^T (I - H_{-j})x_j]^{-1}[x_j^T (I - H_{-j})r] \\
&= \hat\beta_j.
\end{align*}
The first equality holds because $(I - H_{-j})$ annihilates the columns of $X_{-j}$, and the last because $(I-H_{-j})x_j \in \mathrm{im}(X)$ while $r \perp \mathrm{im}(X)$.
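As a sanity check, here is a small numerical sketch of this identity (the synthetic data and variable names are my own, not from the linked post):

```python
# Numerical check: regressing (I - H_{-j}) y on (I - H_{-j}) x_j
# recovers beta_hat_j from the full multiple regression.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
X[:, 1] += 0.8 * X[:, 0]  # make two covariates correlated
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # full regression

j = 1
X_minus = np.delete(X, j, axis=1)
H_minus = X_minus @ np.linalg.pinv(X_minus)  # projection onto im(X_{-j})
y_res = y - H_minus @ y                      # (I - H_{-j}) y
x_res = X[:, j] - H_minus @ X[:, j]          # (I - H_{-j}) x_j

slope = (x_res @ y_res) / (x_res @ x_res)    # simple regression through origin
print(np.allclose(slope, beta_hat[j]))       # True: recovers beta_hat_j
```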


I think it gets even more interesting, though. When $y = f(X) + \epsilon$ and $f(X) \neq X \beta^*$ for any $\beta^*$ (that is, $f$ is not linear), there still exist interesting interpretations of what the OLS slope estimates capture. This is discussed excellently in:

Buja, A., Berk, R., Brown, L., George, E., Pitkin, E., Traskin, M., and Zhao, L. (2014). Models as Approximations, Part I: A Conspiracy of Nonlinearity and Random Regressors in Linear Regression.

In that paper, it is discussed that OLS is, in this nonlinear case, estimating the best linear approximation, and interpretations in a similar spirit to the above are given.
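To illustrate the flavor of that result, here is a toy sketch of my own (not an example from the paper): with $X \sim \mathrm{Uniform}(0,1)$ and $f(X) = e^X$, the OLS slope converges to the best-linear-approximation slope $\mathrm{Cov}(X, f(X))/\mathrm{Var}(X)$ rather than to any structural coefficient.

```python
# Sketch of the "best linear approximation" view (my own toy example):
# y = f(X) + noise with f nonlinear.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=200_000)
y = np.exp(X) + rng.normal(0, 0.1, size=X.size)  # f(X) = exp(X)

# Empirical OLS slope
b_ols = np.cov(X, y)[0, 1] / np.var(X, ddof=1)

# Population target Cov(X, e^X) / Var(X) for X ~ Uniform(0, 1):
# Cov(X, e^X) = 1 - (e - 1)/2 and Var(X) = 1/12, by direct integration.
b_star = (1 - (np.e - 1) / 2) * 12
print(b_ols, b_star)  # both ~1.69
```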

user795305