
A while ago I was trying (not entirely successfully) to figure out the definition of a regression model. Now I am narrowing it down to a simple linear regression and trying to identify (loosely speaking) where the basic model ends and the optional assumptions begin. Wooldridge's "Introductory Econometrics: A Modern Approach" (6th edition, 2016) states the following at the beginning of Chapter 2:

In writing down a model that will “explain $y$ in terms of $x$,” we must confront three issues. First, since there is never an exact relationship between two variables, how do we allow for other factors to affect $y$? Second, what is the functional relationship between $y$ and $x$? And third, how can we be sure we are capturing a ceteris paribus relationship between $y$ and $x$ (if that is a desired goal)? We can resolve these ambiguities by writing down an equation relating $y$ to $x$. A simple equation is $$ y=\beta_0+\beta_1 x_1+u. \quad [2.1] $$ Equation $(2.1)$, which is assumed to hold in the population of interest, *defines the simple linear regression model*. (Emphasis is mine.)

This does not look complete enough to indicate anything about $y$ and $x$ probabilistically. As far as I understand, one could write such an equation for any variables $y$ and $x$, with any chosen constants $\beta_0$ and $\beta_1$, and it would hold as long as we choose the right values of $u$. So this does not look like a good candidate for a definition of a regression model to me.
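
To make this concrete, here is a minimal Python sketch (with arbitrary, made-up numbers): whatever $\beta_0$ and $\beta_1$ we pick, equation $(2.1)$ holds exactly once $u$ is defined as the leftover.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
y = np.exp(x1) + rng.uniform(size=100)  # y has no particular linear relation to x1

# Pick completely arbitrary coefficients.
beta0, beta1 = -7.3, 42.0

# Define u as whatever is left over; (2.1) then holds by construction.
u = y - beta0 - beta1 * x1
assert np.allclose(y, beta0 + beta1 * x1 + u)  # true for any beta0, beta1
```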

Now on the other end of the spectrum we could have a linear model $$ y \mid x_1 \sim D(\beta_0+\beta_1 x_1,\ \sigma^2), $$ independently across observations, for some density $D$ characterized by a location and a scale parameter. This is probably too restrictive as a definition of a simple linear regression [model], because I think we could still call it one if the scale above were a function of $x_1$ or if the distribution were something other than the specific $D$.
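
For concreteness, if $D$ were taken to be the normal density, data from this fully specified model could be simulated as follows (a sketch with arbitrary illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 3.0, 2.0, 0.5   # arbitrary illustrative values

x1 = rng.uniform(-1, 1, size=200)
# With D = Normal, the full conditional distribution of y is specified:
y = rng.normal(loc=beta0 + beta1 * x1, scale=sigma)
```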

In between too loose and too restrictive, we can have models like $$ \mathbb{E}(y|x_1)=\beta_0+\beta_1 x_1 $$ or $$ \mathbb{E}(y|x_1)=\beta_0+\beta_1 x_1, \quad \text{Var}(y|x_1)=\sigma^2, $$ and others. Somewhere in between I expect to find a definition that makes the most sense in the context of the term "a simple linear regression [model]".
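
For instance, a data-generating process like the following sketch (arbitrary numbers) satisfies the first, mean-only model but violates the constant-variance version, since here $\text{Var}(y|x_1)=x_1^2\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 3.0, 2.0, 0.5

x1 = rng.uniform(0.5, 2.0, size=200)
u = x1 * rng.normal(0, sigma, size=200)   # E(u | x1) = 0, but Var(u | x1) = x1**2 * sigma**2
y = beta0 + beta1 * x1 + u                # the mean-only model holds; constant variance fails
```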

So what is the definition of a simple linear regression [model]?

  • I began my answer at https://stats.stackexchange.com/a/148713/919 with a general formulation of a probabilistic regression model. The discussion that follows covers all your examples and more, so I suspect it may be a complete answer to your questions here. – whuber Feb 12 '20 at 15:13
  • (2.1) looks pretty much like a simple linear regression model, though it would be usual to have $u$ have mean $0$ and be uncorrelated with the $x_i$. This allows finding in some sense optimal estimates of $\beta_0$ and $\beta_1$. A stronger assumption would be that $u$ is independent of the $x_i$ and i.i.d. (or even be normally distributed). – Henry Feb 12 '20 at 15:26
  • @whuber, in your answer, *A linear model of a linear relationship with additive errors* as well as the other items in that list suffer from the same problem as $(2.1)$ above. Or is the problem I have indicated not really a problem? You do write *A model has *additive errors* when $f$ is linear in $\varepsilon$. In such cases it is *always* assumed that $\mathbb{E}(\varepsilon)=0$*. After this addition, is the linear regression model complete? If so, what does it tell us probabilistically about $y$ and $x$? – Richard Hardy Feb 12 '20 at 15:28
  • I urge you to look at the initial formulation of the model as $Y = f(X,\theta,\epsilon).$ Everything else is just special cases of that presented in order to address the specific question of what "linear" might mean. This general formulation has none of the issues you attach to (2.1). – whuber Feb 12 '20 at 15:30
  • @whuber, OK, so what does $Y=f(X,\theta,\epsilon)$ tell us about the relationship between $Y$ and $X$? The expression has to restrict the possible relationship at least in some way to be informative, doesn't it? – Richard Hardy Feb 12 '20 at 15:32
  • @Henry, right. So what definition would you suggest? In the process of adding additional assumptions to $(2.1)$, where exactly do we stop and call it a simple linear regression [model]? – Richard Hardy Feb 12 '20 at 15:34
  • Of course. It all comes down to how you define $f.$ It's not arbitrary! It expresses your model. I hope the eight examples I give show how $f$ is determined and well-defined in particular instances. They explicitly show what it can mean to "stop and call it a ... linear ... model." – whuber Feb 12 '20 at 15:41
  • @whuber, I do not understand. Similarly to one of the cases in your answer, in $(2.1)$ we supposedly have a linear relationship. However, $(2.1)$ does not tell us anything probabilistically about $y$ and $x$. This is my main point of criticism of $(2.1)$. To take a concrete illustration of $(2.1)$, let $$y=3+2x_1+u.$$ By knowing this, I know absolutely nothing about the distribution of $y$ given $x_1$. What is such a model good for? What information does it contain? Is that enough to call this a simple linear regression [model]? Is it not a waste of paper (when printed)? – Richard Hardy Feb 12 '20 at 15:54
  • I'm sorry? Given $x_1,$ the distribution of $y$ is the distribution of $u,$ shifted by $3+2x_1:$ you know *everything* about that conditional distribution that you know about $u.$ In some circumstances you might specify that $u$ has a Normal$(0,\sigma^2)$ distribution; in other circumstances you might only require that $u$ have zero mean and finite variance; but regardless, it is obvious you *do* have information about the conditional distribution of $y.$ – whuber Feb 12 '20 at 16:27
  • @whuber, the problem is, we do not observe $u$ and $(2.1)$ does not indicate anything about the distribution of $u$. (I cannot estimate $u$ or properties of its distribution from the data either, since the model is not informative enough to allow for an estimator that does not hinge on additional assumptions outside of $(2.1)$.) So I maintain what I said in the last comment. This is why I am looking for some assumptions on the distribution of $u$ (possibly conditional on $x_1$) to be added to $(2.1)$ so that at least something could be learnt about the distribution of $y$ given $x_1$. – Richard Hardy Feb 12 '20 at 17:47
  • I have stated, quite explicitly, the kinds of assumptions that often are made about $u.$ I'm at a loss to explain further, because it seems that you overlook everything I have said about $u$ ($\varepsilon$ in my referenced post). I agree that the Wooldridge quote is incomplete, but I'm confident that either in context or somewhere immediately afterwards he provides assumptions about $u.$ – whuber Feb 12 '20 at 18:30
  • @whuber, I am honestly trying, but as you see I must be (partly) misunderstanding what you wrote in that answer. I presume you do have an answer to my question but probably do not want to spell it out (e.g. so as not to repeat yourself). But just to help me who is lost here, could you please write down a definition of a simple linear regression model? – Richard Hardy Feb 12 '20 at 18:53
  • In the spirit of Wooldridge's probability-free formulation, let the data be a design matrix $X$ and response matrix $Y.$ The model is a set of functions $\mathcal{F}$ mapping possible $X$ into possible $Y.$ A *loss function* $\mathcal{L}$ maps ordered pairs of $Y$-type variables into the nonnegative reals. "Regression" is the task of finding an $f\in\mathcal{F}$ that minimizes $\mathcal{L}(Y,f(X)).$ A "simple linear regression" model is $Y\in\mathbb{R}^n,$ $X\in(\mathbb{R}^k)^n,$ $\mathcal{F}=\{X\to X\beta\mid\beta\in\mathbb{R}^k\},$ and $\mathcal{L}$ is a positive-semidefinite quadratic form. – whuber Feb 12 '20 at 21:32 (a sketch of this formulation appears after these comments)
  • @whuber, this is interesting, although getting rid of probabilities and random variables is a bit disappointing for a statistician (maybe less so for a machine learner, though). I will think about what can be said about $Y$ given $X$ when probabilistic statements are not among our choices. A quick clarification: does "simple" linear regression imply $k=1$ (or $k=2$ if one of the $X$s is a vector of ones)? – Richard Hardy Feb 13 '20 at 06:53
  • It depends on your definition of "simple." If its meaning includes "a single explanatory variable" (possibly including an intercept), then $k\le 2$ (unless your explanatory variable is nominal with more than two categories!). – whuber Feb 13 '20 at 15:07
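
As an aside, whuber's probability-free formulation in the comments above can be made concrete with a short Python sketch. Everything here is an illustrative assumption: the loss is taken to be plain squared error (one positive-semidefinite quadratic form among many), and the minimizer is found numerically.

```python
import numpy as np
from scipy.optimize import minimize

def regression(X, Y, loss):
    """'Regression' as loss minimization over the family F = {X -> X @ beta}."""
    result = minimize(lambda beta: loss(Y, X @ beta), np.zeros(X.shape[1]))
    return result.x

# Simple linear regression: one explanatory variable plus an intercept column.
rng = np.random.default_rng(3)
x = rng.normal(size=50)
Y = 3 + 2 * x + rng.normal(size=50)          # made-up data
X = np.column_stack([np.ones_like(x), x])

squared_error = lambda y, yhat: np.sum((y - yhat) ** 2)
beta_hat = regression(X, Y, squared_error)   # approximately (3, 2)
```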

1 Answer


Davidson and MacKinnon, in "Econometric Theory and Methods", address head-on the flaw you point out in Wooldridge's presentation: "At this stage as long as we say nothing about the unobserved quantity $u_{t}$, equation (1.01) [$y_t=\beta_0+\beta_1x_t+u_t$] does not tell us anything. In fact, we can allow the parameters $\beta_0$ and $\beta_1$ to be quite arbitrary, since for any given $\beta_0$ and $\beta_1$, the model can always be made to be true by defining $u_t$ suitably. If we wish to make sense of the regression model (1.01), then we must make some assumptions about the properties of the error term $u_t$... Most commonly it is assumed that, whatever the value of $x_t$, the expectation of the random variable $u_t$ is zero. This assumption usually serves to identify the unknown parameters $\beta_0$ and $\beta_1$ in the sense that, under this assumption, equation (1.01) can be true only for specific values of those parameters".
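
A small simulation (a sketch with made-up numbers, not from the book) illustrates the identification point: once we require the implied error to have zero mean and zero correlation with $x_t$, only one pair $(\beta_0,\beta_1)$ is compatible with the data-generating process.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
u = rng.normal(size=100_000)       # E(u | x) = 0 by construction
y = 1.5 + 0.7 * x + u              # "true" parameters of the illustration

for b0, b1 in [(1.5, 0.7), (0.0, 3.0)]:
    e = y - b0 - b1 * x            # error implied by this candidate pair
    print(b0, b1, round(e.mean(), 3), round(np.corrcoef(e, x)[0, 1], 3))
# Only (1.5, 0.7) gives mean ~0 and correlation ~0; any other pair violates
# at least one of the two conditions, which is what identifies the parameters.
```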

Only because it addresses the same issue, I'll mention that in his book "Extending the Linear Model with R" (page 7), Faraway claims that "The construction of the least squares estimates does not require any assumptions about $\epsilon$". This statement is false, and the paragraph above explains why. Both authors are world-class experts, so I would file these statements as oversights and nothing more.

Why that particular assumption? What is definitional, and what do we need to assume? The CEF error $\epsilon_i$ in $Y_i=E[Y_i|X_i]+\epsilon_i$ is always mean-independent of $X_i$; as Angrist and Pischke show in "Mostly Harmless Econometrics", $$E[\epsilon_i|X_i]=E\big[Y_i-E[Y_i|X_i]\,\big|\,X_i\big]=E[Y_i|X_i]-E[Y_i|X_i]=0.$$ That is definitional. By contrast, the linear projection error $u_t$ in $y_t=\beta_0+\beta_1x_t+u_t$ is by definition only uncorrelated with $x_t$, which is a weaker condition than mean independence (see Hansen's "Econometrics" for the proof). And since regression is at its heart about the conditional mean function $E[y_t|x_t]=\beta_0+\beta_1x_t+E[u_t|x_t]$, we need to assume mean independence, $E[u_t|x_t]=E[u_t]=0$, in order to identify it. When $x_t$ is a non-random (fixed) regressor, the assumption $E[u_t|x_t]=E[u_t]$ is superfluous because it holds by definition, so we only need to assume $E[u_t]=0$.
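
The gap between the two conditions can be made concrete with a standard counterexample (a sketch of my own, not from the cited books): with $x_t$ standard normal and $y_t=x_t^2$, the best linear projection of $y_t$ on $(1,x_t)$ is $1+0\cdot x_t$, so the projection error $u_t=x_t^2-1$ is uncorrelated with $x_t$ yet clearly not mean-independent of it.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1_000_000)
y = x**2                           # CEF: E[y | x] = x**2, deliberately nonlinear

# The best linear projection of y on (1, x) here is 1 + 0*x,
# so the projection error is u = y - 1.
u = y - 1.0

print(np.cov(u, x)[0, 1])          # ~0: u is uncorrelated with x
print(u[np.abs(x) > 2].mean())     # far above 0: E[u | x] depends on x
```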

Most of the time we are looking to do something with our model, so we want the estimators to have some minimal desirable properties. No assumptions beyond the above are necessary to derive the OLS estimators or to show that they are unbiased. To derive the variance of the OLS estimators, we need to assume that the errors have a constant variance and are uncorrelated. To construct hypothesis tests or confidence intervals, we need to either assume the errors are normally distributed or have a large enough sample so that the OLS estimators are approximately normally distributed.
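
For reference, here is a minimal sketch of the point that the OLS estimators follow from the two moment conditions alone, $E[u_t]=0$ and $\text{Cov}(x_t,u_t)=0$, with no distributional assumptions used anywhere:

```python
import numpy as np

def ols_simple(x, y):
    """OLS for y = b0 + b1*x + u, from the sample analogues of
    E[u] = 0 and Cov(x, u) = 0; no distributional assumptions involved."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Works even with decidedly non-normal (here heavy-tailed, mean-zero) errors:
rng = np.random.default_rng(6)
x = rng.normal(size=1000)
y = 3 + 2 * x + rng.standard_t(df=3, size=1000)
print(ols_simple(x, y))            # approximately (3, 2)
```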

  • Thank you. So it is $[2.1]$ combined with $\mathbb{E}(u|X)=0$? – Richard Hardy Feb 08 '21 at 19:22
  • I think for the purpose of identifying $\beta_0$ and $\beta_1$ it is sufficient to assume $E(u_i|x_i)=0$ (as in Weisberg), i.e. only contemporaneous (same-observation) mean independence plus [2.1]. Hayashi makes the stronger assumption $E(u_i|x_1,x_2,...,x_n)=0$, but this seems unnecessary for the simplest model that just wants to solve the problem of the "combination of observations" (as in Stigler). In practice we may want to add the rather vacuous assumption that there is sample variation in the regressor (as in Wooldridge SLR.3), since otherwise we end up with zero in the denominator of $\hat{\beta}_1$. – sergiu Feb 08 '21 at 20:42