
Suppose that the data generating process is given by $$ y_i=\beta_i x_i+\varepsilon_i $$ where $\varepsilon_i\sim \text{iid }\mathcal{N}\left(0,\sigma^2\right)$. Note that the $\beta_i$ coefficients can depend on $i$.

I observe $n$ observations of $y_i$ and $x_i$ in the data but not $\beta_i$. My question is: if I use standard OLS on this data, what do I get? I suspect that the OLS coefficient is something like $\sum_{i=1}^n \beta_i/n$. Is that correct? Thanks!

user_lambda
  • All errors will be 0, right? – Michael M Aug 02 '20 at 13:59
  • Well, yes if I tried to estimate each $\beta_i$. Then I could just set $\hat{\beta}_i=y_i/x_i$ but that's not what I'm doing. I want to compute the standard OLS estimator $\hat{\beta}=(X'X)^{-1}X'Y$. I'm wondering what that estimator would capture here. – user_lambda Aug 02 '20 at 14:04
  • 3
    See mixed models and random coefficients. Offcourse this involves certain assumptions on how to model the dependence of beta on i. – Jesper for President Aug 02 '20 at 14:43
  • 1
    Does $\beta_i$ depend on $i$ or only on $x_i$? – whuber Aug 02 '20 at 17:43
  • @whuber $\beta_i$ depends on $i$, not directly on $x_i$. – user_lambda Aug 02 '20 at 18:01
  • That makes it easy, then: you are telling us the model matrix $X$ is the diagonal matrix with entries $x_i$ on the diagonal, whence $\hat\beta_i=y_i/x_i$ provided $x_i\ne 0.$ – whuber Aug 02 '20 at 19:53

1 Answer


For the model as you have written it (no intercept) the Wikipedia page provides the formula for the slope estimate $\hat\beta$:

$$\widehat{\beta} = \frac{ \sum_{i=1}^n x_i y_i }{ \sum_{i=1}^n x_i^2 } = \frac{\overline{x y}}{\overline{x^2}} $$

So if the slopes differ among the observed cases, the OLS estimate is an $x_i^2$-weighted average of the individual slopes $y_i/x_i$, not their simple average. If all observations were taken at the same value of $x$, then your formula would hold. Otherwise the reported slope estimate depends on how the $x$ values are distributed among the cases with different slopes. For example, if your first case has a slope of 1 observed at $x=1$, giving $(x,y)=(1,1)$, while your second case has a slope of 2 observed at $x=2$, giving $(x,y)=(2,4)$, the formula provides $\hat\beta = 9/5 = 1.8$, not the slope average of 1.5.
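As a quick numeric check of that two-case example (a sketch in Python for concreteness, using the data points from the text):

```python
# Two cases from the example: slope 1 observed at x=1, slope 2 observed at x=2.
x = [1.0, 2.0]
y = [1.0, 4.0]

# No-intercept OLS slope: sum(x_i * y_i) / sum(x_i^2)
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)

# Simple average of the individual slopes y_i / x_i
mean_slope = sum(yi / xi for xi, yi in zip(x, y)) / len(x)

print(beta_hat)    # 1.8, i.e. 9/5
print(mean_slope)  # 1.5
```

The discrepancy appears because the case with the larger $x$ value (and the larger slope) gets weight $x_i^2 = 4$ in the OLS estimate, versus weight 1 for the other case.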

We can see this more generally if we write each slope $\beta_i$ as the sum of a mean slope $\beta_0$ and a case-specific deviation $\delta_i$ from that mean slope, similar to the way that a random-slope mixed model would (see below). Then $y_i = (\beta_0 + \delta_i)x_i$ and the above formula becomes:

$$\widehat{\beta} = \frac{ \sum_{i=1}^n x_i^2 (\beta_0 +\delta_i)}{ \sum_{i=1}^n x_i^2 } = \beta_0 +\frac{\overline{x^2 \delta}}{\overline{x^2}} .$$
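This identity is easy to verify numerically (a Python sketch with made-up values of $\beta_0$, $\delta_i$, and $x_i$; no noise term, for clarity):

```python
# Hypothetical values: mean slope beta0 plus case-specific deviations delta_i.
beta0 = 2.0
delta = [0.5, -0.5, 0.2]
x = [1.0, 2.0, 3.0]

# Generate y_i = (beta0 + delta_i) * x_i with no error term
y = [(beta0 + d) * xi for d, xi in zip(delta, x)]

# No-intercept OLS slope computed directly from the data
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)

# The decomposition: beta0 plus the x^2-weighted mean of the deltas
decomposed = beta0 + sum(xi**2 * d for xi, d in zip(x, delta)) / sum(xi**2 for xi in x)

print(beta_hat, decomposed)  # equal up to floating-point rounding
```

Note that the second term vanishes only if the $\delta_i$ are uncorrelated with the $x_i^2$ weights, which is the sense in which OLS recovers the mean slope only under balanced sampling of $x$ across cases.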

In general, if you expect the slopes to differ among individuals, then your model should incorporate that explicitly, and the intercept should not be omitted unless there is a theoretical reason why it must be 0. So in this type of situation you should, in practice, consider a different approach.

If there are just a few individuals, you could do that with a factor variable `ID` representing the individuals and an interaction `x:ID` between `ID` and the predictor `x`, written in R as:

lm(y ~ x + ID + x:ID)

This model implicitly includes an intercept. With standard treatment coding of the predictors (the default in R), the intercept is the value of `y` when `x=0` for the individual designated as the reference level of `ID`, and the coefficient for `x` is the slope for that same reference individual. The coefficients for the other `ID` levels are the differences of the corresponding intercepts from the reference intercept, and the coefficients for the interaction terms are the corresponding differences in slopes. You can compare models with and without the interaction terms to test your hypothesis that the slopes differ among individuals.

With more than a few individuals this is better done with a mixed model, in which the intercept and slope represent overall averages and the individual-specific intercepts and slopes are modeled with Gaussian distributions around those averages. Such a model is called a linear mixed model, written in this case in R (with the `lme4` package) as:

lmer(y ~ x + (x|ID))

The overall intercept and the individual-specific intercepts are implicit in this form. This formula also implicitly assumes a correlation between individual intercepts and slopes, which is estimated. See this page and its many links for further information.

EdM
  • @SextusEmpiricus I admit that I assumed there were sufficient numbers of observations for each individual over a range of `x` values to estimate slopes and intercepts. I had thought that was implicit in the question but I might have been mistaken. – EdM Aug 02 '20 at 15:56
  • @SextusEmpiricus thanks. I have now answered the question more directly for the situation with single measurements per case. – EdM Aug 02 '20 at 17:07