
Let's say I have a linear regression: $$y \sim 1 + x_1 + x_2$$

where the range of $x_2$ is $[0,10]$. I fit this model using `lm` or `rlm` with regression weights in R. When I collect the residuals and plot them against $x_1$, I find that the residuals show a pattern with respect to $x_1$: the $R^2$ of regressing the residuals onto $x_1$ is $20\%$. Is that possible? What could be the causes?
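For concreteness, here is a minimal R sketch of the setup being described; the data frame `dat`, the weight vector `w`, and all numeric values are hypothetical stand-ins, not the actual data:

```r
set.seed(1)
## hypothetical data standing in for the real data set (all numbers made up)
dat <- data.frame(x1 = runif(200, 0, 100), x2 = runif(200, 0, 10))
dat$y <- 1 + 2 * dat$x1 + 3 * dat$x2 + rnorm(200, sd = 0.05 * dat$x1)
w <- rep(1, 200)                                  # placeholder regression weights

library(MASS)                                     # for rlm, if a robust fit is preferred
fit <- lm(y ~ x1 + x2, data = dat, weights = w)   # or rlm(y ~ x1 + x2, data = dat, weights = w)

## plot the residuals against x1, then regress them on x1 to quantify the pattern
res <- resid(fit)
plot(dat$x1, res, xlab = "x1", ylab = "residual")
summary(lm(res ~ dat$x1))$r.squared               # the "20%" figure in the question
```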


After fitting the same linear regression as above, suppose I take a smaller portion of the data, say all the observations with $x_2<6$. I then collect the residuals and the $x_1$ values of this subset and plot the subsetted residuals against the subsetted $x_1$. I find that the residuals still show a pattern with respect to $x_1$, again with an $R^2$ of about $20\%$ when regressing the subsetted residuals onto the subsetted $x_1$.

(The two $20\%$ figures above are just examples; they are not related, and maybe there is a theory saying that one should definitely be larger than the other, etc.)

Is that possible? What could be the causes?
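Continuing the hypothetical sketch above (same `dat`, `w`, and `fit`), the subsetting step might look like:

```r
## keep the residuals from the FULL fit, but only for observations with x2 < 6
keep    <- dat$x2 < 6
res_sub <- resid(fit)[keep]
x1_sub  <- dat$x1[keep]

plot(x1_sub, res_sub, xlab = "x1 (x2 < 6 subset)", ylab = "residual")
summary(lm(res_sub ~ x1_sub))$r.squared   # the second "20%" figure
```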


Edit: Let me try to describe the shape of the pattern.

Let's say the range of $x_1$ is $[0, 100]$.

  • At around $x_1=1$, the residuals are in a vertical band of $[-0.1, 0.1]$.

  • At around $x_1=10$, the residuals are in a vertical band of $[-1, 1]$.

  • ...

  • At around $x_1=100$, the residuals are in a vertical band of $[-10, 10]$.

I intentionally chose these numbers so you can see that the upper and lower bands grow roughly linearly as $x_1$ increases. I know this is heteroskedasticity. But since I am concerned about "bias", not inference, I guess I don't need to worry about the heteroskedasticity...
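For what it's worth, a fan shape like the one described arises when the error standard deviation grows roughly linearly with $x_1$; a purely hypothetical simulation that reproduces those band widths:

```r
set.seed(2)
n   <- 500
x1  <- runif(n, 0, 100)
x2  <- runif(n, 0, 10)
eps <- rnorm(n, sd = 0.05 * x1)   # sd ~ 0.05 near x1 = 1, sd ~ 5 near x1 = 100
y   <- 1 + 2 * x1 + 3 * x2 + eps

fit <- lm(y ~ x1 + x2)
plot(x1, resid(fit), xlab = "x1", ylab = "residual")   # widening vertical bands
```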

  • What does the pattern look like? Usually a pattern in $x_i$ vs. $\varepsilon_i$ indicates that there are non-linear effects of $x_i$ not subsumed by the fitted model. Sometimes this can be remedied by inserting polynomial terms in $x_i$ or some other transformed version of $x_i$. Also, why did you use the term bias in the title? – Macro Jul 25 '12 at 21:14
  • @Macro I suppose that Luna called it bias because, if there is truly a remaining systematic effect like the polynomial term in $x_i$ that you posited, the model estimates of $y$ could be biased estimates of the "true" $y$, at least at some points in the $x$ space. – Michael R. Chernick Jul 25 '12 at 21:25
  • 1
    Note that [my answer to your question here](http://stats.stackexchange.com/questions/32471/how-can-you-handle-unstable-beta-estimates-in-linear-regression-with-high-mul) shows that the residuals _must_ be uncorrelated with your predictor variables if you've fit a least squares regression (with the intercept). Therefore, the $R^2$ can't be $20\%$ - it must be $0$. There could be a non-linear relationship between $x_i$ and $\varepsilon_i$ though, so I'll wait to hear what that relationship looks like. – Macro Jul 25 '12 at 21:26
  • 2
    @Luna, it sounds like you're describing a "horn" shape, indicating heteroskedasticity. If this is a linear model, then I don't think the heteroskedasticity will bias your estimates but it will affect your inference (i.e. $p$-values and confidence intervals) so you'll want to take care of it if you plan to do any inference. You may want to consider generalized least squares (the `gls` function in `R`), which is a common remedy for heteroskedasticity. – Macro Jul 25 '12 at 21:48
  • 1
    @Luna, can you make a plot of the residuals vs. $x_1$ (or whichever exactly is the actual problem) & post it in your question? The 6th button from the left (that looks like a picture of a blue sky) when you're editing will open a wizard & let you upload a png file from your machine. That will help us understand the problem you're having. – gung - Reinstate Monica Jul 25 '12 at 22:59
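For completeness, a minimal sketch of the `gls` remedy mentioned in the comments, assuming a data frame `dat` with columns `y`, `x1`, `x2` and assuming the error standard deviation grows with `x1` (the variance function here is an assumption, not something stated in the question):

```r
library(nlme)

## generalized least squares with error variance modelled as a power of x1
fit_gls <- gls(y ~ x1 + x2, data = dat,
               weights = varPower(form = ~ x1))
summary(fit_gls)
```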

1 Answer


What you've described are heteroscedastic errors. Regarding your question about bias:

**Heteroscedasticity does not bias the least squares estimators of the regression coefficients.**

Suppose you have a response variable $Y_i$ and a $(p+1)$-length vector of predictors ${\bf X}_{i}$, whose first entry is a $1$ for the intercept, such that

$$ Y_i = {\bf X}_i {\boldsymbol \beta} + \varepsilon_i $$

where ${\boldsymbol \beta} = \{ \beta_0, ..., \beta_p \}$ is the vector of regression coefficients and the errors $\varepsilon_i$ are such that $E(\varepsilon_i)=0$, with no restrictions on the variance except that it is finite for each $i$. Then the least squares estimator of ${\boldsymbol \beta}$ is

$$ \hat {\boldsymbol \beta} = ( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} {\bf Y} $$

Where $$ {\bf X} = \left( \begin{array}{c} {\bf X}_1 \\ {\bf X}_2 \\ \vdots \\ {\bf X}_n \\ \end{array} \right) $$

is the matrix whose rows are the predictor vectors for each individual (including a $1$ for the intercept), and ${\bf Y}$ and ${\boldsymbol \varepsilon}$ are similarly defined as the vectors of response values and errors, respectively.

Regarding the expected value of $\hat {\boldsymbol \beta}$, it helps to replace ${\bf Y}$ with $({\bf X} {\boldsymbol \beta} + {\boldsymbol \varepsilon})$ to get that

$$ \hat {\boldsymbol \beta} = ( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} ({\bf X} {\boldsymbol \beta} + {\boldsymbol \varepsilon}) = \underbrace{( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} {\bf X} {\boldsymbol \beta}}_{= {\boldsymbol \beta}} + ( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} {\boldsymbol \varepsilon} $$

Therefore, $E(\hat {\boldsymbol \beta}) = {\boldsymbol \beta} + E \left( ( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} {\boldsymbol \varepsilon} \right ) $, so we just need the right-hand term to be ${\bf 0}$. We can derive this by conditioning on ${\bf X}$ and averaging over ${\bf X}$ using the law of total expectation:

\begin{align*} E \left( ( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} {\boldsymbol \varepsilon} \right) &= E_{ {\bf X} } \left( E \left( ( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} {\boldsymbol \varepsilon} \mid {\bf X} \right) \right) \\ &= E_{ {\bf X} } \left( ( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}} E ( {\boldsymbol \varepsilon} \mid {\bf X} ) \right) \\ &= {\bf 0} \end{align*}

where the second equality uses the fact that, conditional on ${\bf X}$, the matrix $( {\bf X}^{{\rm T}} {\bf X} )^{-1} {\bf X}^{{\rm T}}$ is fixed and can be taken outside the inner expectation, and the final line follows from the fact that $E( {\boldsymbol \varepsilon} | {\bf X} )=0$, the so-called strict exogeneity assumption of linear regression. Nothing here has relied on homoscedastic errors.
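A quick Monte Carlo check of this result, with made-up coefficients and an error standard deviation that depends on a predictor: the least squares estimates still average out to the true ${\boldsymbol \beta}$.

```r
set.seed(3)
beta <- c(1, 2, 3)      # true (intercept, slope on x1, slope on x2)
n    <- 200

est <- replicate(2000, {
  x1  <- runif(n, 0, 100)
  x2  <- runif(n, 0, 10)
  eps <- rnorm(n, sd = 0.05 * x1)            # heteroskedastic errors
  y   <- beta[1] + beta[2] * x1 + beta[3] * x2 + eps
  coef(lm(y ~ x1 + x2))
})

rowMeans(est)   # approximately c(1, 2, 3): no bias from the heteroskedasticity
```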

Note: While heteroscedasticity does not bias the parameter estimates, useful results, including the Gauss-Markov theorem and the covariance matrix of $\hat {\boldsymbol \beta}$ being given by $\sigma^2 ({\bf X}^{\rm T} {\bf X})^{-1}$, do require homoscedasticity.
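The note above concerns inference rather than bias; one common fix, besides the `gls` approach mentioned in the comments, is a heteroskedasticity-consistent covariance estimator. A sketch using the `sandwich` and `lmtest` packages, assuming a fitted `lm` object `fit`:

```r
library(sandwich)
library(lmtest)

coeftest(fit)                                    # usual standard errors
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))  # heteroskedasticity-consistent SEs
```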

Macro
  • Thanks Macro. But (1) I am not sure we are talking about the same thing; my question was about "the $R^2$ of regressing the residuals onto $x_1$ is $20\%$." (2) In my question, I had weights in the regression. – Luna Jul 27 '12 at 15:00
  • Hi @Luna, two things: (1) As I showed you in [my answer here](http://stats.stackexchange.com/questions/32471/how-can-you-handle-unstable-beta-estimates-in-linear-regression-with-high-mul), and commented above, each predictor will have $0$ correlation (and therefore $R^2=0$) with the residuals. (2) The residual plot you described is a textbook description of heteroskedasticity, not correlation. If that's not the case, then perhaps you can include a plot, as I and others have asked for. Perhaps you can tell me what your real question is or edit your question appropriately. – Macro Jul 27 '12 at 15:14
  • Thanks Macro. (1) But with weights, the $R^2=0$ won't hold any more, right? (2) I won't be able to show a plot, but it has both the heteroskedasticity and a linear slope, which leads to $R^2=20\%$... – Luna Jul 27 '12 at 19:05
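Regarding that last exchange: with an ordinary (unweighted) least squares fit that includes an intercept, the residuals are exactly orthogonal to every predictor, so their correlation with $x_1$ is $0$. With case weights, the orthogonality holds in the weighted inner product instead, so the ordinary correlation need not be exactly $0$ (and, if the mean model is also misspecified, it can become sizeable). A small simulated sketch, with all numbers hypothetical:

```r
set.seed(4)
n  <- 300
x1 <- runif(n, 0, 100)
x2 <- runif(n, 0, 10)
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n, sd = 0.05 * x1)
w  <- runif(n, 0.5, 2)                 # arbitrary positive case weights

fit_ols <- lm(y ~ x1 + x2)
fit_wls <- lm(y ~ x1 + x2, weights = w)

cor(resid(fit_ols), x1)                # 0 up to numerical error
sum(w * resid(fit_wls) * x1)           # ~0: weighted orthogonality still holds
cor(resid(fit_wls), x1)                # need not be exactly 0 on the unweighted scale
```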