
This is somewhat related to a question I posed earlier, Different usage of the term "Bias" in stats/machine learning, regarding the various usages of "bias."

I was asked the following question a couple of months ago:

In simple linear regression, we have $Y = WX + b + e$ where $e$ is the standard normal error. What effect does $e$ have on the bias of the model? What if instead we have $Y = W(X + e) + b$?

When I was asked this question, I assumed they were asking about "bias" in the sense of the bias-variance tradeoff.

I know that the (squared) bias of an estimator is defined as

$$ \left( E[\hat{f}(x)] - f(x) \right)^2 $$

where $f$ is the true unobserved model and $\hat{f}$ is the model derived using linear regression. But $f$ and $\hat{f}$ aren't affected by the unobserved error $e$, so it seems that in both situations the unobserved error has no effect on the bias?


1 Answer


$\newcommand{\e}{\varepsilon}\newcommand{\E}{\operatorname E}$In general $\hat f$ is a function of the data $X$ and $y$, and $y$ depends on $X$ and the error $\e$, so $\hat f$ itself has a functional dependency on $\e$.

In the usual setting of linear regression we have $y = X\beta + \e$ and $$ \hat \beta = (X^TX)^{-1}X^Ty \\ = (X^TX)^{-1}X^T(X\beta + \e) \\ = \beta + (X^TX)^{-1}X^T\e $$ so we can see exactly how they relate. For a particular point $x$ we'll have $$ \hat f(x) = x^T\hat\beta = x^T\beta + x^T(X^TX)^{-1}X^T\e $$ so if we make the usual assumptions of $\E[\e] = \mathbf 0$ and $\operatorname{Var}[\e] = \sigma^2 I$ then $\hat f(x)$ is a random variable centered at $x^T\beta$, so it is unbiased, and with a variance given by $$ \operatorname{Var}[\hat f(x)] = x^T(X^TX)^{-1}X^T\operatorname{Var}[\e]X(X^TX)^{-1}x = \sigma^2 x^T(X^TX)^{-1}x. $$ This shows how the uncertainty in our prediction for a particular point depends both on how that point $x$ relates to $X$ (the variance will be large when $x$ is mostly in the span of the bottom eigenvectors of the sample covariance matrix $X^TX$) and the underlying variance $\sigma^2$, which simply has a scaling effect here.

After taking expected values $\e$ no longer appears, but that's because the expected value is precisely integrating that out.
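To see this numerically, a quick Monte Carlo sketch: hold $X$ fixed, redraw $\e$ many times, and compare the average and variance of $\hat f(x)$ against $f(x) = x^T\beta$ and against $\sigma^2 x^T(X^TX)^{-1}x$. Averaging over the draws is the finite-sample analogue of the expectation that integrates $\e$ out (the replication count $B$ and the other constants are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma, B = 100, 3, 0.5, 20_000
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
x = rng.normal(size=p)

XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T                         # maps y to beta_hat

preds = np.empty(B)
for i in range(B):
    eps = rng.normal(scale=sigma, size=n)
    preds[i] = x @ (H @ (X @ beta + eps))  # hat{f}(x) for this draw of eps

print(preds.mean(), x @ beta)                    # ~ f(x): unbiased
print(preds.var(), sigma**2 * x @ XtX_inv @ x)   # ~ the variance formula
```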

  • Right, I followed all this; so in a nutshell the unobserved error has no effect on $\hat{f}(x)$? – student010101 Mar 02 '21 at 00:53
  • @student010101 no, the unobserved error is part of $\hat f$. Since $\hat f(x) = x^T\beta + x^T(X^TX)^{-1}X^T\varepsilon$ then changes in the unobserved error $\varepsilon$ will change $\hat f(x)$ (unless the changes are only along directions in the null space of $X^T$, but that's probability zero if $\varepsilon$ is continuous) – jld Mar 02 '21 at 15:24
  • Oh I see. What about for the second question where $Y = W(X+ \epsilon) + b$? I'm actually a bit confused by this second question, but it seems in the first one $X$ is fixed (which I think we typically assume to be the case), but the second case assumes $X$ is subject to error? But how does that affect $\hat{f}$? – student010101 Mar 02 '21 at 15:49
  • @student010101 I'd just distribute $W$ to get $Y = WX + b + W\varepsilon$ so we're still in the same setting as before, just with an error variance of $W^2\sigma^2$. In general if we end up with a linear transformation of the error then the mean of the error is still zero, so we stay unbiased, but we can end up with $\text {Var}[\varepsilon]$ being a general positive semidefinite matrix and this can lead to inefficiency in our estimates unless we use generalized least squares or other things like that (see the simulation sketch after these comments) – jld Mar 02 '21 at 16:13
  • The estimator is "unbiased," but I think the question is asking for how the unobserved error affects the "bias" in the bias-variance tradeoff, which is a different thing right? Or am I misunderstanding? – student010101 Mar 02 '21 at 17:52
  • @student010101 hmm maybe we're talking about different things. What I'm saying is (1) the randomness in $\hat f(x)$ comes from $\varepsilon$ which is an intrinsic part of $\hat f$; (2) since $\text E[\varepsilon] = \mathbf 0$ we have $\text E[\hat f(x)] = f(x)$ so our estimator is unbiased; (3) $\text{Var}[\varepsilon]$ appears in the variance of $\hat f$ so the unobserved error does affect the uncertainty in our predictions, even in the population sense (like in expectation, not just in a finite sample) – jld Mar 02 '21 at 18:34
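
Following up on the comment about the second setting, here is a small simulation sketch of $Y = W(X + \varepsilon) + b$ in the scalar case, checking that the slope estimate stays centered at $W$ while its spread reflects the effective error variance $W^2\sigma^2$ (all constants here, $W = 2$, $b = 1$, $\sigma = 0.5$, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 200, 10_000
W, b, sigma = 2.0, 1.0, 0.5
x = rng.normal(size=n)
A = np.column_stack([np.ones(n), x])     # design matrix with intercept

slopes = np.empty(B)
for i in range(B):
    eps = rng.normal(scale=sigma, size=n)
    y = W * (x + eps) + b                # identical to W*x + b + W*eps
    slopes[i] = np.linalg.lstsq(A, y, rcond=None)[0][1]

# centered at W (unbiased); spread matches error variance W^2 * sigma^2
print(slopes.mean(), W)
print(slopes.var(), W**2 * sigma**2 * np.linalg.inv(A.T @ A)[1, 1])
```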