I have some data, and I assume it can be modelled by $y_i = f(x_i) + \epsilon$ with $\epsilon \sim \mathcal{N}(0,\sigma_0^2)$, where $f$ and $\sigma_0^2$ are unknown. I understand that I can estimate the variance of the noise by calculating the mean squared error from the fitted model.

Is there a way to investigate the variance locally around a point? For example, if I instead say that $\sigma_0^2$ is a function of $x_i$, how can I estimate the function $\sigma_0^2(x)$?

user112495
  • Why does *local* polynomial regression assume a constant variance? I thought the whole point of local regression was to get rid of the "statistical model" approach and just use this approach locally. In LOESS, e.g., at each x only the k nearest neighbor points are considered, and the variance is thus by definition only computed in a neighborhood of x and thereby varies with x. – cdalitz Feb 03 '22 at 13:33

1 Answer

As you noticed, a model like $y_i = f(x_i) + \epsilon$ assumes that the noise is independent and identically distributed, so the variance is the same for all observations. If this isn't the case, you need a model that accounts for that. How can you estimate the functional form of the variance? The answer depends on what kind of model you choose for it. $f(x_i)$ can take an unlimited number of forms, with as many ways of estimating it, and the same applies to $\sigma^2_0(x_i)$.
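
To make the distinction concrete, the non-constant-variance version of the model can be written as follows. The index on $\epsilon_i$ is added here only for clarity and is not part of the question's notation:

$$ y_i = f(x_i) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}\big(0,\; \sigma_0^2(x_i)\big) $$

so that

$$ \mathbb{E}\big[(y_i - f(x_i))^2 \mid x_i\big] = \sigma_0^2(x_i) $$

In other words, estimating the variance function is itself a regression problem, with the squared deviations from $f$ as the response.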

I guess that you don't have any idea which functional form of the variance would be appropriate; otherwise you wouldn't ask the question, at least not such a broad one. If you don't, you probably want a flexible nonparametric model. A popular one that does exactly this is the Gaussian process, where you estimate both the mean function $m(\mathbf{x})$ and the covariance function ${k(\mathbf{x}, \mathbf{x}')}$ to model the distribution over functions as a Gaussian process:

$$ f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), \;k(\mathbf{x}, \mathbf{x}')) $$

If your model was polynomial regression, then the model assumed constant variance. If the variance is non-constant, you need a different model that accounts for that. As for calculating it from the residuals: for a pointwise variance you would have only a single residual at each point, so the variance at a point cannot be calculated from its residual alone.
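
If one is willing to assume that $\sigma_0^2(x)$ changes smoothly with $x$, squared residuals from nearby points can be pooled instead of relying on a single one. The sketch below illustrates that idea with a local constant (Nadaraya-Watson) smoother, which is the degree-zero case of local polynomial regression. It is only an illustration under stated assumptions: the function names, the Gaussian kernel, the bandwidths, and the synthetic data are all hypothetical choices, not something taken from the question.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel weight."""
    return math.exp(-0.5 * u * u)


def kernel_smooth(x_train, values, x0, bandwidth):
    """Local constant (Nadaraya-Watson) estimate of E[values | x = x0]."""
    weights = [gaussian_kernel((xi - x0) / bandwidth) for xi in x_train]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def local_variance(x, y, bandwidth_mean, bandwidth_var):
    """Return a function x0 -> estimate of sigma_0^2(x0).

    Step 1: smooth y on x to estimate f at every observed x_i.
    Step 2: form the squared residuals (y_i - f_hat(x_i))^2.
    Step 3: smooth the squared residuals on x.
    """
    fitted = [kernel_smooth(x, y, xi, bandwidth_mean) for xi in x]
    sq_resid = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]

    def sigma2_hat(x0):
        return kernel_smooth(x, sq_resid, x0, bandwidth_var)

    return sigma2_hat


# Synthetic data (assumption): the noise standard deviation grows from
# 0.1 at x = 0 to 1.0 at x = 1, so the true variance increases with x.
random.seed(0)
n = 1000
x = [random.uniform(0.0, 1.0) for _ in range(n)]
y = [math.sin(2 * math.pi * xi) + random.gauss(0.0, 0.1 + 0.9 * xi) for xi in x]

# Bandwidths are picked by hand here; in practice they would need tuning.
sigma2_hat = local_variance(x, y, bandwidth_mean=0.05, bandwidth_var=0.1)
low, high = sigma2_hat(0.1), sigma2_hat(0.9)
print(f"estimated variance near x=0.1: {low:.3f}, near x=0.9: {high:.3f}")
```

The estimate inherits any bias in the fitted mean, and both bandwidths matter, so this should be read as a starting point rather than a recommended procedure.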

Tim
  • Sorry, I should have clarified - I'm currently using a local polynomial regression to estimate the function $f$. So ideally I'd like to be able to estimate the variance locally using this framework. I've updated my post to clarify this. But you're right, I have no intuition for what form $f$ or $\sigma_0^2$ should take. This isn't an issue for $f$ using local polynomial regression, but I'm not sure how to do something similar for variance. – user112495 Feb 03 '22 at 09:14
  • @user112495 see my edit. If you used polynomial regression, then you assumed constant variance. If this assumption is wrong, you used an incorrect model for the data. – Tim Feb 03 '22 at 09:47