
Typically, local regression (loess / lowess) is used to create smooth plots.

Assuming the points are equidistant along the X axis, what's the advantage of using local regression compared to a simple moving average with an appropriate window size (which is obviously much faster)?

max

2 Answers


A simple moving average can be interpreted as a local regression of degree zero with a rectangular kernel. A rectangular kernel assigns equal weights (read: importance) to every point falling within its support (read: window). If you think this assumption captures your modelling assumptions adequately, then you have no reason not to pick a simple moving average for smoothing. If you think this assumption is a bit of an oversimplification... read along.
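To make this equivalence concrete, here is a minimal sketch (function names and the toy data are illustrative, not from any library): a centred moving average and a degree-zero weighted least-squares fit with equal (rectangular-kernel) weights produce identical values.

```python
import numpy as np

def moving_average(y, window):
    # Simple centred moving average (assumes an odd window size).
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="valid")

def local_constant_rectangular(y, window):
    # Local regression of degree 0 with a rectangular kernel: at each point,
    # fit a constant by weighted least squares with equal weights.
    # The weighted least-squares fit of a constant is just the weighted mean.
    half = window // 2
    out = []
    for i in range(half, len(y) - half):
        neighbours = y[i - half : i + half + 1]
        weights = np.ones(window)  # rectangular kernel: equal weights
        out.append(np.sum(weights * neighbours) / np.sum(weights))
    return np.array(out)

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0])
print(np.allclose(moving_average(y, 3), local_constant_rectangular(y, 3)))  # True
```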

Let's assume that we look at data $(y_i, t_i)$ but that what is actually going on is $y_i = y_{\text{true}}(t_i) + \epsilon_i$, where $y_{\text{true}}$ has some odd but smooth parametric form and $\epsilon_i \sim N(0,\sigma^2_{\epsilon})$. By smoothing we try to estimate $y_{\text{true}}$.

We could go ahead and fit a model across all the data; something like $y = \beta_0 + \beta_1 t + \epsilon$ (or a higher-degree polynomial), but we suspect that this is too restrictive. We have the implicit understanding that data close to a time-point $t$ are more relevant to the value $y_{\text{true}}(t)$ than data further away from $t$. So we decide to build a window around $t$, say $[t-b, t+b]$, where $b$ is a bandwidth.

Now, if the assumption is that all points within $[t-b, t+b]$ are equally important for estimating $y_{\text{true}}(t)$, then a rectangular kernel, where all points are weighted the same, is perfect for us. But maybe we think "... within the window, some central points matter more" and we try another kernel (e.g. triangular or Epanechnikov) that assigns higher importance to central points. Or perhaps we are not really certain about the assumption of a window to begin with, so we try a kernel (e.g. Gaussian) that has infinite support. ($b$ is always to be estimated, typically by cross-validation.) Local linear regression gives us the ability to test all these assumptions and actually incorporate them into our final estimates of $y_{\text{true}}$.
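As a sketch of the kernel choices above (this is an illustrative toy implementation, not any particular library's local regression): at each target point $t_0$ we fit $y = \beta_0 + \beta_1 (t - t_0)$ by weighted least squares, with weights supplied by the chosen kernel, and take $\hat\beta_0$ as the estimate of $y_{\text{true}}(t_0)$.

```python
import numpy as np

# Three candidate kernels from the discussion above.
def rectangular(u): return (np.abs(u) <= 1).astype(float)
def triangular(u):  return np.clip(1 - np.abs(u), 0, None)
def gaussian(u):    return np.exp(-0.5 * u**2)  # infinite support

def local_linear(t, y, t0, b, kernel):
    # Weighted least-squares fit of y = b0 + b1*(t - t0) with bandwidth b;
    # the fitted value at t0 is simply the intercept b0.
    w = kernel((t - t0) / b)
    X = np.column_stack([np.ones_like(t), t - t0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size)  # noisy smooth signal

# All three kernels give an estimate near y_true(0.5) = 0, but with
# different weighting of the surrounding points.
for k in (rectangular, triangular, gaussian):
    print(k.__name__, round(local_linear(t, y, 0.5, 0.2, k), 3))
```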

Finally, let me point out that "lowess/loess" utilise locally weighted linear regression to smooth data, but they are just one family of the local polynomial methods (e.g. the Nadaraya–Watson estimator, one of the earliest estimators of this kind) used in semi-parametric regression. Other models (e.g. roughness-penalty methods, like spline smoothing) are also available; see A. C. Davison, *Statistical Models*, Chapt. 10.7 for a nice, concise introduction.
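To give the roughness-penalty idea some shape, here is a sketch of Whittaker–Henderson smoothing, a simple discrete relative of spline smoothing (the toy data and the choice of penalty weight `lam` are illustrative): we minimise $\|y - z\|^2 + \lambda \|D_2 z\|^2$, where $D_2$ takes second differences, so larger $\lambda$ forces a smoother $z$.

```python
import numpy as np

def whittaker_smooth(y, lam):
    # Minimise ||y - z||^2 + lam * ||D2 z||^2 in closed form:
    # solve (I + lam * D2'D2) z = y.
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)  # second-difference matrix, (n-2) x n
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
truth = np.sin(2 * np.pi * t)
y = truth + rng.normal(0, 0.1, t.size)

z = whittaker_smooth(y, lam=100.0)
# The smoothed curve should sit closer to the underlying signal than the raw data:
print(np.mean((z - truth)**2) < np.mean((y - truth)**2))
```

Here `lam` plays the role the bandwidth $b$ plays in the kernel methods above, and would likewise be chosen by cross-validation.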

usεr11852
  • What's the pros/cons of spline smoothing vs. LOWESS/LOESS for the purposes of scatter plot smoothing and prediction? Or should I ask another question on that? – max Apr 03 '16 at 07:09
  • In extremely short: spline smoothing guarantees certain smoothness conditions, has more severe edge effects, and is usually cheaper computationally than kernel smoothing. Having said that: yes, I think this will make an interesting new question in its own right. – usεr11852 Apr 03 '16 at 07:17
  • Asked the question [here](http://stats.stackexchange.com/questions/205216/locally-weighted-regression-vs-splines). Also, the zero-degree rectangular kernel regression = moving average causes predictions near either edge of the dataset to be stupidly biased whenever the true relationship is materially sloped. Did you mean splines can make things even worse than that?! – max Apr 03 '16 at 08:12
  • Cool, best of luck getting a nice answer. Yeah, especially for irregularly sampled data, splines can really diverge near the edges. The horrible truth is that we do not have a technique that performs really well towards the edge. Wavelet smoothing might do the trick, but even then the task is a bit ill-posed. – usεr11852 Apr 03 '16 at 16:44

A moving average is what you get from LOESS with a zero-degree polynomial: "Using a zero degree polynomial turns LOESS into a weighted moving average." Higher degrees yield different answers.
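A tiny sketch of the degree difference (the toy data are illustrative): on curved data, the degree-0 fit with equal weights is exactly the window mean, which is badly biased at the edge of the data, while the degree-1 (local linear) fit adapts to the slope.

```python
import numpy as np

def local_poly_fit(t, y, w, t0, degree):
    # Weighted least-squares polynomial fit over the window; return the
    # fitted value at t0 (the intercept of the shifted polynomial).
    X = np.vander(t - t0, degree + 1, increasing=True)
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]

t = np.array([0.0, 1.0, 2.0])
y = t**2                 # true curve y = t^2, so y_true(0) = 0
w = np.ones(3)           # rectangular kernel: equal weights

# Degree 0 with equal weights is exactly the moving average of the window:
print(local_poly_fit(t, y, w, 0.0, 0))  # 5/3 ~ 1.667, badly biased at the edge
# Degree 1 (local linear) adapts to the slope:
print(local_poly_fit(t, y, w, 0.0, 1))  # -1/3 ~ -0.333, much closer to 0
```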

Laurent Duval
    The Wikipedia article could easily be misread as implying that lowess/loess is synonymous with local regression. That would be to confuse one particular implementation with the general idea. – Nick Cox Apr 02 '16 at 23:48