Smoothing - when to use it and when not to?

Question

There is quite an old post on William Briggs' blog which looks at the pitfalls of smoothing data and carrying that smoothed data through to analysis. The key argument is:

If, in a moment of insanity, you do smooth time series data and you do use it as input to other analyses, you dramatically increase the probability of fooling yourself! This is because smoothing induces spurious signals—signals that look real to other analytical methods. No matter what you will be too certain of your final results!

However, I am struggling to find comprehensive discussions as to when to smooth and when not to.

Is it only frowned upon to smooth when using that smoothed data as an input to other analysis or are there other situations when smoothing is not advised? Conversely, are there situations where smoothing is advised?

Most applications of time series analysis are some kind of smoothing, even when not described as such. Smoothing can be used as an exploratory or summary device -- in some fields, that is even the main or only used method -- or for removing features that are regarded as a nuisance or of secondary interest for some purpose. — Nick Cox, Mar 30 '15 at 09:22
Disclaimer: I have not read the entire blog post cited. I could not get past the elementary typos ("times series", "Monte Carol") and its tone and style were not attractive. But I would not advise trying to learn the principles of time series analysis, or statistics generally, via anybody's blog. — Nick Cox, Mar 30 '15 at 09:26
@NickCox Agreed, and especially not from a blog that appears to have an axe to grind. — Hong Ooi, Mar 30 '15 at 09:37
@HongOoi Yes! I deleted some choice phrases from a draft of my comment that might have seemed no less opinionated than the blog itself. — Nick Cox, Mar 30 '15 at 09:49
@NickCox: I fully agree that Briggs' blog posts are confrontational in tone. However, I think his point is valid. I edited my answer to illustrate it and would be interested in any comments you might have. — Stephan Kolassa, Mar 30 '15 at 09:53
@StephanKolassa I am glad we agree on Briggs's blog, which is naturally tangential here. I naturally agree on your key technical point which I take to be standard. — Nick Cox, Mar 30 '15 at 10:09
I'd take everything that Briggs writes with a grain of salt. — Momo, Mar 30 '15 at 10:28
For anyone who questions Briggs because of his writing style, I suggest you review another of his posts. http://wmbriggs.com/post/86/ The points he makes about how correlations increase with smoothing are so simple and understandable that you can follow him to conclusion without lifting pencil to paper. — Joseph Wein, Nov 27 '17 at 14:47
Sure; and it can be made simpler. Smooth a series into the mean of the first half and the mean of the second half. Now your summary defines a perfect correlation. Worth pointing out, but the thread is about a broader claim which if taken the wrong way deprives researchers of most machinery for time series analysis. — Nick Cox, Nov 27 '17 at 15:06
I sampled a little more of Briggs' blog. Everyone should try it and you'll know quickly if you find it helpful, or even entertaining. — Nick Cox, Nov 27 '17 at 16:08

score 16 · Accepted Answer · edited Apr 13 '17 at 12:44

Exponential Smoothing is a classic technique used in noncausal time series forecasting. As long as you only use it in straightforward forecasting and don't use in-sample smoothed fits as an input to another data mining or statistical algorithm, Briggs' critique does not apply. (Accordingly, I am skeptical about using it "to produce smoothed data for presentation", as Wikipedia says - this may well be misleading, by hiding the smoothed-away variability.)

Here is a textbook introduction to Exponential Smoothing.

And here is a (10-year-old, but still relevant) review article.
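As a rough illustration of the "straightforward forecasting" use, here is a minimal sketch of simple exponential smoothing written out by hand. The function name `ses`, the initialisation at the first observation, the simulated series and the value `alpha = 0.2` are all assumptions made for this example, not something taken from the references above; base R's `HoltWinters()` with `beta = FALSE, gamma = FALSE` fits the same kind of model and estimates the smoothing parameter for you.

```r
# Simple exponential smoothing:
#   level[t] = alpha * x[t] + (1 - alpha) * level[t - 1]
# The forecast for every future period is the last level.
ses <- function(x, alpha) {
    stopifnot(length(x) >= 1, alpha > 0, alpha <= 1)
    level <- numeric(length(x))
    level[1] <- x[1]                  # assumed initialisation: first observation
    for (t in seq_along(x)[-1]) {
        level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
    }
    list(level = level, forecast = level[length(x)])
}

set.seed(1)
x <- 10 + rnorm(50)           # hypothetical series: constant level plus noise
fit <- ses(x, alpha = 0.2)    # alpha = 0.2 is an arbitrary illustrative choice
print(fit$forecast)           # flat point forecast for all future periods
```

Here the smoothed level is used only to produce the forecast. It is not passed on as "data" to a correlation test or another model, which is the situation the critique is about.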

EDIT: there seems to be some doubt about the validity of Briggs' critique, possibly somewhat influenced by its packaging. I fully agree that Briggs' tone can be abrasive. However, I'd like to illustrate why I think he has a point.

Below, I'm simulating 10,000 pairs of time series, of 100 observations each. All series are white noise, with no correlation whatsoever. So running a standard correlation test should yield p values that are uniformly distributed on [0,1], and it does (histogram on the left below).

However, suppose we first smooth each series and then apply the correlation test to the smoothed data. Something surprising appears: the p values are far too small. Smoothing removes much of the variability and makes neighboring observations strongly autocorrelated, so the test's assumption of independent observations no longer holds and it rejects far too often. We will therefore be too certain of an association between the original series, which is what Briggs is saying.

The question really hangs on whether we use the smoothed data for forecasting, in which case smoothing is valid, or whether we include it as an input in some analytical algorithm, in which case removing variability will simulate higher certainty in our data than is warranted. This unwarranted certainty in input data carries through to end results and needs to be accounted for, otherwise all inferences will be too certain. (And of course we will also get too small prediction intervals if we use a model based on "inflated certainty" for forecasting.)

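n.series <- 1e4    # number of simulated pairs of series
n.time <- 1e2      # observations per series

# p values of the correlation test on raw and on smoothed data
p.corr <- p.corr.smoothed <- rep(NA,n.series)
set.seed(1)
for ( ii in 1:n.series ) {
    # two independent white-noise series, so there is no true correlation
    A <- rnorm(n.time)
    B <- rnorm(n.time)
    p.corr[ii] <- cor.test(A,B)$p.value
    # same test after smoothing each series with lowess() (default span)
    p.corr.smoothed[ii] <- cor.test(lowess(A)$y,lowess(B)$y)$p.value
}

# histograms of the two sets of p values, side by side
par(mfrow=c(1,2))
hist(p.corr,col="grey",xlab="",main="p values\nunsmoothed data")
hist(p.corr.smoothed,col="grey",xlab="",main="p values\nsmoothed data")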

[Figure: histograms of the correlation-test p values, for unsmoothed data (left) and smoothed data (right)]
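To put a number on what the histograms show, one can count how often the test rejects at a nominal level. The sketch below is a smaller, self-contained variant of the simulation above; the 5% level, the reduced number of replications and the names `rate.raw` and `rate.smoothed` are choices made for this illustration only.

```r
set.seed(1)
n.series <- 2000   # fewer replications than the 1e4 above, to keep the run short
n.time <- 100
alpha <- 0.05      # nominal significance level (an illustrative choice)

p.raw <- p.smoothed <- numeric(n.series)
for (ii in seq_len(n.series)) {
    # independent white-noise series: any "significant" correlation is spurious
    A <- rnorm(n.time)
    B <- rnorm(n.time)
    p.raw[ii] <- cor.test(A, B)$p.value
    p.smoothed[ii] <- cor.test(lowess(A)$y, lowess(B)$y)$p.value
}

# share of spurious rejections with and without smoothing
rate.raw <- mean(p.raw < alpha)
rate.smoothed <- mean(p.smoothed < alpha)
print(c(raw = rate.raw, smoothed = rate.smoothed))
```

If the argument above is right, the raw rate should sit close to the nominal level, while the rate for the smoothed series should be many times larger.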

I'd take it as axiomatic for good time series analysis that no smooth is shown without the raw data being shown too. — Nick Cox, Mar 30 '15 at 09:50

score 1 · Answer 2 · answered Dec 02 '16 at 05:04

Claiming that smoothing is inappropriate for a modeling analysis condemns it to having higher mean square error than it otherwise might. Mean square error, or MSE, can be decomposed into three terms: the square of a quantity called the "bias", a variance, and some irreducible error. (This is shown in the citations below.) Excessively smoothed models have high bias, even if they have low variance, and models that are too rough have high variance and low bias.
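For reference, the decomposition can be written out as follows. The notation is an assumption made here, not taken from the cited notes: observations follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ is the fitted (smoothed) model, and the new noise term $\varepsilon$ is independent of $\hat{f}$.

```latex
$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
$$
```

More smoothing typically raises the first term and lowers the second, and the third term is unaffected by the choice of model.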

There's nothing philosophical about this at all. It is a mathematical characterization. It does not depend upon the character of the noise or the character of the system.
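The tradeoff can be seen in a small simulation. This is only a sketch under assumptions chosen for illustration: a sine curve as the true signal, Gaussian noise with standard deviation 0.5, and `lowess()` with three arbitrary spans standing in for "too rough", "moderate" and "excessively smoothed".

```r
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
f.true <- sin(2 * pi * x)     # assumed true signal
spans <- c(0.05, 0.3, 1)      # lowess spans: rough, moderate, heavy smoothing
n.rep <- 500                  # simulated noisy data sets per span

result <- sapply(spans, function(f) {
    # each column of fits is the smooth of one noisy realisation
    fits <- replicate(n.rep, lowess(x, f.true + rnorm(n, sd = 0.5), f = f)$y)
    bias2 <- mean((rowMeans(fits) - f.true)^2)   # squared bias, averaged over x
    variance <- mean(apply(fits, 1, var))        # variance, averaged over x
    c(bias2 = bias2, variance = variance, total = bias2 + variance)
})
colnames(result) <- paste0("f=", spans)
print(round(result, 4))
```

The `total` row leaves out the irreducible error, which is the same for every span, so it is enough for comparing the three smoothers.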

See:

http://scott.fortmann-roe.com/docs/BiasVariance.html

https://galton.uchicago.edu/~lafferty/pdf/nonparam.pdf

http://www.inf.ed.ac.uk/teaching/courses/mlsc/Notes/Lecture4/BiasVariance.pdf (This has the derivation of the decomposition.)

http://www.cs.columbia.edu/~blei/fogm/2015F/notes/regularized-regression.pdf (Blei does the same in a different way, and brings in what happens when one tries to predict.)

Classical statistics almost always insisted upon unbiased estimates. In 1955, statistician Charles Stein of Stanford showed that, in important special cases, unbiased estimators could be combined into a biased estimator with lower MSE, notably what became known as the James-Stein estimators. Bradley Efron wrote a very approachable text about this revolution in insight: http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf
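A quick simulation shows the effect. This is a minimal sketch using the textbook form of the James-Stein estimator, which shrinks toward zero, for independent normal observations with unit variance; the dimension `p = 10`, the true means and the number of replications are arbitrary assumptions for the example.

```r
set.seed(1)
p <- 10               # number of means estimated jointly (needs p >= 3)
theta <- rep(1, p)    # assumed true means, chosen only for illustration
n.rep <- 5000

loss <- replicate(n.rep, {
    x <- rnorm(p, mean = theta)             # one unbiased observation per mean
    js <- (1 - (p - 2) / sum(x^2)) * x      # James-Stein shrinkage toward zero
    c(unbiased = sum((x - theta)^2), js = sum((js - theta)^2))
})

# average total squared error of each estimator over the replications
mse <- rowMeans(loss)
print(mse)
```

The shrunken estimate is biased for each individual mean, yet its total squared error, summed over all `p` means, should come out lower on average than that of the unbiased observations.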
