23

We sometimes use spectral density plot to analyze periodicity in time series. Normally we analyze the plot by visual inspection and then try to draw a conclusion about the periodicity. But have the statisticians developed any test to check whether any spikes in the plot are statistically different from white noise? Have the R-experts developed any package for spectral density analysis and for doing that kind of test?

einar
  • 3,258
  • 2
  • 19
  • 31
Pantera
  • 508
  • 1
  • 5
  • 13
  • 2
    Pressed by @Wesley, I deleted my quick thoughts about autocorrelation functions and periodogram (may be he indeed is a frequency domain analysis guru, but I personally don't think Bartlett, while working with autocorrelations in time domain), but still think that my second suggestion about `bootspecdens` may be helpful. – Dmitrij Celov Jun 22 '11 at 23:09
  • I'm basing my assumption about people's response to 'what is an autocorrelation?' on literature appearances, in which almost all instances where an autocorrelation is used being of the standard, time-domain computed, Barlett autocorrelation. And, unfortunately, this is bad! :) I do appreciate the suggestion of `bootspecdens` from Dmitrij; looking forward to checking it out. – Wesley Burr Jun 23 '11 at 00:21

4 Answers4

10

You should be aware that estimating power spectra using a periodogram is not recommended, and in fact has been bad practice since ~ 1896. It is an inconsistent estimator for anything less than millions of data samples (and even then ...), and generally biased. The exact same thing applies to using standard estimates of autocorrelations (i.e. Bartlett), as they are Fourier transform pairs. Provided you are using a consistent estimator, there are some options available to you.

The best of these is a multiple window (or taper) estimate of the power spectra. In this case, by using the coefficients of each window at a frequency of interest, you can compute a Harmonic F Statistic against a null hypothesis of white noise. This is an excellent tool for detection of line components in noise, and is highly recommended. It is the default choice in the signal-processing community for detection of periodicities in noise under assumption of stationarity.

You can access both the multitaper method of spectrum estimation and the associated F-test via the multitaper package in R (available via CRAN). The documentation that comes with the package should be enough to get you going; the F-test is a simple option in the function call for spec.mtm.

The original reference that defines both of these techniques and gives the algorithms for them is Spectrum Estimation and Harmonic Analysis, D.J. Thomson, Proceedings of the IEEE, vol. 70, pg. 1055-1096, 1982.

Here is an example using the included data set with the multitaper package.

require(multitaper);
data(willamette);
resSpec <- spec.mtm(willamette, k=10, nw=5.0, nFFT = "default",
                    centreWithSlepians = TRUE, Ftest = TRUE,
                    jackknife = FALSE, maxAdaptiveIterations = 100,
                    plot = TRUE, na.action = na.fail) 

The parameters you should be aware of are k and nw: these are the number of windows (set to 10 above) and the time-bandwidth product (5.0 above). You can easily leave these at these quasi-default values for most applications. The centreWithSlepians command removes a robust estimate of the mean of the time series using a projection onto Slepian windows -- this is also recommended, as leaving the mean in produces a lot of power at the low frequencies.

I would also recommend plotting the spectrum output from 'spec.mtm' on a log scale, as it cleans things up significantly. If you need more information, just post and I'm happy to provide it.

Wesley Burr
  • 756
  • 3
  • 10
  • To Burr, Silva and Celov - thanks a lot for your interesting answers and suggestions. I look forward to test these estimators. Best regards – Pantera Jun 22 '11 at 21:53
  • (+1) this night I thought carefully about your suggestions, and decided that time domain indeed is the last thing (due to lag truncation and weak properties in small samples) to try searching for cycling behavior. What I am personally concerned about are the assumptions for F statistics and the small sample size properties of the suggested scheme. Well and probably it is good to start a separate question regarding the optimal window selection, because there are indeed many. – Dmitrij Celov Jun 23 '11 at 07:44
  • There are indeed many window choices, although the two most common are the Discrete Prolate Spheroidal Sequences (or *Slepians*) and the sine tapers. If you are looking for maximal concentration of energy in a local bandwidth, the Slepians have been proved to be optimal, and in fact are the output from the integral equation form of the spectral density (see the paper I mentioned for full details). As far as the F statistics go, there certainly are some issues with degrees of freedom, but on the whole they work quite well, with ~ 2k-2 dof available. – Wesley Burr Jun 23 '11 at 19:56
  • Smoothed periodogram also uses taper, allows for FFT, David Stoffer's book teaches you how to calculate confidence intervals as well. This `multitaper` package seems to have employed more advanced techniques for tapering and calculating confidence interval. But I think the idea was the same, according to David Stoffer. This is the only thing I could think of that teaching vanilla peridogoram actually still makes sense today. – stucash Aug 05 '19 at 21:52
  • ok so you are one of the authors of this package and you have used some very strong words against periodogram I hope you could one day come back with more evidence. Common pros and cons for Periodogram are well known, like its explosive variance which is why it's not a good consistent estimator for spectrum but the smoothed periodogram isn't really that bad, not as bad as you stated here I think. – stucash Aug 06 '19 at 20:45
  • Admittedly, this was 8 years ago, when I was a lot younger and crankier. But for scientific applications with many lines embedded against a quasi-stationary background, the periodogram and the smoothed versions remain less than useless. And that's the primary area I work in. If you have a handful of lines, you can do whatever you want, and it'll basically work. Provided your model matches reality, which is always questionable. Even with smoothing, the periodogram is not robust to violations. – Wesley Burr Sep 06 '19 at 15:38
3

Ronald Fisher proposed an exact test of the maximum periodogram coordinate in R.A. Fisher, Proc. R. Soc A (1929) 125:54. The test is based on the g-statistic. Specifically, the null hypothesis of Gaussian white noise is rejected if g is significantly large, that is, if one of the values ​​of $f(\omega_k)$ is significantly higher than the average of all signals.

You can get more details about the test in MB Priestley, Spectral Analysis and Time Series, Academic Press, London, 1981, page 406.

In R, the package GeneCycle contains the function fisher.g.test():

library(GeneCycle)
?fisher.g.test

Hope this helps.

chl
  • 50,972
  • 18
  • 205
  • 364
  • this is great but the package's g test relies on its own periodogram function which has very limited options for calculating power spectra... – stucash Aug 05 '19 at 21:30
3

We have tried an attempt to address this issue by a wavelet transform of a spectral-based test recently in this paper. Essentially, you need to consider the periodogram ordinates distribution, similarly to the article of Fisher, mentioned in the earlier answers. Another paper from Koen is this. We have recently published an R package hwwntest.

Delyan Savchev
  • 341
  • 1
  • 5
1

Use the spectrum.test function in the ts.extend package

You can conduct a "permutation spectrum test" on your data using the ts.extend package. This is a permutation-based variant of the classic Fisher test that looks at the maximum spectral intensity of the data and compares it to its null distribution under the null hypothesis of exchangeability. (The advantage of this test over the Fisher test is that it does not assume normal error terms in the data; i.e., it works for any underlying distribuiton.) Here is an example where we generate data with a periodic signal and then test for the presence of the signal. The test output and resulting plot easily detects the signal.

#Load the package
library(ts.extend)

#Generate mock data
set.seed(1)
m      <- 100
SIGNAL <- 0.8*sin(0.3*(1:m))
NOISE  <- rnorm(m)
SERIES <- SIGNAL + NOISE

#Conduct permutation-spectrum test
TEST <- spectrum.test(SERIES)
TEST
        Permutation-Spectrum Test

data:  real time-series vector SERIES with 100 values
maximum scaled intensity = 3.6428, p-value = 0.000208
alternative hypothesis: distribution of time-series vector is not exchangeable 
(at least one periodic signal is present)

#Plot the test results
plot(TEST)

enter image description here

Ben
  • 91,027
  • 3
  • 150
  • 376