I have biological time series (9 years long) of the biomass of species which logically exhibit a seasonal pattern. I would like to cluster them into a few groups based on their typical seasonal evolution (e.g. spring vs. summer species).
To do so, I was advised to use a Fourier transform to decompose each signal into N harmonics (e.g. N = 3: annual, semi-annual and four-month cycles, i.e. one, two and three cycles per year) and to use the amplitudes and phases of these harmonics in a Principal Components Analysis (PCA; which should work because the harmonics are orthogonal/uncorrelated).
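For concreteness, a minimal sketch of the kind of harmonic extraction described above might look like the following. Everything in it is an assumption made for illustration: the simulated matrix `X`, the peak weeks, and the helper `harmonic_features()` are hypothetical and are not the actual data or code. Because raw phase wraps around at ±π, the sketch passes the cosine/sine coefficients (amplitude × cos(phase), amplitude × sin(phase)) to the PCA rather than the phases themselves; that substitution is a choice of this sketch, not part of the advice received.

```r
set.seed(1)
n_years  <- 9
per_year <- 52
n   <- n_years * per_year          # 468 observations, as in the PS
idx <- seq_len(n) - 1              # time index 0, 1, ..., n - 1

# Hypothetical data: 6 species, each a seasonal bump peaking in a given week
peak_week <- c(10, 12, 14, 28, 30, 32)   # three "spring", three "summer"
X <- sapply(peak_week, function(p) {
  # circular distance (in weeks) between each observation and the peak week
  d <- ((idx %% per_year) - p + per_year / 2) %% per_year - per_year / 2
  exp(-d^2 / (2 * 4^2)) + rnorm(n, sd = 0.05)
})
colnames(X) <- paste0("sp", seq_along(peak_week))

# Amplitude and phase of the first n_harm seasonal harmonics of one series
harmonic_features <- function(x, n_harm = 3, n_years = 9) {
  z <- fft(x - mean(x))
  k <- n_years * seq_len(n_harm)   # k cycles/year = n_years * k cycles per record
  coef <- 2 * z[k + 1] / length(x) # + 1 because R vectors are 1-indexed
  c(amp = Mod(coef), phase = Arg(coef))
}

feats <- t(apply(X, 2, harmonic_features))  # one row per species
amp   <- feats[, 1:3]
phase <- feats[, 4:6]

# Assumption: use cosine/sine coefficients instead of raw (circular) phases
pca_input <- cbind(amp * cos(phase), amp * sin(phase))
pca <- prcomp(pca_input)           # centred, unscaled
round(pca$x[, 1:2], 2)             # species scores on the first two axes
```

With 9 years of data, the annual cycle sits at 9 cycles per record, which is why the bins 9, 18 and 27 are extracted rather than bins 1, 2 and 3.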
I know there are already some similar subjects in this Forum, yet some aspects remain unclear to me. My questions are:
(1) When I reconstruct the time evolution from the first N harmonics computed from the Discrete Fourier Transform (DFT), the explained variability of the original signal (the R² of the linear model between the recomposed signal and the original data) is sometimes only 0.40 (N = 3) or 0.60 (N = 5). In your experience, does this mean the data are not suited to this approach, or that the approach itself is invalid? Is there more pre-processing I could do to improve this (e.g. smoothing the signals)? Some species exhibit sudden increases separated by periods of total absence, and I wonder whether this calls for higher-frequency harmonics; should I expect difficulties there, and how should I tackle them?
(2) Besides the DFT, which appears limited here, I considered using a continuous Fourier Transform through a Fast Fourier Transform (FFT) algorithm and working on the power spectrum of each time series. I wonder whether this would allow me to select N' so-called “harmonics” by picking the N' highest peaks in the periodogram and then calculating the corresponding amplitudes and phases to be used in a subsequent PCA. Does that make sense?
How can I concretely use the output of an FFT routine in R (such as fft() or spec.pgram()) to run a subsequent PCA (or any other clustering method)? [Any R code snippet would be very welcome.]
(3) How do I reconstruct the signal from selected harmonics in the continuous case (FFT)? I can easily do this in the DFT case, but I am stuck in the continuous case. Any R code snippet is of course very welcome.
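To pin down what questions (1) to (3) refer to, here is a minimal sketch of a reconstruction from a subset of frequencies and of the R² between the reconstruction and the data. It is only an illustration under stated assumptions: the spiky series `x` is simulated, the helper `reconstruct()` is hypothetical, and the "periodogram" is simply the squared modulus of the output of fft(), not the output of spec.pgram(). It compares the three fixed seasonal harmonics with the three highest periodogram peaks.

```r
set.seed(2)
n <- 468; per_year <- 52
idx <- seq_len(n) - 1

# Hypothetical spiky species: a short bloom around week 20, absent otherwise
d <- (idx %% per_year) - 20
x <- pmax(0, 1 - abs(d) / 3) * runif(n, 0.5, 1.5)

z     <- fft(x)
half  <- 2:(n / 2)            # positive-frequency bins (no mean, no Nyquist)
power <- Mod(z[half])^2       # raw periodogram, up to a constant factor

# Rebuild the series from the mean plus the bins whose R indices are in `keep`
reconstruct <- function(z, keep) {
  n  <- length(z)
  zk <- complex(n)                      # all-zero complex vector
  zk[1] <- z[1]                         # keep the mean (bin 0)
  zk[keep] <- z[keep]                   # retained positive frequencies
  zk[n - keep + 2] <- z[n - keep + 2]   # their negative-frequency mirrors
  Re(fft(zk, inverse = TRUE)) / n       # R's inverse fft is unnormalised
}

keep_fixed <- 9 * (1:3) + 1                               # 1, 2, 3 cycles/year
keep_top   <- half[order(power, decreasing = TRUE)[1:3]]  # 3 highest peaks

xhat_fixed <- reconstruct(z, keep_fixed)
xhat_top   <- reconstruct(z, keep_top)

r2_fixed <- cor(x, xhat_fixed)^2
r2_top   <- cor(x, xhat_top)^2
c(fixed = r2_fixed, top = r2_top)
```

For a retained bin, Mod(z[keep]) * 2 / n and Arg(z[keep]) give the amplitude and phase that would enter the PCA. Note that the frequencies picked by peak height can differ from one species to the next, so the resulting features are not automatically comparable across species.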
Any help with these questions would be much appreciated. Links to concrete examples, especially with associated R code, would be very helpful too (as would method names or keywords). Thank you.
PS, in case it is useful: the time series are of equal length and have been pre-processed to have uniform sampling intervals; stationarity may be assumed, and there is no long-term trend in the way. Each year is divided into 52 equally spaced observations (i.e., 468 observations over the 9 years).