3

I just started my course in time series analysis. I saw a statement there: "Statistical methods that depend on the independence assumption are no longer applicable in time series analysis". Why is that so?

Richard Hardy
StatsMonkey
  • 5
    The present does depend on, and the future will depend on, whatever is past. Otherwise yesterday's temperature would have no possible bearing on today's and this year's sales would have no possible bearing on next year's sales. Independent noise does have a part to play in time series analysis, however. – Nick Cox Apr 26 '20 at 06:40
  • @NickCox: do you want to post your comment(s) as an answer? [Better to have a short answer than no answer at all.](https://stats.meta.stackexchange.com/a/5326/1352) Anyone who has a better answer can post it. – Stephan Kolassa Apr 26 '20 at 06:41
  • @StephanKolassa Thanks for the encouragement. (I was surprised not to find a duplicate of this, but I may also have missed it.) – Nick Cox Apr 26 '20 at 09:42
  • 2
    As suggested by the remark at the end of @Nick's first comment, the quoted statement is not generally true. For instance, ordinary regression methods that assume independent errors have many applications in the analysis of time series data. This exposes the implicit meaning of "time series analysis" in the question: it refers to the analysis of particular *models* of time series data, ones that explicitly incorporate random variables that are not assumed to be independent. For this *class of models,* statistical methods that assume independence obviously don't apply. – whuber Apr 26 '20 at 12:39

2 Answers

8

The present does depend on, and the future will depend on, whatever is past. Without dependence, yesterday's temperature would have no possible bearing on today's and this year's sales would have no possible bearing on next year's sales.

That is true more broadly on many time scales, from astronomical or geological changes to short-term issues such as predicting tomorrow's weather or next year's economy. Translated into statistical terms, the overwhelming pattern is one of dependence in time series.

Naturally, it is often expected or hoped that the main structure of dependence can be captured by some fairly simple pattern of, say, trend, seasonality and the effects of recent inputs. Conversely, time series analysis is often tricky, if only because the future can be very unlike the past (qualitative changes can overwhelm quantitative changes) and because we may know little about a system beyond some data on its recent outputs and some fuzzy notions about its internal operations. Around 1900 there were many dire predictions that the increasing use of horses for transport in cities would overwhelm the means to cope with their direct and indirect consequences (stabling, feeding, accidents, waste removal, ...). Enter the internal combustion engine, and new worries emerged as the old problems faded away.

There are many different mechanisms underlying this, including memory, inertia or momentum in strict or broad senses, and phenomena of growth, decay or decline.

Anticipatory behaviour is also important: buying presents or anything else ahead of the time you need them is standard at any scale from individual to international. In that sense the future can influence the present.

Independent noise does have a part to play in time series analysis, however, largely as a way of trying to mimic detailed variability that can't be explained otherwise.
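Both points can be illustrated with a toy first-order autoregressive simulation (a sketch of my own, with illustrative numbers): each value carries over part of the previous one plus a fresh, independent shock, so the series is strongly autocorrelated even though the noise driving it is not.

```python
import random

def acf1(xs):
    """Lag-1 sample autocorrelation."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in xs)
    return num / den

random.seed(42)
phi = 0.8  # how much of yesterday's value survives into today
noise = [random.gauss(0, 1) for _ in range(5000)]

# Build the series: 80% of the previous value plus an independent shock.
x = [0.0]
for e in noise[1:]:
    x.append(phi * x[-1] + e)

print(round(acf1(x), 2))      # the series itself: strongly dependent, near 0.8
print(round(acf1(noise), 2))  # the shocks alone: essentially uncorrelated, near 0
```

The dependence in the observed series comes entirely from how past values feed into present ones; the independent noise is only there to supply the unexplained variability.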

Nick Cox
  • 5
    +1. One aspect may be helpful: we may be able to model, say, seasonality using sine and cosine waves. Fair enough. But then typically the *residuals* from this "general" model will exhibit some kind of autocorrelation and dependence: if it is warmer than the (seasonally adjusted) average today, then it is likely that it will also be warmer than average tomorrow. – Stephan Kolassa Apr 26 '20 at 10:18
3

Many statistical methods assume independence of observations, because the probability of independent events occurring simultaneously is easily calculated as the product of their individual probabilities: $P(A \cap B) = P(A) \cdot P(B)$. This is applied extensively, e.g. in maximum likelihood estimation of parameters. However, if your events are not independent, this formula is no longer valid.
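As a small sketch (with made-up numbers and an assumed first-order autoregressive dependence), here is the difference between the likelihood the independence assumption gives you and the conditional factorization $P(x_1) \cdot P(x_2 \mid x_1) \cdots P(x_n \mid x_{n-1})$ that dependent data actually require:

```python
import math

def norm_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# A toy series with obvious dependence: each value is roughly 0.8 times
# the previous one plus a small shock (numbers chosen by hand).
x = [0.0, 0.9, 0.6, 0.7, 0.4]
phi = 0.8

# Independence assumption: the joint log-likelihood is just the sum of
# marginal log-densities -- valid only if the observations are iid.
ll_iid = sum(norm_logpdf(xi) for xi in x)

# Conditional factorization for an AR(1) model: condition each
# observation on the one before it.
ll_ar1 = norm_logpdf(x[0]) + sum(
    norm_logpdf(x[t], mu=phi * x[t - 1]) for t in range(1, len(x))
)

print(ll_iid, ll_ar1)  # the two likelihoods differ
```

Maximizing the wrong (iid) likelihood over model parameters would give you estimates of the wrong quantity, which is exactly why methods resting on the independence assumption break down here.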

What constitutes a time series is a question of definition, but I think everyone agrees that time series involve some memory effect. One can say that in a time series, the currently observed value of a variable depends on the observed values (as opposed to the modelled values!) at some previous time(s).

To take the above example of daily temperature: The temperature today depends in part on the season, which depends on the position of the Earth relative to the Sun. Thus, you can model daily temperature through the Earth-Sun relationship. This relationship is physically a function of time, but that's not the reason why temperature is a time series. If we had the power to adjust the Earth's position as we wish (as we do with independent variables in a controlled experiment), we could control the season and, consequently, the temperature---to a certain degree.

There is, however, another component to daily temperature: the heat capacity of the ground, the atmosphere, the oceans, etc. The heat that was absorbed yesterday is being emitted today, so yesterday's temperature by itself influences today's temperature. That's what makes daily temperature a time series.

For a counterexample, consider the Moon's brightness. It oscillates monthly with the Moon's phases, but it is not a time series. Even if, due to some extraordinary event, tonight's brightness changes drastically (e.g. due to a lunar eclipse), this will have no effect whatsoever on the brightness tomorrow night! The Moon will be exactly as bright as it would have been without tonight's eclipse.
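That memorylessness can be mimicked in a purely illustrative simulation: model brightness as a deterministic phase cycle plus independent nightly noise, inject a one-night "eclipse" shock, and observe that no other night is affected.

```python
import math
import random

PERIOD = 29.5  # approximate length of a lunar cycle in days (illustrative)

def brightness(t, shock=0.0):
    """Deterministic phase cycle plus independent nightly noise.
    A shock tonight (e.g. an eclipse) enters only tonight's value."""
    return math.cos(2 * math.pi * t / PERIOD) + random.gauss(0, 0.05) + shock

random.seed(7)
normal_run = [brightness(t) for t in range(5)]
random.seed(7)  # replay the same noise, this time with an eclipse on night 2
eclipsed_run = [brightness(t, shock=-1.0 if t == 2 else 0.0) for t in range(5)]

# Only night 2 differs, and only by the size of the shock: the model has
# no memory, so the eclipse does not propagate to later nights.
diffs = [e - n for e, n in zip(eclipsed_run, normal_run)]
print(diffs)
```

Contrast this with the temperature example above, where a shock to yesterday's value would carry over into today's through the stored heat.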

You may argue that---barring extraordinary events---knowing last night's Moon brightness still allows you to predict its brightness tonight quite precisely. But time is here just a correlate. It is conceptually the same as saying that knowing the force a spiral spring exerts at an elongation $x$ allows you to quite precisely predict the force exerted at the elongation $x + \Delta x$. If the spring oscillates, the force will vary with time, but it will not vary because of time.

Deciding whether a data set constitutes a time series can be tricky and require substantial domain knowledge. Asking yourself whether some magical manipulation of the outcome at one time point would affect it at some later time is, in my opinion, a useful approach.

Igor F.
  • 2
    Re "everyone agrees ... memory effect:" If "memory effect" implies non-independence of random variables, I beg to differ, and in support of that consider the answer at https://stats.stackexchange.com/a/126830/919. Your claim that [regular observations of the moon's brightness] is "not a time series" runs counter to most definitions and characterizations of time series. That makes your answer appear idiosyncratic and in need of some authoritative support to be made more credible. – whuber Apr 26 '20 at 12:41