
SO Post 345798 asks to enumerate the problems with overdifferencing in time series, and so far that post has focused on one specific problem: the removal of process memory in a manner that could hurt forecasts. Fractional differencing is referenced as a solution.

There are some comments about why (or when) a first difference would remove memory. Certainly, if the process were a true random walk, no memory would be lost by differencing.

What is the mechanism behind the "memory loss" caused by taking a difference? When does it occur, and why?
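As a minimal illustration (my own sketch, not from the referenced post, assuming NumPy and a simulated Gaussian random walk): the first differences of a random walk are i.i.d. noise, yet the level series can be rebuilt from those differences plus a single starting value.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(size=500)
x = np.cumsum(eps)        # random walk: x_t = x_{t-1} + eps_t

dx = np.diff(x)           # first differences: dx_t = eps_t, i.i.d. and memoryless

# The level series is recoverable from the differences plus one point, x[0]:
x_rebuilt = np.concatenate(([x[0]], x[0] + np.cumsum(dx)))
assert np.allclose(x, x_rebuilt)
```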

Richard Hardy
Ben Ogorek
  • I am also curious. The first difference becomes proportional to the first derivative as the number of samples over a given interval $n \rightarrow \infty$. The original function is recoverable from the first derivative via integration, up to a constant. It is possible that the loss of this constant is the "removal of process memory" mentioned. However, the constant could be recovered from a single point, so it doesn't seem nearly so bad a problem as the referenced question seems to indicate... – Him Dec 24 '19 at 13:59
  • @Scott: I understand your point that it's recoverable, but econometricians often difference and don't recover because they're not interested. This is because statistical assumptions and estimation require stationarity, which (sometimes) requires differencing. In ML, differencing could be the wrong thing to do, but in econometrics/statistics it could be the right thing to do. This is because, if you try to do econometrics on a non-differenced series that's trending, your results will be spurious. Google "spurious regression" for more details on the problems with not differencing; a small simulation sketch after these comments illustrates the effect. – mlofton Dec 24 '19 at 14:25
  • One more thing: I think this is a key difference between ML and econometrics. They attack the problem in different ways, so what one field (ML) wants, the original series, another field (econometrics) often doesn't want. Granger and Newbold figured out the problem with not differencing in 1973, and their example in the literature is quite enlightening. If you can't find it and you're interested, let me know and I can try to find it for you. They use random walks for illustration, and the results are glaringly problematic. – mlofton Dec 24 '19 at 14:28
  • Could you explain what you mean by "process memory"? The reason I ask is that differencing a random walk clearly removes information: by definition of random walk, the first differences are independent and therefore none of them conveys any information about previous values. Isn't that what "memory" is intended to refer to? – whuber Dec 24 '19 at 16:05
  • @whuber: It's a good question that I'm not sure about, but I think they mean level information. For example, suppose you had the random walk process for the log price $\log(P_t) = \log(P_{t-1}) + \epsilon_t$. Then, by differencing, you end up with $\text{return}_t = \epsilon_t$. But many times econometricians/traders are not interested in the log price, so it's okay to difference; all they care about are returns. On the other hand, maybe the ML people believe that there is something useful in the level of the log price, in which case differencing would hurt. – mlofton Dec 24 '19 at 16:46
  • @mlofton, an interesting perspective, and one that is at odds with my understanding. I wonder how one is supposed to survive the spurious regression problem without differencing. If machine learners did not difference, they would be suffering from the problem and obtaining poor results, something that most of them cannot afford. Meanwhile, the statisticians/econometricians are sometimes interested in levels, sometimes in differences. When interested in the original variable in levels, differencing is just a tool they use to describe, predict or make inference about the original variable. (ctd) – Richard Hardy Dec 25 '19 at 09:04
  • @mlofton (...ctd) Sometimes they present the results in terms of the differenced variable to save time and space, since the back transformation is simple and obvious. Statisticians/econometricians who work in the industry cannot afford changing the variable of interest, so they refer to the original variable when they need to. So in my perspective, there is no dichotomy between ML and stats/econometrics the way you describe it. (ctd...) – Richard Hardy Dec 25 '19 at 09:04
  • @mlofton, (...ctd) Going back to your example of traders, I think both those who use ML and those who use stats/econometrics care about the same things when it comes to levels vs. first differences. Now, a dichotomy that I tend to see is the goal of the analysis (mostly prediction in ML vs. mostly inference in statistics/econometrics) and the way results are reported (mostly point predictions in ML vs. prediction intervals, confidence intervals, p-values etc. in stats/econometrics). So that is my perspective to complement yours. – Richard Hardy Dec 25 '19 at 09:06
  • Hi Richard: I agree that differencing is the way to go for estimation. But ML does other things besides estimation, so maybe that is why levels could be useful there? Maybe they use the level as a feature that goes into the ML machinery? I don't know for sure, but I'm with you that, absent cointegration, if you want to do estimation, it's best to difference a non-stationary series. – mlofton Dec 26 '19 at 16:53
  • By differencing, you don't explain the differenced part with your model. You accept the difference as a fact and don't explain it, and your model focuses on explaining what remains after differencing. – hbadger19042 Jun 02 '20 at 09:32
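The spurious-regression point raised in the comments can be seen in a small simulation. This is a minimal sketch of my own (assuming NumPy and statsmodels are available; it is not code from the thread): regressing one random walk on an independent random walk in levels tends to give a large t-statistic despite there being no relationship, while regressing the first differences does not.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = np.cumsum(rng.normal(size=n))   # two independent random walks
y = np.cumsum(rng.normal(size=n))

# Levels regression: the slope t-statistic is often far from 0 even though y and x are unrelated.
levels_fit = sm.OLS(y, sm.add_constant(x)).fit()

# Differenced regression: the slope t-statistic behaves like an ordinary, roughly N(0,1), draw.
diff_fit = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()

print("t-stat, levels:     ", levels_fit.tvalues[1])
print("t-stat, differences:", diff_fit.tvalues[1])
```

Repeating this over many seeds shows the levels regression rejecting the null of no relationship far more often than the nominal rate, which is the spurious-regression phenomenon the comments refer to.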

0 Answers