
I have a numeric data set with the following format. Y_deseasonal is a deseasonalized variable from a time series spanning 2016 to 2020. Each row represents a day.

Y_deseasonal    x1  x2  x3
...             ..  ..  ..
342             22  12  25  
359             27  12  25
367             27  12  22
367             27  12  22
367             27  12  22
...             ..  ..  ..

I want to build a mathematical model of Y_deseasonal as a function of the Xs and plan to test various methods (multivariate regression, neural network, random forest, etc.). Before fitting the models, I am searching for a pragmatic way to account for the recency of the observations and give more weight to the most recent ones when building the model.

I thought of sampling out old observations with a decay effect (a rough sketch follows the list below). For example:

  • for 2016 sample and drop 40% of observations
  • for 2017 sample and drop 30% of observations
  • for 2018 sample and drop 20% of observations
  • for 2019 sample and drop 10% of observations
  • for 2020 sample and drop 0% of observations
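
Something like the following R sketch is what I had in mind, assuming the data sit in a data frame `df` with a `year` column (the column name and drop fractions are just the example above):

```r
set.seed(1)  # reproducible sampling
drop_frac <- c("2016" = 0.40, "2017" = 0.30, "2018" = 0.20, "2019" = 0.10, "2020" = 0.00)

keep_rows <- unlist(lapply(names(drop_frac), function(yr) {
  idx <- which(df$year == as.integer(yr))
  # randomly keep (1 - drop fraction) of that year's observations
  sample(idx, size = round((1 - drop_frac[[yr]]) * length(idx)))
}))

df_sampled <- df[sort(keep_rows), ]  # thinned data set used for model fitting
```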

Is this a solid solution, or should I explore other options?

nba2020
  • This seems related: https://stats.stackexchange.com/questions/205232/how-to-down-weight-older-data-in-time-series-regression – zbicyclist Mar 17 '20 at 03:15
  • Many regressions allow weights. I would use weights, not dropout. I would also make sure I was handling training/validation/test splits, including cross-validation, properly. – EngrStudent Dec 20 '20 at 00:48

3 Answers


The usual way to do this is to use weighted least squares estimation, with weightings that give exponential decay to the observations according to their time lag. Exponential decay means that the weightings on the observations will reduce monotonically as they recede further into the past, and the weighting for an observation will approach zero asymptotically. If you have observations at times $t=1,...,T$ then the weighting function would be:

$$w(t) = \exp( - \gamma (T-t) ),$$

where $\gamma > 0$ is a control parameter that determines the rate of decay of the weighting. Usually you would set this as a control variable rather than estimating it from the data, so it would not be a free parameter in the model. The larger you set this control parameter, the more rapidly the weighting of observations will decay as they recede in time. For example, with daily data and $\gamma = 0.01$, an observation one year old receives weight $\exp(-3.65) \approx 0.026$ relative to the most recent observation.

This kind of weighting technique is quite useful for dealing with time-series data where the regression relationship may change over time. By applying this weighting method you allow more recent data to "dominate" the regression, which means that the regression will be able to handle changing relationships to some extent. Implementation of this method depends on what model you are using. As an example, if you are using multiple linear regression you can implement this method in R using the lm function by setting the weights parameter to the appropriate set of weights for your data.
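
For concreteness, here is a minimal sketch of that lm approach, assuming your data are in a data frame `df` ordered by time with columns `Y_deseasonal`, `x1`, `x2`, `x3`, and taking an illustrative value of $\gamma$:

```r
T_n   <- nrow(df)                            # number of daily observations
gamma <- 0.005                               # illustrative decay rate, chosen by the analyst
w     <- exp(-gamma * (T_n - seq_len(T_n)))  # w(t) = exp(-gamma * (T - t))

# Weighted least squares via the weights argument of lm
fit <- lm(Y_deseasonal ~ x1 + x2 + x3, data = df, weights = w)
summary(fit)
```

With this construction the most recent observation gets weight one and older observations decay smoothly towards zero.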

Ben

Your question is an outstanding one that no one ever asks and, worse yet, that no teacher explains when teaching, explaining, or promoting standard regression procedures in the presence of autocorrelated data.

"I am searching for a pragmatic way to account for the recency of the observations and provide more weight into the most recent ones when building the model. " . In other words how to assess the "believability of the data" often if not always ignored in NN and ML procedures.

I can think of three ways to possibly answer your question. I think that discussion #1 is where you are coming from, but I include two others for generality and pedagogical reasons.

1) The problem with ordinary regressions is that there is an equal weighting of every time period, i.e. every row in the data matrix. Take a look at a piece I wrote a number of years ago to help understand the implications, entitled "Regression vs Box-Jenkins" (http://www.autobox.com/pdfs/regvsbox-old.pdf), which essentially uses the ARMA structure of the model errors to transform the data matrix to meet the requirements of the standard regression approach. See "shuffling the deck".

2) In addition to determining how to weight each row in the data matrix, one might need to weight rows differently based upon inherent non-constant error variance, which also suggests employing weighted least squares. A scheme for doing this by testing the assumption that there is no deterministic change point in the model error variance is suggested here: http://docplayer.net/12080848-Outliers-level-shifts-and-variance-changes-in-time-series.html

3) Finally, if a data point is an anomaly, it needs to be down-weighted by including a dummy variable in the model. This is generally referred to as Intervention Detection. See my back-and-forth comments (3/2/20) with @whuber in "Detect abrupt change in time series" for a peek at this.
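
As a minimal hand-rolled illustration of point 3 (not an automated intervention-detection procedure), assuming a data frame `df` and that an anomalous day has already been identified at some row index `t_out`:

```r
t_out <- 100  # hypothetical index of the anomalous day, found beforehand (e.g. from residuals)
df$pulse <- as.numeric(seq_len(nrow(df)) == t_out)  # pulse dummy: 1 at the anomaly, 0 elsewhere

# Including the pulse dummy absorbs the anomaly so it does not distort the other coefficients
fit <- lm(Y_deseasonal ~ x1 + x2 + x3 + pulse, data = df)
summary(fit)
```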

IrishStat
  • Thanks so much IrishStat! I'm new to regression but my engineering background helps me to search for pragmatic explanations of those complex models ;) – nba2020 Mar 17 '20 at 09:08

While it's reasonable to give more weight to near-term data, the first question you need to answer is: what's the objective of my modeling exercise?

In a lot of time series problems, the objective is prediction. That is, given x(:t), I want to predict x(t+1). If that's the case, you're better off letting your data tell you whether recency matters in your model.

Instead of fitting a model like f(x(:t);theta), you may want to consider f(w(:t)x(:t); theta). A proper walk-forward validation can tell you whether w(t1) > w(t2) for t1 > t2.
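
A rough walk-forward sketch of that comparison, assuming a data frame `df` ordered by date with columns `Y_deseasonal`, `x1`, `x2`, `x3`, and using exponential decay as one possible form of w:

```r
# One-step-ahead RMSE with an expanding window; gamma = 0 corresponds to equal weights
walk_forward_rmse <- function(df, gamma = 0) {
  n     <- nrow(df)
  start <- floor(0.7 * n)            # first 70% of days form the initial training window
  errs  <- numeric(0)
  for (t in start:(n - 1)) {
    train <- df[1:t, ]
    w     <- exp(-gamma * (t - seq_len(t)))
    fit   <- lm(Y_deseasonal ~ x1 + x2 + x3, data = train, weights = w)
    pred  <- predict(fit, newdata = df[t + 1, ])
    errs  <- c(errs, df$Y_deseasonal[t + 1] - pred)
  }
  sqrt(mean(errs^2))
}

walk_forward_rmse(df, gamma = 0)     # recency ignored
walk_forward_rmse(df, gamma = 0.01)  # recent observations weighted more heavily
```

If the weighted version does not improve the out-of-sample error, recency probably doesn't matter much for this relationship.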

horaceT