What am I trying to achieve?

I am trying to test whether there is a structural break in a time-series of proportions at a known break date (21 Dec 2019). Below is a plot of the original time-series (top panel) and its STL decomposition:

[Figure: original time-series (top panel) and its STL decomposition]

What approach am I taking?

  1. Apply a logit transformation to the proportions.
  2. Apply STL decomposition to remove seasonality from the data.
  3. Use the seasonally-adjusted values from the STL decomposition to fit a linear regression model, which will take the form of an AR(p) or MA(q) model:
    • Fit on the entire time-series
    • Fit on the time-series before the break date
    • Fit on the time-series after the break date
  4. Apply the Chow test by computing the Chow test statistic and its associated p-value.
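
To make steps 1, 3 and 4 concrete, here is a minimal sketch in plain Python (standard library only, chosen purely for illustration). It makes several assumptions that are not part of the approach above: the data are synthetic, the seasonal adjustment of step 2 is skipped, and each regression is an ordinary least-squares fit on a time index rather than an AR(p) or MA(q) model. The helper names `logit`, `ols_rss` and `chow_statistic` are hypothetical. The statistic computed is the standard Chow F statistic, F = ((RSS_pooled - (RSS_before + RSS_after)) / k) / ((RSS_before + RSS_after) / (n - 2k)), where k is the number of parameters in each regression.

```python
import math
import random


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def ols_rss(t, y):
    """Residual sum of squares from regressing y on an intercept and t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))


def chow_statistic(t, y, break_index, k=2):
    """Chow F statistic for a known break; k = parameters per regression."""
    rss_pooled = ols_rss(t, y)                                # whole series
    rss_before = ols_rss(t[:break_index], y[:break_index])    # before the break
    rss_after = ols_rss(t[break_index:], y[break_index:])     # after the break
    rss_split = rss_before + rss_after
    dof = len(t) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)


# Synthetic proportions (illustration only): the log-odds drop by 0.8 at index 60.
random.seed(1)
n, break_index = 120, 60
t = list(range(n))
props = []
for i in t:
    shift = -0.8 if i >= break_index else 0.0
    log_odds = -1.0 + 0.005 * i + shift + random.gauss(0, 0.1)
    props.append(1.0 / (1.0 + math.exp(-log_odds)))

y = [logit(p) for p in props]                 # step 1: logit transform
f_with_break = chow_statistic(t, y, break_index)
print(f_with_break)
```

Under the null hypothesis of no break, the statistic is compared with an F distribution with k and n - 2k degrees of freedom; computing that p-value needs a statistics library and is left out here. The classical Chow test also assumes independent errors, so autocorrelated residuals would make the nominal p-value unreliable.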

Here is the seasonally-adjusted time-series of the logit-transformed data:

[Figure: seasonally-adjusted time-series of the logit-transformed data]

Where am I unsure?

I want to apply ARIMA models (including simpler AR(p) and MA(q)). This is because they are like simple linear regression models which closely match what’s required for the Chow-test.

These models require a stationary time-series, so the trend has to be removed; the de-trending can be done within the estimation of the ARIMA model itself (through differencing).

However, if I remove the trend as well as the seasonality, I am left with a time series of stationary (random) errors for the Chow test. For instance, this is what the seasonally-adjusted time-series looks like after first-order differencing:

[Figure: seasonally-adjusted time-series after first-order differencing]

This is where I am confused. Can I still detect a structural break at the point where there is a sudden change in mean or trend or variance values, when the time-series of stationary errors has, by definition of stationarity, constant mean and variance?
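
A small sketch on synthetic, noise-free data may help frame this question. It is written in plain Python purely for illustration, and the series and the names `diff`, `level_shift` and `slope_change` are made up for the example. It shows what first-order differencing does to two kinds of break: a level shift in the original series turns into a single spike in the differences, while a change in slope turns into a shift in the mean of the differences.

```python
def diff(series):
    """First-order difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def mean(values):
    return sum(values) / len(values)


n, break_index = 20, 10

# Case A: level shift of +5 at the break, no trend.
level_shift = [0.0 if i < break_index else 5.0 for i in range(n)]

# Case B: slope changes from 1 to 3 per step at the break (continuous at the join).
slope_change = [
    float(i) if i < break_index else break_index + 3.0 * (i - break_index)
    for i in range(n)
]

d_level = diff(level_shift)
d_slope = diff(slope_change)

# Case A: the differences are 0 everywhere except one spike of 5.0 at index 9.
print(d_level)

# Case B: the differences have mean 1.0 before the break and 3.0 after it.
print(mean(d_slope[:break_index]), mean(d_slope[break_index:]))
```

So a differenced series is only stationary over the whole sample if the original series had no break. After differencing, a break in trend would appear as a change in mean, whereas a pure level shift would be compressed into one observation and be much harder to detect with a before/after comparison.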

Therefore, what data should I model using AR(p) or MA(q)?

  1. The de-trended and de-seasonalised (stationary) time-series data? (remainder)
  2. The de-seasonalised, trend-only time-series data? (trend only)
  3. The seasonally-adjusted time-series data? (trend + remainder)

Have I taken different approaches?

As an alternative, I am considering modelling the seasonally-adjusted (so trend + remainder) time-series using a linear regression with auto-regressive errors, by regressing the logit-transformed data on time and modelling the errors to have an ARIMA structure:

$$y_t = \beta_0 + \beta_1 \, \mathrm{time}_t + \Theta^{-1}(B) \, w_t$$

Bonus question

Am I taking the right approach for testing for structural breaks, or do you recommend other approaches that can be implemented in R?

I am already aware of the strucchange package, so I would also like to know whether the data need to be stationary before being passed to it.

  • Can you define explicitly what you mean by a *structural break* in this context? For example, if the logit-transformed series exhibits a long-term linear trend and monthly and weekly seasonality BEFORE and AFTER the known date, you could compare long-term linear trend slopes before and after the known date, periods of monthly seasonality before and after the known date, periods of weekly seasonality before and after the known date to see if there is any evidence of a difference between any of these types of quantities. – Isabella Ghement Mar 13 '20 at 16:01
  • I suspect you are mostly interested in the long-term linear trend (assuming it is a linear trend) and willing to assume seasonality periods are not changed after the known date compared to before the date. If that is the case, you could formulate your problem as a GAM model, where GAM stands for Generalized Additive Model. The model would include different linear trends for the two time periods spawned by the known date, as well as smooth terms of month, week, etc. The model could have either independent or temporally correlated errors. – Isabella Ghement Mar 13 '20 at 16:06
  • @IsabellaGhement, by *structural break*, I mean that we know an event happened on the break date in question, and we want to know whether that changes the average values of the proportions (after accounting for possible seasonality effects). Thus, we are interested in detecting changes in trends / average values. Are you suggesting we do not need to de-trend the time-series? – humblepeasant Mar 13 '20 at 16:13
  • Yes, you are right. I'm mostly interested in the long-term linear trend and am willing to assume seasonality does not change. Would a linear model with autoregressive errors be a good alternative to GAMs as a simple starting point? Thanks a lot! – humblepeasant Mar 13 '20 at 16:16
  • If the timeseries of logit-transformed proportions exhibits long-term trend before and after the known date (as well as seasonality of different periods around that trend), it seems more meaningful to me to focus on comparing the slope of the trend before with the slope of the trend after the known date. – Isabella Ghement Mar 13 '20 at 16:19
  • If the timeseries of logit-transformed proportions exhibits a constant overall level before and after the known date (as well as seasonality of different periods), it seems more meaningful to me to focus on comparing the average of the series before with the average of the series after the known date. – Isabella Ghement Mar 13 '20 at 16:20
  • Yes, a linear model could work provided you model seasonality of different periods using either dummy variables or (sine, cosine) pairs. – Isabella Ghement Mar 13 '20 at 16:22
  • You could also de-seasonalize your logit-transformed series and then model the left over long-term trends. Essentially, define a time index ($time$) to be 1 on your first date of study, 2 on the second day of study, etc. (presuming study dates are consecutive). Then formulate a model like `lm(y ~ time + time_pmax)` where y is your deseasonalized logit transformed series and `time_pmax = pmax(0, time - timestructbreak)`. Here, timestructbreak is the time index corresponding to your structural break. – Isabella Ghement Mar 13 '20 at 16:28
  • Fantastic, thank you so much, this is very helpful @IsabellaGhement! – humblepeasant Mar 13 '20 at 16:29
  • See http://people.stat.sfu.ca/~cschwarz/CourseNotes/PDFbigbook-ALL/R-chapter-18.pdf for an example involving yearly rather than daily data. In the linear model (which could be expanded to accommodate temporal correlation in the model errors if necessary), the slope of time represents the (linear) rate of change in the expected value of y *before* the known date. The slope of timestructbreak represents the *difference* in the rates of change between the two time periods. So you can test whether or not the true slope of timestructbreak is different from 0 to get at what you want. – Isabella Ghement Mar 13 '20 at 16:32
  • If you add up the slope of time and the slope of timestructbreak in the proposed model, you'll get the slope for the second time period. – Isabella Ghement Mar 13 '20 at 16:34
  • Oops! I meant if you add the slope of time and the slope of time_pmax in your model you get the slope for the second time period. time_pmax is a predictor variable, but timestructbreak is the time index for your known date so it can't have a slope attached to it. – Isabella Ghement Mar 13 '20 at 16:55
  • Perhaps the R package `mcp` can handle this - perhaps not. It can model seasonality using sin/cos, model trends and change points, model AR(N), and support a variety of link functions. If you could post some data, I could write up an analysis. Curious to learn whether it would work - I'm the author of mcp. Some relevant info at https://lindeloev.github.io/mcp/articles/arma.html – Jonas Lindeløv Mar 14 '20 at 06:18
  • Why don't you post the residuals from your model and present the ACF/PACF to check that all the sequential steps left white noise? This would be a test of model sufficiency. You might also post your OBSERVED data, before all the adjustments, in order to possibly find the de facto break point, taking into account possible non-constant error variance. – IrishStat Mar 14 '20 at 18:48
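
The hinge-term regression suggested in the comments, `lm(y ~ time + time_pmax)` with `time_pmax = pmax(0, time - timestructbreak)`, can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the commenter's code: the series is synthetic and noise-free, the helpers `solve` and `fit_hinge` are hypothetical names, and the fit uses the normal equations with independent errors, so no standard errors or autocorrelation structure are estimated.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_hinge(time, y, time_struct_break):
    """Least-squares fit of y = b0 + b1*time + b2*max(0, time - time_struct_break)."""
    rows = [[1.0, float(t), max(0.0, float(t - time_struct_break))] for t in time]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Synthetic series (illustration only): slope 0.5 up to time 30, slope 0.2 afterwards.
time = list(range(1, 61))
time_struct_break = 30
y = [2.0 + 0.5 * t - 0.3 * max(0, t - time_struct_break) for t in time]

b0, b1, b2 = fit_hinge(time, y, time_struct_break)
slope_before = b1        # rate of change before the break
slope_after = b1 + b2    # rate of change after the break
print(slope_before, slope_after)
```

Here the coefficient on the hinge term (`b2`) is the difference in slopes between the two periods, so testing whether it differs from 0 is the test described in the comments. That test needs standard errors, ideally ones that account for autocorrelated errors, which this sketch does not compute.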

0 Answers