
I'm trying to build an ARIMA model of flow in sewer pipes (here is a snippet of the time series: https://imgur.com/CuEzCNU).

The goal is specifically to model rapid increases in flow through the pipes, since these are more critical than the regular seasonal values, which are far more abundant and might dominate the fit. My idea is therefore to train a model only on these rare events, in the hope that it will also model similar events well at test time. However, I'm unsure how to go about it. So far I have tried smoothing the time series and extracting the events in which runoff increases sharply into a vector, which I planned to train on, but I'm now having doubts about that approach.
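
A minimal sketch of that kind of event extraction, for concreteness. The function names, the trailing moving average, the one-step rise threshold, and the padding around each event are all illustrative assumptions, not something taken from the actual data:

```python
def moving_average(series, window):
    """Trailing moving average; the first points use a shorter window."""
    out = []
    running = 0.0
    for i, x in enumerate(series):
        running += x
        if i >= window:
            running -= series[i - window]
        out.append(running / min(i + 1, window))
    return out


def extract_events(series, window=3, rise_threshold=5.0, pad=2):
    """Return (start, end) index pairs around rapid rises in the smoothed series."""
    smooth = moving_average(series, window)
    # Flag steps where the smoothed flow rose by more than rise_threshold.
    flagged = [i for i in range(1, len(smooth))
               if smooth[i] - smooth[i - 1] > rise_threshold]
    events = []
    for i in flagged:
        start, end = max(0, i - pad), min(len(series), i + pad + 1)
        if events and start <= events[-1][1]:
            events[-1] = (events[-1][0], end)  # merge overlapping windows
        else:
            events.append((start, end))
    return events


# Hypothetical flow values with one sharp rise in the middle.
flow = [10, 11, 10, 12, 11, 40, 65, 50, 30, 15, 11, 10, 12, 11, 10]
events = extract_events(flow)
segments = [flow[start:end] for start, end in events]
```

Each entry of `segments` is one contiguous slice of the raw series around a rise; the threshold and padding would need tuning against the real runoff data.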

Any ideas are much appreciated.

monoalbino
  • Don't use ARIMA, at least not "plain vanilla" ARIMA. It presupposes normally distributed innovations, with a constant innovation variance over time. This is rather obviously not the case here. Is there any kind of seasonality in your data? Can you use some kind of leading indicators (perhaps rainfall, if this is runoff)? Possibly GARCH may be helpful. – Stephan Kolassa Feb 21 '20 at 23:00
  • Yes, I have data on the rainfall and intend on using it with the runoff values to model the behaviour. There is a seasonality of 24 hours but I haven't found any other seasonality in the data. – monoalbino Feb 23 '20 at 01:23
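
One way to check whether rainfall is usable as a leading indicator is to look at how strongly it correlates with the flow at different lags. The sketch below is only an illustration: the helper names and both series are made up, and a plain Pearson correlation is an assumed choice of measure.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def best_lead(rain, flow, max_lag):
    """Correlate rain[t] with flow[t + lag] for each lag; return the best lag and all scores."""
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(rain[:len(rain) - lag], flow[lag:])
    return max(scores, key=scores.get), scores


# Hypothetical data in which flow responds two time steps after rainfall.
rain = [0, 0, 5, 8, 2, 0, 0, 0, 0, 0]
flow = [10, 10, 10, 10, 30, 45, 20, 12, 10, 10]
lag, scores = best_lead(rain, flow, max_lag=3)
```

A clear peak at some positive lag would suggest feeding rainfall shifted by that lag into the model as an external regressor.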

1 Answer


You might want to look into how techniques from imbalanced classification can be applied to time series analysis.

As you were alluding to, you could oversample the rare events in your training set. Take a look at this article: https://link.springer.com/article/10.1007/s41060-017-0044-3
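
A rough sketch of what oversampling could look like here, assuming the series is first cut into lagged windows that a regression-style model treats as independent samples (this does not apply to fitting ARIMA directly, which needs the observations in order). The helper names, the event rule, and the replication factor are all hypothetical:

```python
import random


def make_windows(series, n_lags):
    """Turn a series into (lag_values, next_value) training pairs."""
    return [(series[i - n_lags:i], series[i])
            for i in range(n_lags, len(series))]


def oversample_events(windows, is_event, factor, seed=0):
    """Repeat each event window `factor` times, keep the rest once, then shuffle."""
    rng = random.Random(seed)
    out = []
    for w in windows:
        out.extend([w] * (factor if is_event(w) else 1))
    rng.shuffle(out)
    return out


def is_event(window):
    """Assumed rule: the target exceeds the most recent lag by more than 10."""
    lags, target = window
    return target - lags[-1] > 10


# Hypothetical flow values with one sharp rise.
flow = [10, 11, 10, 12, 11, 40, 65, 50, 30, 15, 11, 10]
windows = make_windows(flow, n_lags=3)
train = oversample_events(windows, is_event, factor=5)
```

Only the training set should be oversampled; evaluation should use the original, unbalanced windows so that the error estimate reflects how often events really occur.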

Dylan
  • [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Feb 21 '20 at 22:58