Negative values in time series forecast and high fluctuations in input data

Question

I am trying to perform univariate time series forecasting in Python, using ARIMA, on a monthly rainfall dataset covering 136 years.

My dataset is of the form:

YEAR RAINFALL

2000-01-01 0

2000-02-01 128.2

2000-03-01 0

2000-04-01 289.3

. . .

I have two issues.

1) My forecast results contain negative values, even though there are none in the training set and, logically, rainfall values shouldn't be negative. My original data plot is as below.

Below is the graph of the test data and predicted values. As you can see the red curve of forecasted values extends below 0.

2) Since I have monthly data, the rainfall in some rows jumps from 0 directly to a high value in the next month. In such cases the current value doesn't depend on the previously observed values, which is the principle underlying autoregression. Is this what is causing the problem and preventing a good fit? I have tried using yearly data instead, but that doesn't give a good fit either, and working at a quarterly frequency would split the actual monsoon period of the region my dataset covers.

Here is the link to my dataset: https://docs.google.com/spreadsheets/d/1JEj9QZNQagLg-hKhzF2p0yNJsxceMlN1l0LpGDs4eg4/edit?usp=sharing

Post your data in a CSV file and I will try to answer your question, as good answers depend on the statistical characteristics of the time series. - IrishStat, Mar 10 '19 at 13:01

score 2 · Answer 1 · answered Mar 11 '19 at 14:41

I took your 1380 monthly values and ran them through AUTOBOX, which automatically identified a useful model (presented in three parts). The residual plot and its ACF were also examined. A significant change (a reduction) in error variance was identified. The forecasts were generated using Monte Carlo / bootstrapping procedures.

As it turned out, no expected-value forecast was negative. If one had been, you could simply set it to zero: the model itself imposes no such constraint, so non-negativity is just a logical constraint applied afterwards.
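The "convert it to zero" step can be sketched in a few lines of Python. This is only an illustration: the function name `clip_at_zero` and the forecast values are hypothetical and do not come from AUTOBOX or from the asker's model.

```python
def clip_at_zero(forecasts):
    """Return a new list in which every negative forecast is replaced by 0.0."""
    return [max(0.0, float(value)) for value in forecasts]


# Hypothetical forecast values, for illustration only.
raw_forecasts = [12.4, -3.1, 0.0, 250.7, -0.2]
clipped_forecasts = clip_at_zero(raw_forecasts)
print(clipped_forecasts)
```

Clipping changes only the reported point forecasts; it does not change the fitted model, so a model that frequently goes negative probably still needs the remedies discussed in this answer.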

As for your forecast function, which is based on a model you didn't share, I would suggest that better analytics might help, including remedying unusual values and non-constant error variance. The ARIMA model that was developed was (0,0,0)(0,1,1)12, that is, no non-seasonal terms, one seasonal difference, and one seasonal moving-average term, with a period of 12 months. The ARIMA model should always be identified using data adjusted for deterministic structure.
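To make the (0,0,0)(0,1,1)12 structure concrete, here is a rough pure-Python sketch of the point forecasts such a model produces when it has no constant term. It relies on the standard identity that the forecast for a given month is (1 - theta) times the last observed value for that month plus theta times the previous forecast for that month. The function name, the value of `theta`, the start-up rule, and the data are all assumptions for illustration; this is not the AUTOBOX model, and it ignores the outlier and variance adjustments mentioned above.

```python
def seasonal_ma_forecast(series, theta, period=12):
    """Point forecasts for the next `period` steps under ARIMA(0,0,0)(0,1,1)[period].

    Assumes no constant term. Uses the recursion
        forecast[t] = (1 - theta) * y[t - period] + theta * forecast[t - period]
    """
    if len(series) < period:
        raise ValueError("need at least one full season of data")
    # Start-up assumption: the first forecast for each month is that month's
    # first observed value.
    level = [float(v) for v in series[:period]]
    for t in range(period, len(series)):
        s = t % period
        # Blend the newly observed value with the forecast that was made for it.
        level[s] = (1.0 - theta) * series[t] + theta * level[s]
    # level[s] now holds the forecast for the next unseen occurrence of month s.
    start = len(series) % period
    return [level[(start + h) % period] for h in range(period)]


# Hypothetical monthly rainfall: one year, then a year with doubled values.
pattern = [0, 0, 5, 20, 80, 300, 450, 400, 250, 60, 10, 0]
series = pattern + [2 * v for v in pattern]
forecast = seasonal_ma_forecast(series, theta=0.5)  # theta is a made-up value
print(forecast)
```

Under these assumptions, with theta between 0 and 1 each forecast is a weighted average of past values for the same month, so non-negative data cannot produce a negative point forecast. In practice the parameter would be estimated by a fitting routine rather than chosen by hand.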

You might want to look at "How to improve this time series model?" for a similar case study.
