Stationarity in presence of an outlier

Question

I am working with bimonthly data containing a customer's sales amount. I plotted the original series in Python:

import matplotlib.pyplot as plt

Cust_bimonthly_Data['Customer_Sales'].plot(figsize=(12, 8))
plt.title('Cust Bimonthly Daily')
plt.show()

The series looks like this:

To remove the big peak in my data, which was an outlier, I applied a log(x+1) transformation: I added 1 to every value and then took the log.

Cust_bimonthly_Data['new_Customer_Sales'] = Cust_bimonthly_Data['new_Customer_Sales']+1

Then I took the log to remove the outliers:

import numpy as np

Cust_bimonthly_Data['log_cust_sales'] = np.log(Cust_bimonthly_Data['new_Customer_Sales'])

**The log-transformed series looks like this:**

To check whether the log-transformed data is stationary, I ran an ADF test; these are my results.

Dickey-Fuller test to check whether the transformed series is stationary:

from statsmodels.tsa.stattools import adfuller
Cust_bimonthly_Data_test= Cust_bimonthly_Data_drop.iloc[:,0].values
result = adfuller(Cust_bimonthly_Data_test)

(-4.8014847417664424,
5.4031369234729222e-05,
0,
63,
{'1%': -3.5386953618719676,
'10%': -2.591896782564878,
'5%': -2.9086446751210775},
150.10425215395222)
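For reference, `adfuller` with its default `autolag` setting returns, in order: the test statistic, the p-value, the number of lags used, the number of observations used, a dictionary of critical values, and the maximized information criterion. A minimal sketch of how the numbers above can be read; the helper name `summarize_adf` is hypothetical and is not part of statsmodels:

```python
# Values copied from the adfuller output above.
adf_result = (
    -4.8014847417664424,
    5.4031369234729222e-05,
    0,
    63,
    {'1%': -3.5386953618719676,
     '10%': -2.591896782564878,
     '5%': -2.9086446751210775},
    150.10425215395222,
)


def summarize_adf(result):
    """Hypothetical helper: unpack an adfuller-style tuple and compare
    the test statistic with each critical value."""
    stat, pvalue, usedlag, nobs, crit, icbest = result
    # The ADF null hypothesis is a unit root (non-stationarity). It is
    # rejected when the statistic is more negative than the critical value.
    rejects = {level: stat < value for level, value in crit.items()}
    return {
        "stat": stat,
        "pvalue": pvalue,
        "usedlag": usedlag,
        "nobs": nobs,
        "rejects_unit_root": rejects,
    }


summary = summarize_adf(adf_result)
for level in ("1%", "5%", "10%"):
    print(level, "reject unit root:", summary["rejects_unit_root"][level])
```

The statistic is below all three critical values, so the unit-root null is rejected at each level. That conclusion only holds if the test's assumptions are met for this series.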

Question:

Since this test rejects the null hypothesis that my series is non-stationary, should I still go ahead with decomposition and differencing? Are those steps still required, given that the test tells me my series is now stationary?

It appears the customer is buying irregularly but is making up for periods of low purchases by one-time larger purchases. By exploiting this fact (if indeed it is correct), you can produce a model that will be far more insightful and accurate than anything any automatic procedure can do for you. It really wouldn't be appropriate to supply specific advice until you can explain what your objectives in this analysis are. (Only an academic researcher would ultimately be interested in the question of stationarity.) – whuber, Jun 23 '18 at 20:37
The automatic analysis suggested that the customer is indeed buying irregularly, as there is no identifiable pattern, i.e. no ARIMA structure or memory structure. There is only 1 "large purchase", which suggests an unusual activity that at this point can't be predicted, just identified and adjusted for. – IrishStat, Jun 23 '18 at 21:03
@Irish I am trying to suggest that such large purchases *are* likely predictable: the probability of a large purchase ought to increase after a series of small purchases, at least if the customer's needs are roughly constant over time. However, because it's unlikely any ARIMA-like model will handle such a phenomenon appropriately, there is potentially great value in constructing a model that addresses these circumstances. – whuber, Jun 24 '18 at 19:32
What AUTOBOX developed was the antithesis of an ARIMA-type model, as neither previous values nor previous errors were identified/suggested as factors/features of the model. What is suggested is that three events were triggered by unspecified exogenous variables. The two pulses and the shift in mean suggest that one should investigate the root exogenous cause. This Exploratory Data Analysis is meant to drive the investigation by implicitly "asking the user" to find out the "whys". – IrishStat, Jun 25 '18 at 09:24

IrishStat · Answer 1 · 2018-06-24T19:03:19.267

One doesn't take logs or apply any other power transform to deal with anomalous observations such as pulses, level shifts, seasonal pulses, or local time trends. See "When (and why) should you take the log of a distribution (of numbers)?" for when to power transform, and see http://docplayer.net/12080848-Outliers-level-shifts-and-variance-changes-in-time-series.html for how to deal with anomalies.
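A small illustration of this point, using synthetic numbers rather than the OP's data: a log(x+1) transform compresses a pulse, but the pulse is still by far the most extreme observation afterwards. The series, the position of the pulse, and the 3.5 cutoff are all assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a sales series (NOT the OP's data):
# positive values around 1000 plus one anomalous purchase.
sales = rng.gamma(shape=5.0, scale=200.0, size=33)
sales[20] = 50_000.0


def robust_z(x):
    """Distance from the median in units of 1.4826 * MAD."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / (1.4826 * mad)


z_raw = robust_z(sales)
z_log = robust_z(np.log1p(sales))  # log1p(x) computes log(x + 1)

print("robust z of the pulse, raw scale:", round(float(z_raw[20]), 1))
print("robust z of the pulse, log scale:", round(float(z_log[20]), 1))
print("pulse still the most extreme point after log1p:",
      int(np.argmax(z_log)) == 20)
```

The transform changes the scale of the anomaly, not its presence, which is why the anomaly has to be modelled rather than transformed away.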

Also, the Dickey-Fuller test requires/assumes that any and all pulses, level shifts, seasonal pulses, and local time trends have been incorporated into the model and thus are not present in the current set of residuals.

It looks like you have at least 2 pulses, 2 possible level shifts, and possible ARIMA structure. Post your data and I will try to help.

EDITED AFTER RECEIPT OF DATA:

I took your 33 bi-monthly values (too short to look for seasonal structure), used AUTOBOX, and obtained the following Actual/Fit and Forecast graph, which identifies two pulses and one level shift.

The stats are here and here

The ACF of the residuals is here, suggesting randomness.

(1) IN RESPONSE TO OP'S QUESTION:

The ACF of the original series does not suggest non-stationarity because the level shift obfuscated the variance, creating a downward bias in the ACF. Non-stationarity was instead detected through the presence of a level shift, found via exploratory data analysis, which makes it mandatory to introduce a level/step shift EXOGENOUS series as the remedy for the non-stationarity.

From this ACF plot suggesting randomness, if I have understood correctly, the residuals are not stationary and I need to apply more techniques to make them stationary. Is my understanding right? – Swati Kanchan, Jun 24 '18 at 18:14
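To make "introducing exogenous series" concrete, here is a minimal sketch with ordinary least squares on synthetic data. It is not AUTOBOX's procedure, and it assumes the timing of the pulses and the level shift is already known (detecting them is the hard part). All positions and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 33

# Synthetic series (NOT the OP's data): noise around a constant mean.
y = 100.0 + rng.normal(0.0, 5.0, size=n)

# Hypothetical interventions, assumed known for this sketch.
pulse_times = (8, 21)
pulse_sizes = (80.0, 60.0)
shift_time, shift_size = 15, 40.0
for when, size in zip(pulse_times, pulse_sizes):
    y[when] += size
y[shift_time:] += shift_size

t = np.arange(n)
columns = [np.ones(n)]                                          # intercept
columns += [(t == when).astype(float) for when in pulse_times]  # pulse dummies
columns.append((t >= shift_time).astype(float))                 # step dummy
X = np.column_stack(columns)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

print("intercept, pulse 1, pulse 2, level shift:", np.round(coef, 1))
print("residual std:", round(float(resid.std(ddof=X.shape[1])), 2))
```

A pulse dummy is 1 at a single time point and 0 elsewhere; a step dummy is 0 before the shift and 1 from the shift onward. Once these are in the model, the residuals are what should be checked for remaining structure.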

score 0 · Accepted Answer · answered Jun 23 '18 at 19:46


Perhaps one could assume that the data are generated by a linear combination of two processes: one producing seasonality plus perhaps a trend, and another that adds unit impulses to the level of the series with a certain probability.
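A minimal simulation of such a two-process series might look as follows. Every number here is a hypothetical choice for illustration: the trend slope, the seasonal period of 6 (which assumes one observation every two months), the impulse probability, and the impulse size.

```python
import numpy as np

rng = np.random.default_rng(2)
n, period = 60, 6   # hypothetical: 6 observations per year
t = np.arange(n)

# Process 1: linear trend plus a seasonal cycle plus noise.
smooth = (100.0 + 0.5 * t
          + 10.0 * np.sin(2 * np.pi * t / period)
          + rng.normal(0.0, 2.0, size=n))

# Process 2: an impulse occurs independently at each time point
# with probability p (a Bernoulli draw scaled by the impulse size).
p, impulse_size = 0.05, 150.0
impulses = impulse_size * rng.binomial(1, p, size=n)

# Observed series: the sum of the two processes.
y = smooth + impulses

print("number of impulses:", int((impulses > 0).sum()))
print("impulse positions:", np.flatnonzero(impulses))
```

Fitting a model then means estimating both parts jointly, since ignoring the impulse process distorts the estimates of the trend and seasonal terms.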

As IrishStat mentioned, your complete model must be able to handle that.

Of course, you could use Autobox, which David Reilly created.

answered Jun 23 '18 at 19:46

Analyst


Are you suggesting that I should go ahead and start plotting the ACF and PACF without being too concerned about stationarity now? – Swati Kanchan Jun 24 '18 at 18:18
No, you must provide the full model. If you have outliers in the data set, then method-of-moments estimators such as the SACF and SPACF easily break down. – Analyst Jun 24 '18 at 19:03
Is the log treatment not a good option here, in your opinion? – Swati Kanchan Jun 25 '18 at 03:04
A log treatment can be done, but IrishStat has a more complete solution here. – Analyst Jun 25 '18 at 19:11
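The breakdown of the sample ACF mentioned in the comments can be seen in a small experiment on synthetic data. The AR(1) coefficient, the series length, and the outlier size below are arbitrary assumptions, and `sample_acf` is a hand-written helper rather than a library function.

```python
import numpy as np


def sample_acf(x, lag):
    """Method-of-moments sample autocorrelation at a given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))


rng = np.random.default_rng(3)
n, phi = 200, 0.8

# Synthetic AR(1) series: x[i] = phi * x[i-1] + noise.
x = np.zeros(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()

# Same series with a single additive outlier.
x_out = x.copy()
x_out[100] += 50.0

r_clean = sample_acf(x, 1)
r_out = sample_acf(x_out, 1)
print("lag-1 sample ACF, clean series:", round(r_clean, 3))
print("lag-1 sample ACF, with one outlier:", round(r_out, 3))
```

One outlier inflates the variance in the denominator far more than it changes the lagged cross-products, so the estimated autocorrelation is pulled toward zero and the series can look like noise even though it is strongly autocorrelated.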
