I'm currently working with time series data on beverage sales in a supermarket. I have data for each minute, but I'm aggregating by hour, since the dataset is already huge even at daily granularity.

I would like to detect stock-outs of a product, where a stock-out happens when all the items of a given product have been sold and the product is not available until it is refilled. It was suggested that I use a Hidden Markov Model, but since I have little knowledge of HMMs I don't know how to set up the problem, so I'm asking for references and for suggestions about which statistical and mathematical assumptions to use. My idea would be something like:

$Y_i$ = hourly time series of sales, $i=1,...,n$,

$Y_i \in \mathbb{Z}_{\ge 0}$ (non-negative integer counts, since hourly sales can be zero)

$Z_i$ = hourly time series of latent states, $i=1,...,n$

$Z_i \in \{0,1\}$, with $0$ corresponding to no items available/stock-out

Basically I would like to infer the latent state given the observable sequence of sales.
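One possible way to write this setup down, as a sketch only: a two-state HMM with Poisson emissions, which is the emission family used in the depmixS4 call in this question. The symbols $\pi_j$, $a_{jk}$ and $\lambda_j$ are notation introduced here for illustration, and the condition $\lambda_0 \approx 0$ is an assumption about what a stock-out state should look like, not something estimated from the data.

$$
\begin{aligned}
P(Z_1 = j) &= \pi_j, \qquad j \in \{0,1\}, \\
P(Z_i = k \mid Z_{i-1} = j) &= a_{jk}, \qquad j,k \in \{0,1\}, \\
Y_i \mid Z_i = j &\sim \text{Poisson}(\lambda_j), \qquad \lambda_0 \approx 0 < \lambda_1 .
\end{aligned}
$$

With a covariate $x_i$, the rate in each state could be modelled as $\log \lambda_{j,i} = \beta_{0j} + \beta_{1j} x_i$. The quantity of interest is then either the smoothed probability $P(Z_i = 0 \mid Y_1, \dots, Y_n)$ (forward-backward algorithm) or the most likely state path $\arg\max_{z_1,\dots,z_n} P(Z_1 = z_1, \dots, Z_n = z_n \mid Y_1, \dots, Y_n)$ (Viterbi algorithm).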

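I'm currently trying to fit it using the depmixS4 library in R:

    library(depmixS4)  # provides depmix() and fit()

    # Two-state HMM with Poisson emissions; resp = hourly sales, xreg = covariate
    dep2 <- depmix(resp ~ xreg, nstates = 2, family = poisson(), ntimes = length(resp))
    hmm2 <- fit(dep2)   # parameter estimation by EM
    ba2 <- BIC(hmm2)    # BIC, e.g. to compare against other numbers of states
    summary(hmm2)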
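For reference, here is a rough sketch of how the latent states might be read off a fitted model. It runs on simulated data rather than the real sales, so the rates (6 and 0.05), the stock-out windows and all variable names are made up for illustration. It also assumes a recent depmixS4 release in which `posterior()` accepts a `type` argument. Since the state labels returned by the fit are arbitrary, the stock-out state is taken to be the one with the lowest average sales.

```r
library(depmixS4)

set.seed(42)

# Hypothetical data: rate 6 per hour when in stock, 0.05 during a stock-out
n <- 500
in_stock <- rep(1, n)
in_stock[c(101:130, 301:340)] <- 0
df <- data.frame(sales = rpois(n, lambda = ifelse(in_stock == 1, 6, 0.05)))

# Two-state HMM with an intercept-only Poisson response in each state
mod <- depmix(sales ~ 1, data = df, nstates = 2, family = poisson())
fm <- fit(mod, verbose = FALSE)

# Most likely state sequence (Viterbi decoding)
post <- posterior(fm, type = "viterbi")

# State labels are arbitrary: call the state with the lowest mean sales "stock-out"
state_means <- tapply(df$sales, post$state, mean)
stockout_state <- as.integer(names(which.min(state_means)))
df$stockout <- post$state == stockout_state

# Compare decoded stock-outs with the simulated truth
table(decoded = df$stockout, truth = in_stock == 0)
```

On real data there is no ground truth to tabulate against, so the decoded stock-out hours would have to be checked against refill logs or similar records if they become available.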

Thank you in advance

Tommaso Guerrini
  • Have you tried the approach I suggested? I believe it answers your question, since nobody on the list has taken up the challenge of a bake-off. – IrishStat Dec 06 '16 at 16:23
  • @IrishStat Actually I'm implementing the hidden markov model, I'll give some more time to others before accepting your answer :) – Tommaso Guerrini Dec 06 '16 at 16:43
  • I would like to see your results in terms of model adequacy, i.e. tests of significance of the included structure, and validation of sufficiency, i.e. a critical examination of the model residuals suggesting that they are free of remaining structure. You might also share the data (it can be coded to mask the actual values) so that your results AND mine can be compared, criticized and improved by other readers/practitioners. In this way we all learn, even the teachers! – IrishStat Dec 06 '16 at 16:56
  • @TommasoGuerrini: I am working on something very similar. If it's not an issue, could you share a snapshot of your data and code? I am also new to HMMs and am not sure how to preprocess the data or implement an HMM. Thanks. – Raj Mar 02 '17 at 00:48
  • Hi Yuvaraj, look at the code here (http://stats.stackexchange.com/questions/255697/semi-hidden-markov-model-with-parameters-of-the-emission-probabilities-depending?rq=1). By the way, can I ask what kind of problem you are working on? I am writing my final dissertation and any example could help. – Tommaso Guerrini Mar 02 '17 at 09:50
  • @TommasoGuerrini: Thanks, I appreciate the help. Mine is inventory stock-out prediction. It is much like the problem you have described above, only in my case I have the end-of-day inventory. I also have product flow information such as the daily inflow and outflow of inventory to and from the factory. The objective is to predict the probability of a stock-out a week or two in the future. – Raj Mar 02 '17 at 19:48
  • @Yuvaraj guerrinitom@gmail.com write me there and let's chat! – Tommaso Guerrini Mar 10 '17 at 09:39

1 Answer

Build a Transfer Function model for hourly sales that takes into account hourly, daily, weekly and monthly effects, day-of-the-month, week-of-the-month and month-of-the-year effects, holiday and long-weekend effects, current level and trend effects, and price, promotion and weather effects, among others, while also accounting for any ARIMA structure. This will allow you to identify anomalous behavior via Intervention Detection. If the predicted sales are non-zero at an identified intervention point and the observed sales are zero, then you have identified a potential stock-out, because the observed zero is significantly different from the expected non-zero value.
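To make the "observed zero versus expected non-zero" check concrete, here is a deliberately simplified sketch. It is not a Transfer Function model and performs no formal Intervention Detection: it swaps in a plain Poisson regression on hour-of-day and day-of-week factors, with no ARIMA structure, and flags hours where a zero would be very unlikely under the fitted mean. The simulated data, the injected stock-out on day 10 and the 0.01 cut-off are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical data: four weeks of hourly sales with an evening peak
n_days <- 28
hour <- rep(0:23, times = n_days)
day <- rep(seq_len(n_days), each = 24)
weekday <- factor((day - 1) %% 7)
base_rate <- 2 + 8 * exp(-((hour - 17)^2) / 18)
sales <- rpois(length(hour), lambda = base_rate)

# Inject an artificial stock-out: day 10, hours 15 to 19
out_idx <- which(day == 10 & hour >= 15 & hour <= 19)
sales[out_idx] <- 0

dat <- data.frame(sales = sales, hour = factor(hour), weekday = weekday)

# Poisson regression on calendar effects only (stand-in for the full model)
calendar_fit <- glm(sales ~ hour + weekday, family = poisson(), data = dat)
expected <- fitted(calendar_fit)

# Under a Poisson model, P(Y = 0) = exp(-expected mean)
p_zero <- exp(-expected)

# Flag observed zeros that the model considers very unlikely
dat$flag <- dat$sales == 0 & p_zero < 0.01
which(dat$flag)
```

Zeros in quiet hours are not flagged because the expected sales there are low anyway; only zeros at times when sales were expected show up as candidate stock-outs. A full model would add the price, promotion, holiday and autoregressive terms listed above.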

You might want to look at "Seasonality in residuals ACF and PACF" to review how you might build the 24 hourly models using empirically identified structure and user-specified causal variables.

Hidden Markov Models deal with latent or hidden variables. What I proposed is a way of identifying the latent/hidden variables by appropriately layering in data-suggested effects (hourly, daily, etc.) in sufficient quantity and form without collapsing the solution. Forming these waiting-to-be-discovered variables takes aggressive heuristics and a lot of trial and error. So you might conclude that intelligent time series modelling is just one very aggressive Markov Model reaching into the data to form sufficient structure to render a white-noise error process.

I am not an expert in HMMs, but I would be very interested in a bake-off comparing HMMs and Transfer Function Modelling on your data if someone else can come forward with an HMM solution.

IrishStat
  • After finding and studying a lot of literature on supermarket sales forecasting (https://www.researchgate.net/profile/Richard_Weber/publication/222568533_Improved_supply_chain_management_based_on_hybrid_demand_forecasts/links/567ab0d808ae19758381054b.pdf?origin=publication_detail) and (http://www.sciencedirect.com/science/article/pii/S0377221706000737), I found that: 1. Often the predictions are calculated only for high-selling items (the 50 best-selling items in the first paper, median(sales)>5 in the second). 2. Interval forecasts are used instead of point forecasts, which are too difficult. – Tommaso Guerrini Dec 13 '16 at 10:23
  • Furthermore, in my case the current unavailability of SKU data and stock rotation levels made me focus on finding the most accurate point forecasts, while this could become unnecessary once I receive data about stock-outs and stock levels (missing my prediction by 10 items may not be a problem if I have 100 items stored in the supermarket warehouse). I thought this could be interesting, and I hope that other users trying to predict supermarket sales will not spend so much time trying to forecast very accurately. – Tommaso Guerrini Dec 13 '16 at 10:28