10

I have a binary time series: 2160 hourly observations (0 = didn't happen, 1 = happened) covering a period of 90 days.

[Plot of the binary series; image not included]

I want to forecast when the next 1 will occur after these 90 days, and also extend this prediction over the following month.

gung - Reinstate Monica
  • 1
    Could you describe your data in greater detail? What kind of events does it describe? What is known about the process that generated the data (e.g. could we expect some kind of seasonality or patterns)? Could you post your data as an example? – Tim Feb 17 '16 at 18:05
  • I am researching accidents at a specific place. A 1 means there was an accident in a one-hour interval, and 0 otherwise. We want to predict the next accidents. – amin abdolahnejad Feb 17 '16 at 18:21
  • 2
    Are you saying you want to forecast how long it will be until the next accident, or you want to forecast how the probability of an accident will change / not change over the next period of time? – gung - Reinstate Monica Feb 17 '16 at 18:23
  • You need to tell us more about the data, and the assumptions you are willing to make. What's the underlying process? Is it slowly changing over time? Is it stationary? Does it have finite memory? – Memming Feb 17 '16 at 18:33
  • We have hourly observations over 90 days, which gives 2160 data points. I want to predict hours 2161 to 2880, i.e. the next 30 days. I want to forecast when the next accident will happen so that we can be prepared for it. – amin abdolahnejad Feb 17 '16 at 18:49
  • An excellent textbook for generalized linear time series is by [Fokianos](http://www.amazon.com/Regression-Models-Time-Series-Analysis/dp/0471363553) – bdeonovic Feb 18 '16 at 01:38
  • https://towardsdatascience.com/arima-for-classification-with-soft-labels-29f3109d9840 – Marco Cerliani Mar 16 '21 at 13:24

3 Answers

6

One approach might be to assume that the Bernoulli sequence can be described by a latent Normal random variable via the probit transformation. That is, your realized $X_t \sim \text{Bernoulli}(p_t)$, where $\Phi^{-1}(p_t) = Y_t$ (equivalently $p_t = \Phi(Y_t)$) and $Y \sim N(\mu, \Sigma)$. This way you can place whatever time-series structure (e.g. ARIMA) you like on $Y$ and then use standard time-series techniques to predict future observations (e.g. Holt-Winters). It should be possible to code something like this up in Stan or JAGS, but you might not get great predictions given the "glass darkly" view the Bernoulli process gives you of the latent state.
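
To make the generative model concrete, here is a minimal simulation sketch in base R. It assumes the latent series is a stationary AR(1), which is only one of many possible structures, and the parameter values `mu`, `phi` and `sigma` as well as the function names are illustrative assumptions, not estimates from the accident data. It also shows the h-step-ahead probability of a 1 if the latent state were known, using the identity $E[\Phi(Y)] = \Phi(m / \sqrt{1 + v})$ for $Y \sim N(m, v)$.

```r
# Simulate a 0/1 series driven by a latent Gaussian AR(1) through a probit link.
# mu, phi and sigma are hypothetical values chosen for illustration only.
simulate_probit_ar1 <- function(n, mu = -1, phi = 0.9, sigma = 0.5) {
  y <- numeric(n)
  # start from the stationary distribution of the AR(1)
  y[1] <- rnorm(1, mean = mu, sd = sigma / sqrt(1 - phi^2))
  for (t in 2:n) {
    y[t] <- mu + phi * (y[t - 1] - mu) + rnorm(1, sd = sigma)
  }
  p <- pnorm(y)                       # probit link: p_t = Phi(Y_t)
  x <- rbinom(n, size = 1, prob = p)  # observed binary series
  data.frame(t = seq_len(n), y = y, p = p, x = x)
}

# P(X_{t+h} = 1 | Y_t = y_last) under the same AR(1) assumption.
forecast_prob <- function(y_last, h, mu, phi, sigma) {
  m <- mu + phi^h * (y_last - mu)                  # conditional mean of Y_{t+h}
  v <- sigma^2 * (1 - phi^(2 * h)) / (1 - phi^2)   # conditional variance of Y_{t+h}
  pnorm(m / sqrt(1 + v))
}

set.seed(1)
sim <- simulate_probit_ar1(2160)  # 90 days of hourly observations
probs <- forecast_prob(sim$y[2160], h = 1:720, mu = -1, phi = 0.9, sigma = 0.5)
```

In practice the latent state is not observed, so `y_last` and the parameters would have to be inferred (for instance in Stan or JAGS); the sketch only illustrates how the forecast probability relaxes toward its long-run value as `h` grows.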

Dalton Hance
1

The simplest model would be linear regression. You can plot your data using ggplot:

# ggplot2 is needed for the plot below
library(ggplot2)

# for reproducibility
set.seed(200)
# simple example: assume the data are i.i.d. Bernoulli draws with P(0) = 0.3 and P(1) = 0.7
data <- data.frame(time = 1:200, val = sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.3, 0.7)))

# plot the points and add a linear regression line with its confidence band
ggplot(data, aes(x = time, y = val)) + geom_smooth(method = lm) + geom_point()

# fit a linear regression of the binary outcome on time
fitData <- lm(val ~ time, data = data)
# predict the next 24 time points, with confidence intervals
predict(fitData, newdata = data.frame(time = 201:224), interval = "confidence")

This is the simplest model; there are other, non-linear models that might fit your data better. Also, bear in mind that you might have to use the log of the date to get a better fit. You can read more about non-linear regressions such as polynomial regression here
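
As one example of such a non-linear alternative, the sketch below fits a logistic regression with `glm`, which keeps the predicted probabilities between 0 and 1. It reuses the same simulated data as the linear example; the object names `fit_logit` and `pred_prob` are illustrative, and like the linear model this does not account for autocorrelation in the series.

```r
# Same simulated data as in the linear regression example
set.seed(200)
data <- data.frame(time = 1:200,
                   val = sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.3, 0.7)))

# Logistic regression: models the log-odds of val = 1 as linear in time
fit_logit <- glm(val ~ time, data = data, family = binomial)

# Predicted probabilities of a 1 for the next 24 time points
pred_prob <- predict(fit_logit, newdata = data.frame(time = 201:224), type = "response")
```

With `type = "response"` the predictions are on the probability scale rather than the log-odds scale.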

This would require additional analysis, but it is essential to establish whether your events are independent. It is possible that there is some confounding variable that you have not accounted for. You might want to look into Bayesian linear regression (provided you obtain more dimensions than just time and yes/no values) here

Zakkery
  • Thanks for your answer. First, I want to predict hour by hour for the next day, for the next week, and for the next month. – amin abdolahnejad Feb 17 '16 at 18:08
  • It can't be linear regression. We have binary data, and polynomial models up to degree 7 don't give a good fit. We should focus on a binary model. What about a Markov model, or a hidden Markov model? Having the probability of an accident in every hour of the next month would be useful. – amin abdolahnejad Feb 17 '16 at 20:03
  • The linear model was just a simple example. You could use it and treat all predictions above some threshold as 1 and those below it as 0. As I said, maybe you can introduce more dimensions to your data; for example, you can try to find how often accidents happen per hour group (like between 5-6pm when people return home), or per month or day of week. Then you would probably be able to build a more elaborate model. Are you sure that your events are actually memoryless? It might be a better idea to use a neural network. – Zakkery Feb 17 '16 at 20:14
  • 3
    The response variable is binomial. Linear regression assumes normal errors. Nor does linear regression address potential autocorrelation in a time series. While perhaps a useful first order approximation, this is not the best approach. – Dalton Hance Feb 17 '16 at 20:18
  • 1
    That is a good remark. How about then taking that time series, grouping data by hour of day (for example) and then taking average of it? Considering it is identically distributed random variable, shouldn't we get expected value, due to CLT? I am not sure if that can be used as a predictor, but it certainly would give a good estimate of the probability that accident happens at particular hour. – Zakkery Feb 18 '16 at 00:26
  • 1
    I suppose if you think there is a periodic pattern to the data that is described by hour of day, then that approach might work. For example, if the data were something like $X_t =$ 1 if I'm having a meal (breakfast, lunch, or dinner), and 0 otherwise. But that doesn't appear to be the case from the plot. There isn't much evidence of periodicity; rather, there are long stretches of 1's followed by 1's (blocks of blue) and long stretches of 0's followed by 0's. – Dalton Hance Feb 18 '16 at 17:35
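
The Markov model raised in the comments, together with the long runs of 0's and 1's noted in the plot, could be explored with a first-order two-state Markov chain. The following is a minimal sketch under that assumption; the function names `fit_markov` and `forecast_markov` and the toy series `x` are hypothetical and stand in for the real accident data.

```r
# Estimate transition probabilities P[i, j] = P(next = j | current = i)
# from a 0/1 vector, assuming a first-order two-state Markov chain.
fit_markov <- function(x) {
  from <- factor(x[-length(x)], levels = c(0, 1))
  to   <- factor(x[-1], levels = c(0, 1))
  counts <- matrix(as.numeric(table(from, to)), nrow = 2,
                   dimnames = list(from = c("0", "1"), to = c("0", "1")))
  # row-normalise; a row is NaN if that state never occurs as a starting state
  counts / rowSums(counts)
}

# Probability of a 1 exactly h steps ahead, given the last observed state (0 or 1)
forecast_markov <- function(P, last_state, h) {
  Ph <- diag(2)
  for (i in seq_len(h)) Ph <- Ph %*% P  # h-step transition matrix P^h
  Ph[last_state + 1, 2]
}

# Toy series, for illustration only
x <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0)
P <- fit_markov(x)
p1 <- forecast_markov(P, last_state = x[length(x)], h = 1)
p2 <- forecast_markov(P, last_state = x[length(x)], h = 2)
```

For long horizons the forecast converges to the chain's stationary probability, so a model like this is mainly informative for the first few hours ahead.
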
1

Accident data? I'd start by assuming there's hourly seasonality and daily seasonality. Without knowing the type of accident, it may be that you could look at hourly pooling Monday through Friday, and handle hourly for Saturday and Sunday separately, so you have 3 pools of hours, 24 (Mon-Fri), 24 (Sat) and 24 (Sun).

Further data reduction might be possible, but assuming not, just take the averages. For example, the average for Sunday 3pm might be .3 (30% chance of an accident). The average for 4pm might be .2, and so on.

Assuming the hours are independent, the probability of no accident occurring in either the 3pm or the 4pm hour would be (1-.3)(1-.2) = .56, so the probability of having at least one accident in these two hours would be .44, and so on.

This seems to be a good, simple place to start.
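
The pooling and averaging described above can be sketched in base R as follows. The start date, the simulated series `x`, and the names `rates` and `prob_any` are assumptions made for illustration; the real timestamps and accident indicators would replace them. The "at least one accident" calculation assumes the hours are independent.

```r
# Hypothetical hourly timestamps for 90 days and a simulated 0/1 series
set.seed(42)
times <- as.POSIXct("2016-01-01 00:00:00", tz = "UTC") + 3600 * (0:2159)
x <- rbinom(2160, size = 1, prob = 0.1)

lt <- as.POSIXlt(times)
# Three pools of days: Mon-Fri, Sat, Sun (wday: 0 = Sunday, 6 = Saturday)
pool <- ifelse(lt$wday == 0, "Sun", ifelse(lt$wday == 6, "Sat", "Mon-Fri"))
df <- data.frame(x = x, pool = pool, hour = lt$hour)

# Average of the 0/1 indicator per pool and hour = estimated accident probability
rates <- aggregate(x ~ pool + hour, data = df, FUN = mean)

# Probability of at least one accident across several hours, assuming independence
prob_any <- function(p) 1 - prod(1 - p)
prob_any(c(0.3, 0.2))  # the Sunday 3pm and 4pm example from the text
```

With only 90 days of data, each Saturday or Sunday hour is averaged over roughly 13 observations, so those estimates will be noisy.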

zbicyclist