I have a binary time series: 2160 data points (0 = didn't happen, 1 = happened), one for each one-hour period over 90 days.
I want to forecast, after these 90 days, when the next 1 will happen, and also to extend this forecast over the next month.
One approach might be to assume that the Bernoulli sequence can be described by a latent Normal random variable via the probit transformation. That is, your realized $X_t \sim \text{Bernoulli}(p_t)$, where $p_t = \Phi(Y_t)$ (equivalently $\Phi^{-1}(p_t) = Y_t$) and $Y \sim N(\mu, \Sigma)$. This way you can place whatever time-series structure you like (e.g. ARIMA) on the $Y$ variable and then use standard time-series techniques (e.g. Holt-Winters) to predict future observations. It should be possible to code something like this up in Stan or JAGS, but you might not get great predictions given the "glass darkly" view the Bernoulli process gives you of the latent state.
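To make the setup concrete, here is a minimal R sketch that simulates such a model, assuming an AR(1) structure for the latent $Y_t$; the parameter values are arbitrary and chosen only for illustration, and the actual fitting step (in Stan or JAGS) is not shown.

# Simulate a latent AR(1) process Y_t and push it through the probit link
# to obtain Bernoulli draws. phi, sigma and mu are illustrative values.
set.seed(1)
n     <- 2160          # 90 days of hourly observations
phi   <- 0.8           # AR(1) coefficient of the latent process
sigma <- 0.5           # innovation standard deviation
mu    <- -1            # latent mean (controls the overall event rate)

y <- numeric(n)
y[1] <- mu
for (t in 2:n) {
  y[t] <- mu + phi * (y[t - 1] - mu) + rnorm(1, sd = sigma)
}

p <- pnorm(y)                       # probit: p_t = Phi(Y_t)
x <- rbinom(n, size = 1, prob = p)  # observed 0/1 series

Fitting the reverse direction (recovering the latent $Y_t$ from the observed 0/1 series) is where the MCMC machinery comes in.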
The simplest model would be linear regression. You can plot your data using ggplot:
library(ggplot2)

# for reproducibility
set.seed(200)

# simple example: assume your data is a binary variable with P(event) = 0.3
data <- data.frame(time = 1:200,
                   val  = sample(c(0, 1), size = 200, replace = TRUE,
                                 prob = c(0.7, 0.3)))

# plot the data and add a linear regression line with its confidence band
ggplot(data, aes(x = time, y = val)) + geom_point() + geom_smooth(method = "lm")

# fit the linear regression of the 0/1 outcome on time
fitData <- lm(val ~ time, data = data)

# predict the next 24 hours, with confidence intervals
predict(fitData, newdata = data.frame(time = 201:224), interval = "confidence")
This is the simplest model; there are other, non-linear models that might fit your data better. Also bear in mind that you might have to use the log of the date to get a better fit. You can read a lot about non-linear regressions, such as polynomial regression, here.
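As a sketch of what a polynomial fit might look like, continuing with the simulated data frame above (the degree 2 is an arbitrary choice for illustration):

# quadratic trend instead of a straight line, using an orthogonal polynomial basis
polyFit <- lm(val ~ poly(time, 2), data = data)
predict(polyFit, newdata = data.frame(time = 201:224), interval = "confidence")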
Now, it would require additional analysis, but it is essential to establish whether your events are independent. It is possible that there is some sort of confounding variable that you are not accounting for. You might want to look into Bayesian linear regression (if you obtain more dimensions than just time and the yes/no values) here.
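A quick, informal way to check for serial dependence in the 0/1 series (again using the simulated data above) is to look at its sample autocorrelations; if the events were independent, the spikes should stay inside the confidence bands.

# autocorrelation of the hourly event indicator, up to 48 hours back
acf(data$val, lag.max = 48, main = "ACF of hourly event indicator")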
Accident data? I'd start by assuming there's hourly seasonality and daily seasonality. Without knowing the type of accident, you could pool the hours across Monday through Friday and handle the hours of Saturday and Sunday separately, so you have 3 pools of 24 hours each: Mon-Fri, Sat, and Sun.
Further data reduction might be possible, but assuming not, just take the averages. For example, the average for Sunday 3pm might be .3 (30% chance of an accident). The average for 4pm might be .2, and so on.
The probability of no accident occurring in either the 3pm or the 4pm hour would be (1-.3)(1-.2) = .56, so the probability of having an accident in these two hours would be 1 - .56 = .44, and so on.
This seems to be a good, simple place to start.
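A minimal R sketch of this pooling idea, on simulated data (column names and the simulated event rate are made up for illustration):

# 90 days of hourly 0/1 events with a toy constant rate
set.seed(1)
ts <- seq(as.POSIXct("2024-01-01 00:00"), by = "hour", length.out = 2160)
df <- data.frame(timestamp = ts, event = rbinom(2160, 1, 0.2))

wd       <- as.POSIXlt(df$timestamp)$wday   # 0 = Sunday ... 6 = Saturday
df$pool  <- ifelse(wd %in% 1:5, "Mon-Fri", ifelse(wd == 6, "Sat", "Sun"))
df$hour  <- as.POSIXlt(df$timestamp)$hour

# average event rate for each of the 3 pools x 24 hours
rates <- aggregate(event ~ pool + hour, data = df, FUN = mean)

# probability of at least one event across several hours, e.g. Sunday 3pm and 4pm
p <- rates$event[rates$pool == "Sun" & rates$hour %in% c(15, 16)]
1 - prod(1 - p)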