I am using the CausalImpact package in R to infer the causal effect of an intervention on some data that are highly correlated and seasonal.
Specifically, I have 17 days of hourly data, with the intervention happening at the end of day 13. I have two control series that are not affected at all by the intervention (with linear correlations of 0.708 and 0.701) and the series that includes the intervention (the "treated" series).
A piece of the data can be found here
My code is the following:
```r
library(CausalImpact)

days <- 4                     # length of the post-intervention window, in days
daily.obser <- days*24        # hourly observations in that window
# The response (treated series) must be the first column
data.1 <- cbind(treated.signal.3n, the.control.3, the.control.2)
data.1 <- data.1[1:((length(bsl)+1)+daily.obser), ]  # keep only the rows needed
matplot(data.1, type = "l", col = c(2,4,9))
legend("bottomright", inset = .05,
       legend = c("Treated Zone", "Control Zone 1", "Control Zone 2"),
       pch = 1, col = c(2,4,9), horiz = TRUE)
# bsl (defined elsewhere) has the length of the pre-intervention period
preperiod <- c(1, length(bsl))
postperiod <- c((length(bsl)+1), (length(bsl)+1+daily.obser))  # was daily.obs (undefined)
prior.level.sd.level <- 0.01
imp.1 <- CausalImpact(data.1, pre.period = preperiod, post.period = postperiod,
                      model.args = list(niter = 2500, nseasons = 17, season.duration = 24,
                                        dynamic.regression = FALSE,
                                        prior.level.sd = prior.level.sd.level,
                                        standardize.data = TRUE))
summary(imp.1)
plot(imp.1, c("original", "pointwise"))
summary(imp.1, "report")
```
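Because the data themselves are only linked, a self-contained sketch with simulated series can be handy for experimenting with the same call. Everything below is invented for illustration (a shared 24-hour sine cycle, noise levels, and a +2 post-period effect), the variable names are hypothetical, and the seasonal settings passed to `model.args` are just one illustrative choice. The `CausalImpact` call runs only if the package is installed.

```r
set.seed(1)
n.obs   <- 17 * 24   # 17 days of hourly data
pre.end <- 13 * 24   # intervention at the end of day 13
hour    <- seq_len(n.obs)

daily <- sin(2 * pi * hour / 24)   # simulated 24-hour cycle shared by all series

control.1 <- 10 + 3 * daily + rnorm(n.obs, sd = 1)
control.2 <- 12 + 3 * daily + rnorm(n.obs, sd = 1)
treated   <- 5 + 0.5 * control.1 + 0.5 * control.2 + rnorm(n.obs, sd = 1)

post.rows <- (pre.end + 1):n.obs
treated[post.rows] <- treated[post.rows] + 2   # invented effect of +2

sim.data <- cbind(treated, control.1, control.2)  # response in the first column

pre.period  <- c(1, pre.end)          # rows 1..312
post.period <- c(pre.end + 1, n.obs)  # rows 313..408

if (requireNamespace("CausalImpact", quietly = TRUE)) {
  impact <- CausalImpact::CausalImpact(
    sim.data, pre.period, post.period,
    model.args = list(niter = 1000, nseasons = 24, season.duration = 1)
  )
  summary(impact)
}
```

Swapping different `model.args` into this sketch shows how each setting changes the intervals on data whose true effect is known.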
My questions are:
I have read the paper, and at some point it talks about the prior distribution for the variance. I do not understand what I should set my prior.level.sd parameter to, based on my data.
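For context, the CausalImpact documentation describes `prior.level.sd` as the prior standard deviation of the Gaussian random walk of the local level, expressed in units of the data's standard deviation; it defaults to 0.01 and the documentation suggests 0.1 as a safer choice when in doubt. A rough way to see what a value implies is that a random walk whose steps have standard deviation `s` has spread `s * sqrt(t)` after `t` steps. The base-R sketch below (the helper name is hypothetical) uses that simplification, treating the level's step size as fixed at the prior value, over a 96-hour post-period.

```r
# Spread of a Gaussian random walk after `steps` steps when each step has
# standard deviation `level.sd` (in standardized data units).
# Simplification: the level sd is treated as fixed at its prior value.
random.walk.spread <- function(level.sd, steps) {
  level.sd * sqrt(steps)
}

post.hours <- 96L  # 4 days of hourly data after the intervention
for (s in c(0.01, 0.1)) {
  cat(sprintf("prior.level.sd = %.2f -> spread after %d h: %.3f sd units\n",
              s, post.hours, random.walk.spread(s, post.hours)))
}

# Monte Carlo check of the formula for level.sd = 0.1
set.seed(42)
sims <- replicate(5000, sum(rnorm(post.hours, mean = 0, sd = 0.1)))
sd(sims)  # close to 0.1 * sqrt(96), about 0.98
```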
Another problem I'm facing is with the nseasons and season.duration arguments. When I specify them, the results show that the intervention is insignificant (and the CIs become huge), whereas when I don't, the intervention is significant. Is nseasons supposed to be, say, the number of days in the whole dataset or just in the pre-intervention period (e.g. 17 or 13)? What does specifying the seasonality truly mean? Can I, based on the data, skip this?
Results with the seasonality specification (plots and numbers)
Results without the seasonality specification (plots and numbers)
(I am not showing the cumulative panel, since it is not useful in my case.)
(You will notice that the fit in the pre-intervention period is not very good. Can I fix this somehow?)
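For reference, the CausalImpact documentation describes `nseasons` as the period of the seasonal component and `season.duration` as the number of data points each season spans; its own example for a day-of-week effect on hourly data is `nseasons = 7, season.duration = 24`. One full cycle therefore covers `nseasons * season.duration` observations. The hypothetical base-R helper below (not part of the package) shows which season label each hourly observation falls into under a given setting.

```r
# Season label (1..nseasons) of the observation at time index t, when every
# season lasts `season.duration` consecutive observations.
season.index <- function(t, nseasons, season.duration) {
  ((t - 1) %/% season.duration) %% nseasons + 1
}

hours <- 1:(17 * 24)  # 17 days of hourly data

# Setting from the question: 17 seasons of 24 hours each
s.question <- season.index(hours, nseasons = 17, season.duration = 24)
# Hour-of-day alternative: 24 seasons of 1 hour each
s.hourly <- season.index(hours, nseasons = 24, season.duration = 1)

17 * 24                 # observations per full cycle with the question's setting
table(s.question)[1:3]  # each "season" is one whole day (24 observations)
head(s.hourly, 26)      # hours 1..24, then back to 1, 2
```

With `nseasons = 17, season.duration = 24`, one cycle spans 408 hours, i.e. the entire 17-day sample, so seasons 14 to 17 occur only in the post-period. With `nseasons = 24, season.duration = 1`, the cycle is one day long and repeats 13 times in the pre-period.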
I do not understand how I am supposed to specify whether I want to standardize the data or not.
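`standardize.data` is a logical entry of `model.args` (TRUE by default), so it is set like the other model arguments, e.g. `model.args = list(standardize.data = FALSE)`. The CausalImpact documentation says it standardizes all columns before fitting, which makes results invariant to linear transformations of the data. The base-R sketch below shows what z-scoring each column means; it assumes the mean and standard deviation are taken from the pre-period rows only, which is an assumption about the package internals, and the helper name and simulated data are hypothetical.

```r
# Z-score every column using the mean and sd of the pre-period rows only.
standardize.pre <- function(x, pre.rows) {
  mu    <- colMeans(x[pre.rows, , drop = FALSE])
  sigma <- apply(x[pre.rows, , drop = FALSE], 2, sd)
  sweep(sweep(x, 2, mu, "-"), 2, sigma, "/")
}

set.seed(7)
x <- cbind(treated   = rnorm(408, 100, 15),
           control.1 = rnorm(408, 50, 5),
           control.2 = rnorm(408, 20, 2))
z <- standardize.pre(x, pre.rows = 1:312)

round(colMeans(z[1:312, ]), 10)      # 0 for every column
round(apply(z[1:312, ], 2, sd), 10)  # 1 for every column
```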
Finally, I am considering static versus dynamic regression. I read in the paper that static regression is advised when the relationship between the control and treated series is stable. Can someone explain what is meant by "stable"?
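The CausalImpact documentation describes `dynamic.regression = TRUE` as using time-varying regression coefficients, and notes that combining it with a time-varying level often overspecifies the model, in which case a static regression is safer. One informal reading of "stable" is that the coefficients linking treated and control series do not drift over the pre-period. A rough heuristic (not the paper's procedure) is to fit the regression on consecutive windows and check whether the slopes stay roughly constant; the data and helper name below are simulated and hypothetical.

```r
# Slope of y on x within consecutive, non-overlapping windows.
window.slopes <- function(y, x, window) {
  starts <- seq(1, length(y) - window + 1, by = window)
  sapply(starts, function(s) {
    idx <- s:(s + window - 1)
    unname(coef(lm(y[idx] ~ x[idx]))[2])
  })
}

set.seed(3)
n <- 13 * 24                      # pre-period: 13 days of hourly data
x <- 10 + rnorm(n, sd = 1)        # simulated control series

y.stable   <- 2 * x + rnorm(n, sd = 0.1)           # constant coefficient
beta.drift <- seq(1, 3, length.out = n)            # coefficient drifting from 1 to 3
y.drifting <- beta.drift * x + rnorm(n, sd = 0.1)

s1 <- window.slopes(y.stable, x, window = 24)    # one slope per day
s2 <- window.slopes(y.drifting, x, window = 24)
round(s1, 2)  # roughly constant around 2
round(s2, 2)  # increasing from about 1 to about 3
```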
You may find the paper here