I'm investigating a causal effect in some financial data using two different approaches: propensity score matching with stratification, and the CausalImpact package for Bayesian structural time series. Theoretically, should propensity score matching and CausalImpact give similar results? I know the methodologies differ, but let's assume each is given appropriate data: for propensity score matching, individual-level features for the treatment and control groups; for CausalImpact, an aggregate time series for the treatment group along with several covariate time series.
My concern lies in how the counterfactuals are computed for the treated group. In propensity score matching with stratification, the treated population is split into bins, a counterfactual is calculated per bin, and the per-bin estimates are then combined with a weighted average. In CausalImpact, a single counterfactual is predicted for the whole treatment group's time series. Could this be problematic? For instance, maybe it's easier to predict counterfactuals per bin than for the entire group at once, or maybe there's a Simpson's paradox-type phenomenon where we observe a positive causal effect in each bin but not in the aggregate time series, so the two methodologies would give different results. Are these valid concerns, or should propensity score matching and CausalImpact always give similar estimates?
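
To make the aggregation concern concrete, here is a toy sketch of what I have in mind. The numbers are entirely made up, and the names `strata`, `stratified_att`, and `pooled_difference` are hypothetical helpers, not part of any package. It only contrasts a per-bin weighted average with a pooled comparison; it does not reproduce what CausalImpact does, which models the treated series against its own predicted counterfactual.

```python
# Made-up strata (propensity score bins), purely for illustration:
# (n_treated, mean_outcome_treated, n_control, mean_outcome_control)
strata = {
    "bin_1": (80, 2.0, 20, 1.0),   # most treated units, low outcomes
    "bin_2": (20, 10.0, 80, 9.0),  # few treated units, high outcomes
}


def stratified_att(strata):
    """Per-bin treated-minus-control difference, averaged with
    weights proportional to the number of treated units per bin."""
    total_treated = sum(n_t for n_t, _, _, _ in strata.values())
    weighted = sum(n_t * (y_t - y_c) for n_t, y_t, _, y_c in strata.values())
    return weighted / total_treated


def pooled_difference(strata):
    """Treated-minus-control difference after pooling all bins,
    i.e. ignoring the stratification entirely."""
    n_t = sum(s[0] for s in strata.values())
    n_c = sum(s[2] for s in strata.values())
    mean_t = sum(s[0] * s[1] for s in strata.values()) / n_t
    mean_c = sum(s[2] * s[3] for s in strata.values()) / n_c
    return mean_t - mean_c


print("stratified estimate:", round(stratified_att(strata), 3))
print("pooled estimate:    ", round(pooled_difference(strata), 3))
```

In this toy setup the effect is +1 in every bin, so the stratified estimate is positive, while the pooled comparison comes out negative because treated units are concentrated in the low-outcome bin. My question is whether something analogous can happen when CausalImpact works on the aggregate series.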