Up front, please note that you always have to consider the possibility of latent confounders. A latent confounder makes two random variables stochastically dependent without any causal dependency between them. Such confounders are not measured (that's why they are called latent or sometimes hidden), but they are almost always present and usually mess up your analysis. So if you cannot rule out latent confounders, you have to use methods that allow for latent confounding. Unfortunately, it is well known (and was already known to Granger himself) that Granger causality cannot deal with this kind of confounding.
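To make this concrete, here is a tiny simulation (my own illustration, not something from your setup) in which a latent variable $Z$ drives both $X$ and $Y$: the two observed variables come out strongly dependent, although neither causes the other.

```python
# Illustration: a latent confounder Z drives both X and Y, so X and Y are
# correlated even though there is no causal link between them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=n)                         # latent confounder (never observed)
X = 0.8 * Z + rng.normal(scale=0.5, size=n)
Y = 0.8 * Z + rng.normal(scale=0.5, size=n)

r, p = stats.pearsonr(X, Y)
print(f"corr(X, Y) = {r:.2f}, p = {p:.1e}")    # strongly dependent, no causal link
```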
Next, some considerations about the data (I will stick to the case of two time series, but the methods below also work for more than two):
If you want to distinguish the causal influence on $Y$ of the older parts of the sequence $X$ from the influence of its newer parts (and perhaps the other way around), you have to keep the data as a time series (aggregated as you described). If you do not care about the difference between older and newer influence, I would instead aggregate each of the two variables over intervals $I_i$ (which can be larger than in the time-series case), giving $agg_i(X)$ and $agg_i(Y)$, and treat the resulting data pairs $(agg_i(X), agg_i(Y))$ as observations of two new random variables $\mathcal{X}$ and $\mathcal{Y}$. We then want to know the causal relationship between $\mathcal{X}$ and $\mathcal{Y}$.
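A minimal sketch of that aggregation step, assuming fixed-length intervals and the mean as aggregation function (both are placeholder choices you would adapt to your data):

```python
# Chop two raw series into consecutive intervals I_i and aggregate each
# interval, giving data pairs (agg_i(X), agg_i(Y)).
import numpy as np

def aggregate_pairs(x, y, interval_len, agg=np.mean):
    """Return paired aggregates of x and y over consecutive intervals."""
    n = min(len(x), len(y)) // interval_len * interval_len
    x_agg = agg(np.asarray(x[:n]).reshape(-1, interval_len), axis=1)
    y_agg = agg(np.asarray(y[:n]).reshape(-1, interval_len), axis=1)
    return np.column_stack([x_agg, y_agg])     # rows are (agg_i(X), agg_i(Y))

# Example with synthetic data:
rng = np.random.default_rng(1)
x_raw = rng.normal(size=1000)
y_raw = rng.normal(size=1000)
pairs = aggregate_pairs(x_raw, y_raw, interval_len=50)
print(pairs.shape)                             # (20, 2)
```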
Note that, especially in the case you have described, we have to accept the possibility of cyclic causality, i.e. $\mathcal{X}$ causes $\mathcal{Y}$ and $\mathcal{Y}$ causes $\mathcal{X}$:
$$
\mathcal{X} \rightleftarrows \mathcal{Y}.
$$
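As an illustration (my own example), data with such a cyclic relationship can be generated from a linear model $X = b\,Y + E_X$, $Y = c\,X + E_Y$: for $|bc| < 1$ the two equations have a unique equilibrium, and this kind of equilibrium model is roughly what the linear method cited below (Hyttinen et al.) handles.

```python
# Cyclic linear model X <-> Y with noise terms E_x, E_y:
#   X = b*Y + E_x,  Y = c*X + E_y,  with |b*c| < 1.
import numpy as np

rng = np.random.default_rng(2)
n, b, c = 2000, 0.6, 0.5
E_x = rng.normal(size=n)
E_y = rng.normal(size=n)

# Solving the two simultaneous equations gives the equilibrium values:
X = (E_x + b * E_y) / (1 - b * c)
Y = (c * E_x + E_y) / (1 - b * c)
```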
Figuring out the edges and the arrowheads of the causal graph is sometimes called causal discovery. Causal discovery can be done either by changing the experiment that creates the data in well-designed ways (interventions) or by just analysing the data as provided.
Of course, the former leads to much better results, but it is often not feasible. So a lot of research has gone, and is going, into causal discovery from purely observational data, i.e. data that was given to you without you being able to dictate the design of the experiment.
I presume that you want causal discovery that, in particular, does not exclude the cyclic case or the presence of latent confounders. Fortunately, there are quite a number of (relatively recent) papers that deal with this problem and even provide ready-to-use implementations. I cite the most important ones:
- Hyttinen, Antti, Frederick Eberhardt, and Patrik O. Hoyer. "Learning linear cyclic causal models with latent variables." The Journal of Machine Learning Research 13.1 (2012): 3387-3439.
- Forré, Patrick, and Joris M. Mooij. "Constraint-based causal discovery for non-linear structural causal models with cycles and latent confounders." arXiv preprint arXiv:1807.03024 (2018).
- Rantanen, Kari, Antti Hyttinen, and Matti Järvisalo. "Learning Optimal Cyclic Causal Graphs from Interventional Data." International Conference on Probabilistic Graphical Models. PMLR, 2020.
- Mooij, Joris M., and Tom Claassen. "Constraint-based causal discovery using partial ancestral graphs in the presence of cycles." Conference on Uncertainty in Artificial Intelligence. PMLR, 2020.
As I said, they all come with implementations, and they all can deal with cyclic graphs and hidden confounders. The first one (Hyttinen et al.) is for the linear case; the others also cover the nonlinear situation. The second (Forré and Mooij) and the third (Rantanen et al.) give you exact results (provably correct, in a certain sense), but they don't scale well; you won't be able to analyse networks with more than eight or nine nodes. The first (Hyttinen et al.) and the fourth (Mooij and Claassen) scale better (to about 50, sometimes even 100 or more nodes), but are "not as correct" as the other two.
Without additional interventional data, those algorithms can only discover parts of the causal graph, not much more than its skeleton (the edges without the arrowheads). For directing the remaining edges, there are further possibilities. E.g., if you are willing to assume that your noise is additive, i.e. that you have an ANM (additive noise model), you should read the very cool paper:
- Hoyer, Patrik, et al. "Nonlinear causal discovery with additive noise models." Advances in neural information processing systems 21 (2008).
This paper, as well as many newer papers that cite it, gives you an idea of how to proceed with directing your edges.
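To give a feeling for the ANM idea (this is my own rough sketch, not the authors' implementation): fit a nonlinear regression in each direction and prefer the direction in which the residuals look independent of the regression input. Here I use a k-nearest-neighbours regression and a basic Gaussian-kernel HSIC statistic as an independence score; both are simplistic stand-ins for the more careful choices in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def hsic(a, b):
    """Biased HSIC statistic with Gaussian kernels on standardized inputs.
    Smaller means 'closer to independent' (a crude score, not a formal test)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    a, b = a.reshape(-1, 1), b.reshape(-1, 1)
    K = np.exp(-0.5 * (a - a.T) ** 2)
    L = np.exp(-0.5 * (b - b.T) ** 2)
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

def residuals(cause, effect):
    """Nonparametric regression of effect on cause; return the residuals."""
    model = KNeighborsRegressor(n_neighbors=20).fit(cause.reshape(-1, 1), effect)
    return effect - model.predict(cause.reshape(-1, 1))

# Synthetic additive-noise data with ground truth X -> Y:
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=500)
Y = X ** 3 + rng.normal(scale=0.5, size=500)

score_xy = hsic(X, residuals(X, Y))   # residuals of Y given X, scored against X
score_yx = hsic(Y, residuals(Y, X))   # residuals of X given Y, scored against Y
print("prefer X -> Y" if score_xy < score_yx else "prefer Y -> X")
```

The direction with the smaller score is the one in which an additive-noise model fits better, which is the criterion Hoyer et al. use to orient an edge.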