
Suppose I have time series data for two variables, X and Y. Assume that I am already convinced that there is a causal relationship between the two, but that I am unsure of the direction. Also assume that there will be a short time delay between cause and effect (here, short means at least an order of magnitude less than the time interval between consecutive data points). Are there any standard/established methods for inferring the direction of the causal effect?

Granger causality seems like a potentially relevant concept, but I have only just heard about that concept and I am uncertain.

**Edit**: I want to clarify the "Assume I am already convinced that there is a causal relationship between the two" portion of my question. Let me give an example:

Early in the COVID-19 pandemic, health centers in my area weren't giving out tests because there had been no reported COVID-19 cases in the area. I guess tests were needed more urgently elsewhere. Once they got a few obvious-enough cases (I think they were hospitalizations), there started to be more testing.

Now imagine $X$ is the rate of overall testing and $Y$ is the rate at which we get positive results. We can expect, based on policies like the one I just described, that $Y$ has an effect on $X$. But we also expect that more testing will simply find more cases, so $X$ has an effect on $Y$. The question is: how much each way?

This isn't really the application I'm interested in, but I think it basically illustrates what I mean. When I say I'm convinced, I don't necessarily mean convinced by statistics. There are some situations where some kind of causal relationship is a reasonable guess a priori.

  • See as well https://stats.stackexchange.com/questions/45999/introduction-to-causal-analysis – msuzen Feb 09 '22 at 21:44
  • *"Assume that I am already convinced that there is a causal relationship between the two, but that I am unsure of the direction."* This is very tricky. The kind of argument that convinces one that there is a causal relationship without indicating the direction is often very indirect, and there might be more going on than just a *direct* causal relationship. If there is no indication of a direct causal relationship, then reasoning from the time delay ('post hoc ergo propter hoc') will be fallacious. – Sextus Empiricus Feb 11 '22 at 22:28
  • @SextusEmpiricus Does my edit help at all? –  Feb 12 '22 at 00:01
  • @user37344 the example with Covid-19 does not help much. That situation is horribly complicated and you are not at all sure whether X causes Y and/or the other way around. We can have an increase of X and Y independent of each other, for instance when the number of cases rises (which happens with or without tests) and when the number of tests increases (because there is more availability). There has been research on correlation using spatial and temporal differences, but that remains just correlation and you can't infer causality.... – Sextus Empiricus Feb 12 '22 at 08:04
  • .. a disturbing factor is governments making predictions of the epidemic and basing responses on that. The response can be ahead of the cause. In addition, there are multiple measures being taken, and if you measure just two then there is too much confounding. The best way to infer a causal effect would be to perform a controlled experiment. The use of instrumental variables can try to approximate this best way. But the timing of events, i.e. applying [post hoc ergo propter hoc](https://en.m.wikipedia.org/wiki/Post_hoc_ergo_propter_hoc), can only be used if you have a good idea about what's happening. – Sextus Empiricus Feb 12 '22 at 08:05
  • Maybe this helps: https://ei.is.mpg.de/publications/5902 – timm Feb 12 '22 at 10:15
  • @timm Looks interesting, thanks for the link. –  Feb 15 '22 at 22:32

3 Answers


Inferring causal direction, or causal discovery from data more broadly, is a large research topic. Cosma Shalizi's notebook on Causal Discovery Algorithms gives a nice list of approaches. However, one has to distinguish structure discovery, i.e. learning a whole causal graph, as a separate task from only discovering the direction of a given relationship.

For a nice overview, see "Answering causal questions using observational data", written for the 2021 Nobel Memorial Prize; see here.

msuzen
  • Thanks, this looks like a fantastic list. Do you know of any specific papers that might help me find the direction part? –  Feb 09 '22 at 22:44
  • For practical software, see pairwise tools from causal discovery package: https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/causality.html – msuzen Feb 09 '22 at 23:23

It's tricky to answer this question because you'd have to be much clearer about what you mean by "I am already convinced that there is a causal relationship between the two", but let's assume God came down to earth and told you that there is a causal relationship between A and B. In this case, because it's time-series data, if it's possible to identify which one happens before the other, we can say that the one that happened earlier caused the other, since it is a widely accepted assumption that a cause always precedes its effect in time.

However, according to your description, it's possible that you cannot identify which one happened first. Granger causality won't help you here, since it's not even able to identify a causal relationship. The "causality" in its name is pretty unfortunate, according to Granger himself.

Jonas Peters has done some work on this topic, and I think a good read for you is this paper of his: Detecting the direction of causal time series. They fit an autoregressive moving average model and examine the noise to identify the ordering of the causal relationship.

mribeirodantas
  • I want to clarify the "according to your description, it's possible that you can not identify which one happened first" comment. I'm aware that Granger causality does not always apply, but doesn't my assumption about a delay between cause and effect mean that it is usable here in principle? –  Feb 09 '22 at 18:09
  • To *infer causality*, in practice, Granger causality never applies. It's not the goal of the tool. But if you're sure A happens before B, and you're sure A and B are causally related, then you can say A causes B. Using Granger causality in your problem will tell you what Granger causality is able to tell, which is how much better you can predict B from the past of both B and A than from the past of B alone. – mribeirodantas Feb 09 '22 at 21:05
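
To make concrete what that last comparison measures, here is a minimal sketch, not a definitive implementation. It compares a regression of B on its own past with a regression of B on the past of both B and A, and summarises the gain as an F statistic. The helper names (`lagged_design`, `granger_f_statistic`), the lag order of 2 and the simulated series are all assumptions made for illustration. Note that the test only uses lagged values, so if the delay between cause and effect is much shorter than the sampling interval, as in the question, the lags may carry little of the dependence.

```python
import numpy as np

def lagged_design(target, predictors, lags):
    """Build a regression matrix from `lags` past values of each predictor."""
    n = len(target)
    cols = [np.ones(n - lags)]  # intercept
    for series in predictors:
        for k in range(1, lags + 1):
            cols.append(series[lags - k : n - k])  # value k steps in the past
    return np.column_stack(cols), target[lags:]

def residual_sum_of_squares(design, response):
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    return float(resid @ resid)

def granger_f_statistic(cause, effect, lags=2):
    """F statistic: does the past of `cause` improve prediction of `effect`?"""
    restricted, y = lagged_design(effect, [effect], lags)
    full, _ = lagged_design(effect, [effect, cause], lags)
    rss_restricted = residual_sum_of_squares(restricted, y)
    rss_full = residual_sum_of_squares(full, y)
    dof = len(y) - full.shape[1]
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)

# Simulated data (an assumption): A is white noise, B depends on the past of A.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = np.zeros(n)
for t in range(1, n):
    b[t] = 0.5 * b[t - 1] + 0.8 * a[t - 1] + rng.normal()

f_ab = granger_f_statistic(a, b)  # past of A helps predict B: large
f_ba = granger_f_statistic(b, a)  # past of B does not help predict A: small
```

A large `f_ab` next to a small `f_ba` says only that A's past improves the forecast of B; as the comment above stresses, that is a statement about prediction, not about causation.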

Up front, please note that you always have to consider the possibility of latent confounders. These make two random variables stochastically dependent without any causal dependence between them. They are not measured (that's why they are called latent, or sometimes hidden), but they are almost always present and usually screw up your data. So if you cannot rule out latent confounders, you have to use methods that allow for latent confounding. It is well known (and was already known to Granger himself) that, unfortunately, Granger causality cannot deal with this confounding.

Next, some considerations about the data (I will stick to the case of two time series, but the methods below also work for more than two). If you want to distinguish the causal influence on $Y$ of older parts of the sequence $X$ from that of newer parts of $X$, and maybe the other way around, you have to treat the data as a time series (of data aggregated as you described). Otherwise, if you don't care about the difference between older and newer influences, I would suggest simply considering two random variables, $X$ and $Y$, and aggregating each of them over intervals $I_i$ (which can be larger than in the time series case). This gives data pairs $(\operatorname{agg}_i(X), \operatorname{agg}_i(Y))$, which are samples of new random variables $\mathcal{X}$ and $\mathcal{Y}$. We would then like to know the causal relationship between $\mathcal{X}$ and $\mathcal{Y}$.
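
The aggregation step can be sketched as follows. This is only an illustration: taking the mean as the aggregation function, using equally sized non-overlapping intervals, and the helper name `aggregate` are all assumptions, and the toy series stand in for real data.

```python
import numpy as np

def aggregate(series, window):
    """Mean over consecutive non-overlapping windows; a ragged tail is dropped."""
    usable = (len(series) // window) * window
    return series[:usable].reshape(-1, window).mean(axis=1)

# Toy series (an assumption), aggregated over intervals of length 3.
x = np.arange(10, dtype=float)
y = 2.0 * x

# Each row is one pair (agg_i(X), agg_i(Y)).
pairs = np.column_stack([aggregate(x, 3), aggregate(y, 3)])
```

The rows of `pairs` are then treated as samples of the two aggregated variables, and the temporal order of the rows is ignored.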

Note that, especially in the case you have described, we have to accept the possibility of cyclic causality, i.e. $\mathcal{X}$ causes $\mathcal{Y}$ and $\mathcal{Y}$ causes $\mathcal{X}$: $$ \mathcal{X} \rightleftarrows \mathcal{Y}. $$ Figuring out the edges and the arrow heads in causal graphs is sometimes called causal discovery. Causal discovery can be done either by changing the experiment that creates the data in well-designed ways (interventions) or by just analysing the provided data.

Of course, the former leads to much better results, but is often not feasible. So lots of research has been and is going into causal discovery just from observational data, i.e. data that has been provided to you without you being able to dictate the design of the experiments.

I presume that you want to know about causal discovery that, in particular, does not exclude the cyclic case or the presence of latent confounders. Fortunately, there are quite a number of (relatively new) papers that deal with this problem and even provide ready-to-use implementations. I cite the most important ones:

  • Hyttinen, Antti, Frederick Eberhardt, and Patrik O. Hoyer. "Learning linear cyclic causal models with latent variables." The Journal of Machine Learning Research 13.1 (2012): 3387-3439.
  • Forré, Patrick, and Joris M. Mooij. "Constraint-based causal discovery for non-linear structural causal models with cycles and latent confounders." arXiv preprint arXiv:1807.03024 (2018).
  • Rantanen, Kari, Antti Hyttinen, and Matti Järvisalo. "Learning Optimal Cyclic Causal Graphs from Interventional Data." International Conference on Probabilistic Graphical Models. PMLR, 2020.
  • Mooij, Joris M., and Tom Claassen. "Constraint-based causal discovery using partial ancestral graphs in the presence of cycles." Conference on Uncertainty in Artificial Intelligence. PMLR, 2020.

As I said, they all come with implementations. They all can deal with cyclic graphs and hidden confounders. The first one (Hyttinen et al.) is for the linear case; the others also cover the nonlinear situation. The second (Forré and Mooij) as well as the third (Rantanen et al.) give you exact results (mathematically proven correct, in a certain sense), but they don't scale well; you won't be able to analyse networks with more than eight or nine nodes. The first (Hyttinen et al.) and the fourth (Mooij and Claassen) scale better (about 50 and sometimes even 100 or more nodes), but are "not as correct" as the other two.

Those algorithms, if you don't provide them with additional interventional data, can only discover parts of the causal graph, not much beyond the skeleton of the graph (that is, just the edges without the arrow heads). For orienting the edges, there are further possibilities. E.g., if you are willing to presume that your noise is additive, i.e. you have an ANM (Additive Noise Model), you should read the very cool paper:

  • Hoyer, Patrik, et al. "Nonlinear causal discovery with additive noise models." Advances in neural information processing systems 21 (2008).

This paper as well as many newer papers that cite this one give you an idea of how to proceed with directing your edges.
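
To give a rough feeling for the additive-noise idea, here is a minimal sketch under strong assumptions; it is not the procedure from the paper. It fits a regression in each direction and checks in which direction the residuals look independent of the input. The cubic polynomial regression, the plain biased HSIC estimate with a median-heuristic bandwidth (used as a score, without a significance test), the helper names and the simulated data are all choices made for illustration. It also ignores latent confounders and cycles.

```python
import numpy as np

def gaussian_gram(v):
    """Gaussian kernel matrix with the median heuristic for the bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    bandwidth2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * bandwidth2))

def hsic(u, v):
    """Biased HSIC estimate; larger values indicate stronger dependence."""
    n = len(u)
    centering = np.eye(n) - np.ones((n, n)) / n
    k = gaussian_gram(u)
    l = gaussian_gram(v)
    return float(np.trace(k @ centering @ l @ centering)) / n**2

def anm_dependence(cause, effect, degree=3):
    """Regress effect on cause, then score dependence of residuals on cause."""
    coef = np.polyfit(cause, effect, degree)
    resid = effect - np.polyval(coef, cause)
    return hsic(cause, resid)

# Simulated data (an assumption): X causes Y through a nonlinear function
# with additive noise that is independent of X.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-2.0, 2.0, size=n)
y = x**3 + rng.normal(size=n)

score_xy = anm_dependence(x, y)  # residuals of the model X -> Y
score_yx = anm_dependence(y, x)  # residuals of the model Y -> X

# The direction whose residuals look less dependent on the input is preferred.
preferred = "X -> Y" if score_xy < score_yx else "Y -> X"
```

In the simulated example the forward model leaves residuals that are close to independent of `x`, while the backward model does not, so the score favours the true direction. With real data you would want a more flexible regression and a proper independence test rather than a raw score comparison.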

frank
  • Thanks! At a minimum this adds to the list of resources that I've gotten from other answers, but it also looks like your answer is the most direct yet. Thanks again. –  Feb 12 '22 at 22:57