I am confused about the fundamental definition of a sufficient statistic. I have found two different definitions and I wonder whether they are equivalent.
With
- data $X$
- sufficient statistic $t$
- parameter $\theta$
Definition 1, e.g. from Wikipedia: "A statistic $t = T(X)$ is sufficient for underlying parameter $\theta$ precisely if the conditional probability distribution of the data $X$, given the statistic $t = T(X)$, does not depend on the parameter $\theta$.", i.e., as a formula: $$ p(X \mid t) = p(X \mid t, \theta) $$
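Definition 1 can be checked directly in a toy model (a sketch; the Bernoulli example and the names `conditional_dist`, `n`, `t` are mine, not from either source): for $X_1, \dots, X_n$ iid Bernoulli($\theta$) with $T(X) = \sum_i X_i$, the conditional distribution of $X$ given $t$ is uniform over the $\binom{n}{t}$ compatible sequences, whatever $\theta$ is:

```python
from itertools import product
from math import comb

def conditional_dist(n, theta, t):
    """P(X = x | T(X) = t) for X_1..X_n iid Bernoulli(theta), T = sum.
    Enumerates all 0/1 sequences of length n whose sum equals t."""
    joint = {x: theta**t * (1 - theta)**(n - t)
             for x in product([0, 1], repeat=n) if sum(x) == t}
    total = sum(joint.values())
    return {x: p / total for x, p in joint.items()}

n, t = 4, 2
d_low = conditional_dist(n, 0.3, t)   # theta = 0.3
d_high = conditional_dist(n, 0.8, t)  # theta = 0.8
# Both conditionals are uniform, 1/C(4, 2) = 1/6 per compatible sequence,
# so they coincide even though theta differs -- Definition 1 in action.
```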
Definition 2, e.g. from a video by Ben Lambert: $$ p(\theta \mid t) = p(\theta \mid X) $$
- The second definition should be equivalent to $p(\theta \mid t, X) = p(\theta \mid t)$: since $t = T(X)$, we can compute $t$ from $X$, so $p(\theta \mid X) = p(\theta \mid X, t)$. Is this correct? If so, the two definitions look alike with the roles of $X$ and $\theta$ exchanged.
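Definition 2 can also be checked numerically (a sketch, assuming a uniform prior on a grid of $\theta$ values; the toy data `x` is made up): the posterior computed from the full data equals the posterior computed from $t$ alone, because the likelihood depends on $X$ only through $t$ and the binomial coefficient cancels when normalising:

```python
import numpy as np
from math import comb

thetas = np.linspace(0.01, 0.99, 99)   # grid of theta values, uniform prior
x = np.array([1, 0, 1, 1, 0])          # toy Bernoulli data (illustrative)
n, t = len(x), int(x.sum())

# Posterior given the full data X: likelihood theta^t (1-theta)^(n-t)
lik_x = thetas**t * (1 - thetas)**(n - t)
post_x = lik_x / lik_x.sum()

# Posterior given only t: binomial likelihood C(n,t) theta^t (1-theta)^(n-t)
lik_t = comb(n, t) * thetas**t * (1 - thetas)**(n - t)
post_t = lik_t / lik_t.sum()
# p(theta | X) == p(theta | t): the constant C(n, t) drops out on normalising
```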
So, my question is: Are the two definitions equivalent (in general, or under some assumptions)? Am I overlooking something?
P.S.: The second definition makes more sense if I look at the estimation of $\theta$ (estimator $\hat \theta$) from the data: $X \rightarrow \hat \theta$, and with the sufficient statistic $X \rightarrow t \rightarrow \hat \theta$. The arrows are not causal, i.e., they do not reflect the data-generating process, which is presumably $\theta \rightarrow X$ without being mediated by a sufficient statistic(?).
The first definition also makes sense: $t$ captures all the information in $X$ about $\theta$.