I have a sample from a multivariate distribution, and I am interested in obtaining a sample from the marginals. I know the right way to do so is simply to take the corresponding entries of each sample point. What is the justification for this strategy? Is there a (probably simple) formula or result that justifies this procedure?
1 Answer
I feel that this is essentially the definition of (sampling from) joint and marginal distributions, so no separate result is really needed. Still, in this answer I'll try to formalize it. The question (in the bivariate real-valued case) is: if $(X_i,Y_i)$ has the same distribution as $(X,Y)$, do $X_i$ and $X$ have the same distribution?
If the sample point $(X_i,Y_i)$ has the same joint distribution as $(X,Y)$, then \begin{equation} P(X\in \mathcal{X}, Y\in \mathcal{Y}) = P(X_i\in \mathcal{X}, Y_i \in \mathcal{Y}) \end{equation} for any Borel sets $\mathcal{X},\mathcal{Y}\subseteq\mathbb{R}$. In particular, for any $\mathcal{X}$ we may take $\mathcal{Y}=\mathbb{R}$. Since the events $\{Y\in\mathbb{R}\}$ and $\{Y_i\in\mathbb{R}\}$ always occur, \begin{equation} P(X \in \mathcal{X}) = P(X\in \mathcal{X}, Y\in \mathbb{R}) = P(X_i\in \mathcal{X}, Y_i \in \mathbb{R}) = P(X_i \in \mathcal{X}), \end{equation} so $X$ and $X_i$ have the same distribution.
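Similar reasoning shows that if the pairs $(X_1,Y_1),(X_2,Y_2),\ldots$ are independent, then $X_1,X_2,\ldots$ are independent as well. Indeed, for any $n$ and any Borel sets $\mathcal{X}_1,\ldots,\mathcal{X}_n$, independence of the pairs gives \begin{equation} P(X_1\in \mathcal{X}_1,\ldots,X_n\in \mathcal{X}_n) = P(X_1\in \mathcal{X}_1, Y_1\in \mathbb{R},\ldots,X_n\in \mathcal{X}_n, Y_n\in \mathbb{R}) = \prod_{i=1}^n P(X_i\in \mathcal{X}_i, Y_i\in \mathbb{R}) = \prod_{i=1}^n P(X_i\in \mathcal{X}_i). \end{equation} So, dropping the $Y_i$ from independent pairs $(X_i,Y_i)$ that have the same distribution as $(X,Y)$ produces independent $X_i$ that have the same distribution as $X$.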
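As a numerical sanity check of the argument above, here is a minimal sketch in Python with NumPy. The bivariate normal, its parameters, the seed, and the sample size are all assumptions chosen purely for illustration; the example relies on the standard fact that the first coordinate of a bivariate normal with mean `mean` and covariance `cov` is normal with mean `mean[0]` and variance `cov[0, 0]`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

# Illustrative joint distribution (an assumption, not from the question):
# a bivariate normal with correlated coordinates.
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])

n = 200_000
pairs = rng.multivariate_normal(mean, cov, size=n)  # shape (n, 2): rows are (X_i, Y_i)

# "Taking the corresponding entries": keep X_i, drop Y_i.
x_from_joint = pairs[:, 0]

# Independent draws taken directly from the known marginal of X.
x_direct = rng.normal(loc=mean[0], scale=np.sqrt(cov[0, 0]), size=n)

# Compare a few empirical quantiles of the two samples.
probs = [0.1, 0.25, 0.5, 0.75, 0.9]
q_joint = np.quantile(x_from_joint, probs)
q_direct = np.quantile(x_direct, probs)

print("quantiles, X taken from pairs:", np.round(q_joint, 3))
print("quantiles, X sampled directly:", np.round(q_direct, 3))
```

The two sets of quantiles should agree up to Monte Carlo error, which is what the identity $P(X\in\mathcal{X}) = P(X_i\in\mathcal{X})$ predicts. This only illustrates the result for one distribution; the proof above is what covers the general case.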
