
Consider a multivariate Student's t distribution, with parameters $\nu$ (d.f.), $\mu$ (location) and $\Sigma$ (shape).

Does anyone have a good intuition for the individual components not being statistically independent, even when $\Sigma$ is a diagonal matrix?

Ferdi
Colin Rowat
  • Consider writing the t as $Z/\sqrt{V/\nu}$ where $Z$ is multivariate normal and $V$ is univariate chi-squared and independent of $Z$. If $\Sigma$ is diagonal, the components of $Z$ are independent, but then you are dividing each component by a common factor. The resulting elliptical distribution - even when its axes are aligned with the variable axes - does not have independent components. – Glen_b Oct 10 '18 at 04:22
  • Thank you @Glen_b. Let me try to check one last step in my intuition: if, rather than dividing your $Z$ by $\sqrt{\frac{V}{\nu}}$, I divided it by $2$, I would still have a rv whose components were statistically independent. Thus, the common factor needs to be stochastic: were it not, then knowing $X_1$ would provide no further information about $X_2$, because I would already know the common factor; with a stochastic factor, knowing $X_1$ does provide further information about $X_2$. Does that make sense? – Colin Rowat Oct 10 '18 at 08:43
  • Could you elaborate on what "intuition" means? Independence implies the conditional distributions are all the same. However, inspection of the formula for the PDF (in two dimensions $x,y$) readily shows the conditional distributions of $y$ at given values of $x$ have identical shapes but are scaled by $(1+x^2/\nu)^{1/2},$ which exceeds $1$ when $x\ne 0.$ Would that be "intuitive" or not? – whuber Oct 10 '18 at 15:42
  • Thanks @whuber. Your question is deeper than mine. I'll answer by example, maybe just replacing the word 'intuition' with 'simple': I wanted a simple explanation of what goes 'wrong' with the diagonal matrix intuition that works for the Gaussian - why does squashing the peak and fattening the tails break the intuition that the major axes' alignment with the coordinate axes leads to independence? – Colin Rowat Oct 10 '18 at 16:25
  • @Colin you need the denominator to vary, yes. That common value within the vector but varying across draws of the random variable creates dependence between components. Consider a bivariate case and take vertical slices through the bivariate density (conditional densities) -- they don't all have the same shape. Edit: Ah, ... whuber covers that in his answer. – Glen_b Oct 11 '18 at 00:24
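Glen_b's construction can be checked with a short simulation. The sketch below (a minimal illustration; the seed is arbitrary, and $\nu=12$ is chosen so the moments used actually exist) divides independent standard normals by a shared $\sqrt{V/\nu}$: the resulting components are uncorrelated, yet their squares are positively correlated, which would be impossible if they were independent.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
nu, n = 12, 200_000              # d.f. large enough for the moments below

Z = rng.standard_normal((n, 2))  # independent standard normal components
V = rng.chisquare(nu, size=(n, 1))
T = Z / np.sqrt(V / nu)          # bivariate t with nu d.f., diagonal Sigma

c1 = np.corrcoef(T[:, 0], T[:, 1])[0, 1]        # near zero: uncorrelated
c2 = np.corrcoef(T[:, 0]**2, T[:, 1]**2)[0, 1]  # positive: not independent
print(c1, c2)
# A small V inflates both components of a draw at once, so a large |T_1|
# tends to come with a large |T_2| even though corr(T_1, T_2) is zero.
```

Dividing by the constant $2$ instead of $\sqrt{V/\nu}$ would leave both correlations at zero, matching the point made in the comments that the common factor must be stochastic.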

1 Answer


Let's look at the situation. As a point of departure we will first study the bivariate standard Normal distribution. I will do this by plotting vertical slices through its graph: these are given by the functions

$$y\to \phi(x,y)$$

for $x = 0, \pm 1/2, \pm 1, \pm 3/2, \pm 2$ (where $\phi$ is the bivariate density). The reason for doing this is that the variables $(X,Y)$ are independent if and only if the conditional distribution of $Y$ given $X=x$ does not vary with $x.$

Figure 1

As $x$ increases, the density grows smaller and the slices shrink down to the axis in the plots at the left. However, if we normalize these curves (rescaling each one vertically) so that each has unit area, thereby turning them into the conditional densities, they all coincide, as shown at the right. This is how we can tell $Y$ is independent of $X.$
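The coincidence of the rescaled slices can be verified numerically. A minimal sketch (the grid limits are arbitrary choices): evaluate $\phi(x,y)$ on a grid of $y$ for the same values of $x$ as above, rescale each slice to unit area, and check that the slices agree up to floating-point error, as they must, because $\phi(x,y)$ factors into $\phi(x)\phi(y)$.

```python
import numpy as np

y = np.linspace(-4, 4, 801)
dy = y[1] - y[0]

def phi(x, y):
    """Bivariate standard normal density (Sigma = identity)."""
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

slices = []
for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    s = phi(x, y)
    slices.append(s / (s.sum() * dy))  # rescale to unit area

# After normalization every slice is the same curve (a standard normal in y):
max_diff = max(np.max(np.abs(s - slices[0])) for s in slices[1:])
print(max_diff)  # floating-point noise only
```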

Here is the same situation for a multivariate t distribution with $\nu=1$ degree of freedom (in two variables).

Figure 2

The conditional densities, although centered at $0,$ have different shapes: they vary with $x.$ You can see that they spread out as $|x|$ grows larger. This can (easily) be demonstrated algebraically by examining the formula for the multivariate t density.
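The algebraic demonstration can also be checked numerically. Per whuber's comment above, each conditional density is the $x=0$ slice rescaled by $(1+x^2/\nu)^{1/2}.$ The sketch below (grid choices are arbitrary) measures the interquartile range of each normalized slice for $\nu=1$ and compares the ratio $\mathrm{IQR}(x)/\mathrm{IQR}(0)$ against $(1+x^2/\nu)^{1/2}.$

```python
import numpy as np
from math import gamma, pi

nu = 1.0
y = np.linspace(-50, 50, 20001)  # wide grid to capture the heavy tails
dy = y[1] - y[0]

def t_density(x, y, nu):
    """Bivariate standard t density with nu degrees of freedom."""
    c = gamma((nu + 2) / 2) / (gamma(nu / 2) * nu * pi)
    return c * (1 + (x**2 + y**2) / nu) ** (-(nu + 2) / 2)

def slice_iqr(x):
    """Interquartile range of the conditional density of y given x."""
    s = t_density(x, y, nu)
    cdf = np.cumsum(s) * dy
    cdf /= cdf[-1]               # normalize the slice to unit area
    q25, q75 = np.interp([0.25, 0.75], cdf, y)
    return q75 - q25

iqr0 = slice_iqr(0.0)
for x in (1.0, 2.0):
    # The spread ratio should track (1 + x^2/nu)^(1/2):
    print(x, slice_iqr(x) / iqr0, (1 + x**2 / nu) ** 0.5)
```

The ratios match the predicted scale factor up to discretization error, confirming that the conditional spread grows with $|x|.$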

Here is a "hand-waving" demonstration that might help us calibrate our intuition.

As Glen_b pointed out in comments, the multivariate t is the distribution of a standard multivariate Normal vector $X=(X_1,X_2,\ldots,X_d)$ divided by an independent positive variable $Z.$ (A multiple of $Z^2$ has a chi-squared, hence Gamma, distribution, but that detail doesn't matter.)

Consider what happens to the preceding conditional distributions as $|X_1/Z|$ increases. When $|X_1/Z|$ is relatively large, it likely got that way through a combination of a large value of $|X_1|$ and a smaller-than-average value of $Z.$ Because $Z$ was small, and $Z$ simultaneously divides all the components of $X$, the values of $X_2/Z, X_3/Z, \ldots, X_d/Z$ (which are scaled by the relatively large quantity $1/Z$) will thereby be more spread out. The larger $|X_1/Z|$ gets, the smaller $Z$ is likely to be and the more the other components are spread.

Because the conditional distributions of the $X_i/Z$ become more spread out as $|X_1/Z|$ increases, the $X_i/Z$ cannot be independent of $X_1/Z.$
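This hand-waving argument can be made concrete with a quick simulation. A minimal sketch (the seed, $\nu=5$, and the bin edges are arbitrary choices): bin bivariate t draws by the magnitude of the first component and watch the empirical spread of the second component grow from bin to bin.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
nu, n = 5, 200_000
X = rng.standard_normal((n, 2))  # standard bivariate normal
Z = np.sqrt(rng.chisquare(nu, size=(n, 1)) / nu)  # shared positive divisor
T = X / Z                        # bivariate t with nu d.f.

# Spread of the second component, conditional on the size of the first:
spreads = []
for lo, hi in [(0, 1), (1, 2), (2, 4)]:
    mask = (np.abs(T[:, 0]) >= lo) & (np.abs(T[:, 0]) < hi)
    spreads.append(T[mask, 1].std())
    print((lo, hi), spreads[-1])
# The spreads increase across bins: knowing |T_1| is large tells us T_2
# is likely more dispersed, so the components cannot be independent.
```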

whuber