
From the definition of the multivariate normal distribution, a k-dimensional random vector X = (X1, X2, ..., Xk) is (multivariate) normally distributed if every linear combination of its components is normally distributed. This implies that every component of X is also normally distributed.

What about the reverse? What is an example of a k-dimensional random vector X where each of X1, X2, ..., Xk is normally distributed but X is not?

If there is a good example in 2-d, that would be even better, because I could plot the probability density in 3-D and visualize to improve my intuition.

Motivation for this question: I have been studying Professor Andrew Ng's Machine Learning course, where we model a dataset using a multivariate normal distribution to detect anomalies / outliers. So I am trying to understand when it is "ok" to model the data this way. For the univariate case, I can plot the data and see if it "generally" follows a bell curve. But it's hard to get an understanding of a multi-dimensional dataset.

Turbo
  • I wonder if the degenerate case $X=[X_1, X_1]$ is a normal vector or not (it is according to the definition above) because the density doesn't exist. – gunes Jan 07 '21 at 10:57
  • @gunes Yes, the degenerate case is Normal according to most definitions. If you exclude it, then you have to formulate special cases for many theorems in statistics and probability. As an example of an effective definition, we might declare that any $n$-variate random variable with a quadratic cumulant generating function (the log of the characteristic function) is Normal. – whuber Jan 07 '21 at 13:16

1 Answer


Consider a standard normal random variable $X$ and an independent binary random variable $B$ taking the values $-1$ and $1$, each with probability $0.5$. Define $Y = B X$.

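$Y$ is a standard normal variable, since $$ \begin{array}{ccl} P(Y<y) &=& 0.5\, P(Y<y \mid B=1)+0.5\, P(Y<y \mid B=-1)\\ & = & 0.5\, \Phi(y) + 0.5\,(1-\Phi(-y))\\ & = & \Phi(y) \end{array}$$ where $\Phi$ is the cumulative distribution function of a standard normal variable. The second line uses the independence of $B$ and $X$ (given $B=-1$, $Y<y$ means $X>-y$), and the last line uses the symmetry $\Phi(-y) = 1-\Phi(y)$.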

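However, $(X,Y)$ is not a normal vector: the linear combination $X+Y = (1+B)X$ equals $0$ when $B=-1$ and $2X$ when $B=1$, so $P(X+Y=0) = 0.5$. This is impossible for a normal variable, which takes any given value with probability $0$ (or with probability $1$ in the degenerate, zero-variance case).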
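The construction is easy to check by simulation. Below is a minimal sketch using only the Python standard library; the function name `simulate`, the sample size, and the seed are illustrative choices, not part of the argument above. It draws pairs $(X, Y)$ with $Y = BX$, compares the sample moments and one CDF value of $Y$ with those of a standard normal, and measures how often $X+Y$ is exactly $0$.

```python
import random
import statistics


def simulate(n=100_000, seed=0):
    """Draw n samples of (X, Y) with Y = B * X (hypothetical helper)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # X ~ N(0, 1)
        b = rng.choice((-1, 1))   # B is -1 or 1, each with probability 0.5
        xs.append(x)
        ys.append(b * x)          # Y = B * X
    return xs, ys


xs, ys = simulate()

# Marginal of Y: sample mean should be near 0, sample variance near 1.
mean_y = statistics.fmean(ys)
var_y = statistics.pvariance(ys)

# Empirical P(Y < 1) should be near Phi(1).
phi_1 = statistics.NormalDist().cdf(1.0)
frac_below_1 = sum(y < 1.0 for y in ys) / len(ys)

# X + Y is exactly 0 whenever B = -1, so about half the samples hit 0.
frac_zero = sum(x + y == 0.0 for x, y in zip(xs, ys)) / len(xs)

print("mean of Y:", mean_y)
print("variance of Y:", var_y)
print("P(Y < 1) empirical vs Phi(1):", frac_below_1, phi_1)
print("fraction with X + Y == 0:", frac_zero)
```

A scatter plot of `xs` against `ys` would show all points lying on the two lines $y = x$ and $y = -x$, so the pair has no 2-d density to plot as a surface, even though each histogram taken separately looks like a bell curve.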

Pohoua