15

Given $n$ random variables $X_i$ with joint probability distribution $P(X_1,\ldots,X_n)$, the covariance matrix $C_{ij}=E[X_i X_j]-E[X_i]E[X_j]$ is positive semi-definite, i.e. its eigenvalues are non-negative.

I am interested in the conditions on $P$ that are necessary and/or sufficient for $C$ to have $m$ zero eigenvalues. For instance, a sufficient condition is that the random variables are linearly dependent: $\sum_i u_i X_i=0$ for some real numbers $u_i$, not all zero. For example, if $P(X_1,\ldots,X_n)=\delta(X_1-X_2)p(X_2,\ldots,X_n)$, then $\vec u=(1,-1,0,\ldots,0)$ is an eigenvector of $C$ with zero eigenvalue. If we have $m$ linearly independent constraints of this type on the $X_i$'s, this implies $m$ zero eigenvalues.
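A minimal numerical sketch of this sufficient condition (assuming NumPy; the joint distribution with $X_1=X_2$ below is an arbitrary toy choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy distribution with the linear constraint X1 = X2,
# i.e. P(X1,...,X4) ∝ δ(X1 - X2) p(X2, X3, X4).
X2, X3, X4 = rng.normal(size=(3, 100_000))
X1 = X2.copy()

C = np.cov(np.stack([X1, X2, X3, X4]))   # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order

print(np.round(eigvals, 6))        # smallest eigenvalue ≈ 0
print(np.round(eigvecs[:, 0], 3))  # ≈ ±(1, -1, 0, 0)/√2, i.e. u = (1, -1, 0, ..., 0)
```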

There is at least one additional (but trivial) possibility, when $X_a=E[X_a]$ for some $a$ (i.e. $P(X_1,\ldots,X_n)\propto\delta(X_a-E[X_a])$), since in that case $C_{ij}$ has a row and a column of zeros: $C_{ia}=C_{ai}=0,\,\forall i$. As this case is not really interesting, I assume that the probability distribution is not of that form.

My question is: are linear constraints the only way to induce zero eigenvalues (if we forbid the trivial exception given above), or can non-linear constraints on the random variables also generate zero eigenvalues of $C$?

kjetil b halvorsen
Adam
    By definition, a collection of vectors that includes the zero vector is linearly dependent, so your additional possibility isn't anything new or different. Could you please explain what you mean by "having a $m$ eigenvalue"? That looks like some kind of typographical error. – whuber Sep 12 '18 at 16:45
  • @whuber: yes, typo. Corrected. I think the two conditions are different: one is about the relationship between the variables, while the other is about the probability of only one variable (namely $p(X_a)=\delta(X_a-E(X_a))$). – Adam Sep 12 '18 at 20:36
  • The formulation of your question is confusing. It *looks* like an elementary theorem of linear algebra, but the references to "independent" random variables suggest it might be about something else altogether. Would it be correct to understand that every time you use "independent" you mean in the sense of linear independence and not in the sense of (statistically) independent random variables? Your reference to "missing data" is even further confusing, because it suggests your "random variables" might really mean just columns of a data matrix. It would be good to see these meanings clarified. – whuber Sep 12 '18 at 21:34
  • @whuber : I've edited the question. Hopefully it is clearer. – Adam Sep 12 '18 at 21:49
  • The right-hand side of the dependence condition $\sum_i u_i X_i=0$ does not necessarily need to be zero (any constant will do), unless the mean of each $X_i$ is zero. – Sextus Empiricus Sep 14 '18 at 22:12

4 Answers

8

Linear dependence is not just a sufficient but also a necessary condition

To show that the variance-covariance matrix has a zero eigenvalue if and only if the variables are linearly dependent, it only remains to be shown that "if the matrix has a zero eigenvalue then the variables are linearly dependent".

If you have a zero eigenvalue for $C_{ij} = \text{Cov}(X_i,X_j)$ then there is some linear combination (defined by the eigenvector $v$)

$$Y = \sum_{i=1}^n v_i X_i $$

such that

$$\begin{array}{rcl} \text{Cov}(Y,Y) &=& \sum_{i=1}^n \sum_{j=1}^n v_i v_j \text{Cov}(X_i,X_j) \\ &=&\sum_{i=1}^n v_i\sum_{j=1}^n v_j C_{ij} \\ &= &\sum_{i=1}^n v_i \cdot 0 \\ &=& 0 \end{array}$$

which means that $Y$ must be a constant; thus the linear combination $\sum_i v_i X_i$ equals a constant, and the $X_i$ are either constants themselves (the trivial case) or linearly dependent.

- the first line in the equation with $\text{Cov}(Y,Y)$ is due to the property of covariance $$\scriptsize\text{Cov}(aU+bV,cW+dX) = ac\,\text{Cov}(U,W) + bc\,\text{Cov}(V,W) +ad\, \text{Cov}(U,X) + bd \,\text{Cov}(V,X) $$

- the step from the second to the third line is due to the property of a zero eigenvalue $$\scriptsize \sum_{j=1}^nv_jC_{ij} = 0$$
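A quick numerical check of this computation (a sketch assuming NumPy; the dependence $X_3 = X_1 + X_2$ is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary example with the linear dependence X3 = X1 + X2.
X1, X2 = rng.normal(size=(2, 50_000))
X = np.stack([X1, X2, X1 + X2])

C = np.cov(X)
eigvals, eigvecs = np.linalg.eigh(C)
v = eigvecs[:, 0]            # eigenvector of the (near-)zero eigenvalue

Y = v @ X                    # Y = sum_i v_i X_i
print(eigvals[0])            # ≈ 0
print(v @ C @ v)             # vᵀ C v = Cov(Y, Y) ...
print(np.var(Y, ddof=1))     # ... equals the sample variance of Y, also ≈ 0
```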


Non-linear constraints

So, since a linear constraint is a necessary condition (not just a sufficient one), non-linear constraints are only relevant when they indirectly imply a linear constraint.

In fact, there is a direct correspondence between the eigenvectors associated with the zero eigenvalue and the linear constraints.

$$C \cdot v = 0 \iff Y = \sum_{i=1}^n v_i X_i = \text{const}$$

Thus non-linear constraints leading to a zero eigenvalue must, taken together, generate some linear constraint.


How can non-linear constraints lead to linear constraints

Your example in the comments shows intuitively how non-linear constraints can lead to linear constraints; you can see it by reversing the derivation. The following non-linear constraints

$$\begin{array}{lcr} a^2+b^2&=&1\\ c^2+d^2&=&1\\ ac + bd &=& 0 \\ ad - bc &=& 1 \end{array}$$

can be reduced to

$$\begin{array}{lcr} a^2+b^2&=&1\\ c^2+d^2&=&1\\ a-d&=&0 \\ b+c &=& 0 \end{array}$$

You could reverse this. Say you have non-linear plus linear constraints; then it is not strange to imagine how we can replace one of the linear constraints with a non-linear constraint by substituting the linear constraints into the non-linear ones. E.g. when we substitute $a=d$ and $b=-c$ into the non-linear form $a^2+b^2=1$, we obtain the relationship $ad-bc=1$, and when we multiply $a=d$ and $c=-b$ we get $ac=-bd$.
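A numerical check of this example (a sketch assuming NumPy; here $(a,b,c,d)=(\cos\theta,\sin\theta,-\sin\theta,\cos\theta)$ for a uniformly random angle $\theta$, which satisfies all four non-linear constraints):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random 2x2 rotation matrices: a = cos(t), b = sin(t), c = -sin(t), d = cos(t).
# These satisfy a²+b² = 1, c²+d² = 1, ac+bd = 0, ad-bc = 1,
# and therefore also the implied linear constraints a - d = 0 and b + c = 0.
t = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
a, b, c, d = np.cos(t), np.sin(t), -np.sin(t), np.cos(t)

C = np.cov(np.stack([a, b, c, d]))
eigvals, eigvecs = np.linalg.eigh(C)

print(np.round(eigvals, 6))         # two eigenvalues ≈ 0
print(np.round(eigvecs[:, :2], 3))  # their eigenvectors span (1, 0, 0, -1) and (0, 1, 1, 0)
```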

Sextus Empiricus
  • I guess this (and the answer by whuber) is an indirect answer to my question (which was: "is linear dependence the only way to obtain a zero eigenvalue") in this way: even if the dependence between the random variables is non-linear, it can always be rewritten as a linear dependence by just writing $Y=\sum_i \nu_i X_i$. Although I was really looking for a way to characterize the possible non-linear constraints themselves, I guess it is nevertheless a useful result. – Adam Sep 14 '18 at 22:33
  • Yes, I know... what I'm saying is that if there is a non-linear dependence **and** there is a zero eigenvalue, then by your answer, it means that the non-linear dependence can be "factored" in some way into a linear dependence. It is a weaker version of what I was looking for, but still something. – Adam Sep 14 '18 at 22:40
  • You're giving an example that does not work, which does not mean that it cannot be the case... – Adam Sep 14 '18 at 22:49
  • Here is a counter-example to what you're saying (if you think it is not, then it might help us find what is wrong with my formulation of the problem :) ): Take a 2-by-2 random matrix $M$, with the *non-linear* constraints $M M^T=1$ and $\det M=1$. These non-linear constraints can be rewritten as two non-linear constraints plus two linear ones, meaning that the covariance matrix has two eigenvectors with zero eigenvalue. Remove the constraint $\det M=1$, and they disappear. – Adam Sep 14 '18 at 22:53
  • $M_{11}=X_1$, $M_{12}=X_2$, $M_{21}=X_3$ and $M_{22}=X_4$. The constraints are $X_1^2+X_2^2=1$, $X_3^2+X_4^2=1$, $X_1 X_3+X_2 X_4=0$ (only two are independent). They do not imply a zero eigenvalue. However, adding $X_1 X_4-X_2 X_3=1$ does imply two eigenvectors with 0 eigenvalues. – Adam Sep 14 '18 at 23:00
  • Anyway, I agree with your sentence "I showed that a necessary condition is that a linear sum of the variables must equal to a constant. Any other dependency can only be a sufficient/necessary condition if/iff it implies linear dependence", since it was what I meant above. – Adam Sep 14 '18 at 23:07
  • The question as written is: "are linear constraints the only way to induce zero eigenvalues (if we forbid the trivial exception given above), or can non-linear constraints on the random variables also generate zero eigenvalues?" In particular, I was wondering if non-linear constraints which cannot be linearized could imply zero eigenvalues. By your answer, it seems not. – Adam Sep 14 '18 at 23:16
  • I'm not sure what you mean by "[in my example,] the non-linear constraints must lead to points". Without the $\det M=1$ constraint, the correlation matrix will generically not have zero eigenvalues, which seems to contradict what you are saying... ? – Adam Sep 17 '18 at 12:07
  • The $\det M$ constraint is **not** implied by the other three. You can write the three constraints as $X_1^2+X_2^2=1$, $X_1^2=X_4^2$ and $X_2^2=X_3^2$. Then $\det M=1$ implies $X_1=X_4$ and $X_2=-X_3$, which gives two linear constraints, and thus two zero eigenvalues. – Adam Sep 17 '18 at 12:21
  • @Adam, you are right. Intuitively I thought that the intersections of multiple non-linear constraints, when they lead to a linear relationship (which represents a manifold with zero curvature) would have to be related to manifolds of zero curvature themselves. However the linear relationship is not necessarily a manifold with zero curvature (or actually the relationship is, but you can transform it, and the resulting space swept out may be non-linear). In the case with the matrix $M$ it is a 4-d curve that can be seen as two 2-d circles with a 90 degrees phase shift. – Sextus Empiricus Sep 17 '18 at 13:34
  • Sure, you can go from linear to non-linear constraints. But the question could be rephrased as: if I give you a set of non-linear constraints, is there a zero eigenvalue? We now know that the answer is: only if the non-linear constraints also imply linear ones. – Adam Sep 17 '18 at 17:37
  • I was expanding the question to thinking about how and when non-linear constraints imply linear constraints. One intuitive way to look at it is to reverse the path, from linear constraints to non-linear constraints. It is not difficult to see that linear constraints can easily be rephrased into a wide range of non-linear constraints. However, I still wonder whether, given some set of non-linear constraints, there is a general method to know whether they imply a linear constraint (or at least whether there is a certain kind of non-linear constraints that are necessary). – Sextus Empiricus Sep 17 '18 at 18:20
  • Yep, that's exactly what I'm looking for :) but it might be too ambitious given that "non-linear constraint" is pretty broad... – Adam Sep 17 '18 at 21:41
6

Perhaps by simplifying the notation we can bring out the essential ideas. It turns out we don't need to involve expectations or complicated formulas, because everything is purely algebraic.


The algebraic nature of the mathematical objects

The question concerns relationships between (1) the covariance matrix of a finite set of random variables $X_1, \ldots, X_n$ and (2) linear relations among those variables, considered as vectors.

The vector space in question is the set of all finite-variance random variables (on any given probability space $(\Omega,\mathbb P)$) modulo the subspace of almost surely constant variables, denoted $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R.$ (That is, we consider two random variables $X$ and $Y$ to be the same vector when there is zero chance that $X-Y$ differs from its expectation.) We are dealing only with the finite-dimensional vector space $V$ generated by the $X_i,$ which is what makes this an algebraic problem rather than an analytic one.

What we need to know about variances

$V$ is more than just a vector space: it is a quadratic module, because it comes equipped with the variance. All we need to know about variances are two things:

  1. The variance is a scalar-valued function $Q$ with the property that $Q(aX)=a^2Q(X)$ for all vectors $X.$

  2. The variance is nondegenerate.

The second needs some explanation. $Q$ determines a "dot product," which is a symmetric bilinear form given by

$$X\cdot Y = \frac{1}{4}\left(Q(X+Y) - Q(X-Y)\right).$$

(This is of course nothing other than the covariance of the variables $X$ and $Y.$) Vectors $X$ and $Y$ are orthogonal when their dot product is $0.$ The orthogonal complement of any set of vectors $\mathcal A \subset V$ consists of all vectors orthogonal to every element of $\mathcal A,$ written

$$\mathcal{A}^0 = \{v\in V\mid a \cdot v = 0\text{ for all }a \in \mathcal A\}.$$

It is clearly a vector space. When $V^0 = \{0\}$, $Q$ is nondegenerate.
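To make the parenthetical remark above explicit, here is a short worked check, using only the bilinearity of the covariance, that this dot product is indeed the covariance:

$$\begin{aligned} Q(X+Y)-Q(X-Y) &= \text{Var}(X+Y)-\text{Var}(X-Y)\\ &= \bigl(\text{Var}(X) + 2\,\text{Cov}(X,Y) + \text{Var}(Y)\bigr) - \bigl(\text{Var}(X) - 2\,\text{Cov}(X,Y) + \text{Var}(Y)\bigr)\\ &= 4\,\text{Cov}(X,Y), \end{aligned}$$

so that $X\cdot Y = \frac{1}{4}\left(Q(X+Y)-Q(X-Y)\right) = \text{Cov}(X,Y).$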

Allow me to prove that the variance is indeed nondegenerate, even though it might seem obvious. Suppose $X$ is a nonzero element of $V^0.$ This means $X\cdot Y = 0$ for all $Y\in V;$ equivalently,

$$Q(X+Y) = Q(X-Y)$$

for all vectors $Y.$ Taking $Y=X$ gives

$$4 Q(X) = Q(2X) = Q(X+X) = Q(X-X) = Q(0) = 0$$

and thus $Q(X)=0.$ However, we know (using Chebyshev's Inequality, perhaps) that the only random variables with zero variance are almost surely constant, which identifies them with the zero vector in $V,$ QED.

Interpreting the questions

Returning to the questions, in the preceding notation the covariance matrix of the random variables is just a regular array of all their dot products,

$$T = (X_i\cdot X_j).$$

There is a good way to think about $T$: it defines a linear transformation on $\mathbb{R}^n$ in the usual way, by sending any vector $x=(x_1, \ldots, x_n)\in\mathbb{R}^n$ into the vector $T(x)=y=(y_1, \ldots, y_n)$ whose $i^\text{th}$ component is given by the matrix multiplication rule

$$y_i = \sum_{j=1}^n (X_i\cdot X_j)x_j.$$

The kernel of this linear transformation is the subspace it sends to zero:

$$\operatorname{Ker}(T) = \{x\in \mathbb{R}^n\mid T(x)=0\}.$$

The foregoing equation implies that when $x\in \operatorname{Ker}(T),$ for every $i$

$$0 = y_i = \sum_{j=1}^n (X_i\cdot X_j)x_j = X_i \cdot \left(\sum_j x_j X_j\right).$$

Since this is true for every $i,$ it holds for all vectors spanned by the $X_i$: namely, $V$ itself. Consequently, when $x\in\operatorname{Ker}(T),$ the vector given by $\sum_j x_j X_j$ lies in $V^0.$ Because the variance is nondegenerate, this means $\sum_j x_j X_j = 0.$ That is, $x$ describes a linear dependency among the $n$ original random variables.

You can readily check that this chain of reasoning is reversible:

Linear dependencies among the $X_j$ as vectors are in one-to-one correspondence with elements of the kernel of $T.$

(Remember, this statement still considers the $X_j$ as defined up to a constant shift in location--that is, as elements of $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R$--rather than as just random variables.)
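A small numerical illustration of this correspondence (a sketch assuming NumPy; the dependency $X_3 = 2X_1 - X_2 + \text{const}$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)

# Impose one linear dependency: X3 = 2*X1 - X2 (up to an additive constant,
# which is irrelevant because vectors are defined modulo constants).
X1, X2 = rng.normal(size=(2, 100_000))
X3 = 2 * X1 - X2 + 5.0
T = np.cov(np.stack([X1, X2, X3]))  # the matrix of dot products X_i · X_j

eigvals, eigvecs = np.linalg.eigh(T)
x = eigvecs[:, 0]                   # spans Ker(T)
print(np.round(eigvals, 6))         # exactly one eigenvalue ≈ 0
print(np.round(x, 3))               # ∝ (2, -1, -1): recovers 2·X1 - X2 - X3 = const
```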

Finally, by definition, an eigenvalue of $T$ is any scalar $\lambda$ for which there exists a nonzero vector $x$ with $T(x) = \lambda x.$ When $\lambda=0$ is an eigenvalue, the space of associated eigenvectors is (obviously) the kernel of $T.$


Summary

We have arrived at the answer to the questions: the set of linear dependencies of the random variables, qua elements of $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R,$ corresponds one-to-one with the kernel of their covariance matrix $T.$ This is so because the variance is a nondegenerate quadratic form. The kernel also is the eigenspace associated with the zero eigenvalue (or just the zero subspace when there is no zero eigenvalue).


Reference

I have largely adopted the notation and some of the language of Chapter IV in

Jean-Pierre Serre, A Course In Arithmetic. Springer-Verlag 1973.

whuber
  • Whoa, that's great! Just a question to be sure that I understand everything: when you write "$X_j$ as vectors" you do not mean collecting the random variables in a vector (i.e. $\vec X=(X_1,\ldots,X_n)$), or do you? If I'm right, I'm guessing that you are collecting the possible values of the random variable $X_i$ into a vector, while the probability distribution is hidden in the definition of the variance, right? – Adam Sep 14 '18 at 20:53
  • I think the main aspect that is not quite clear is the following (which might just show my lack of formal knowledge of probability theory): you seem to show that if there is a 0 eigenvalue, then we have e.g. $X_1=X_2$. This constraint does not refer to the probability distribution $P$, which is hidden in $Q$ (I think this is the clever point about this demonstration). But what does it mean to have $X_1=X_2$ without reference to $P$? Or does it just imply that $P\propto \delta(X_1-X_2)$, but then how do we know that it must be a *linear combination of $X_1$ and $X_2$ in the delta function*? – Adam Sep 14 '18 at 21:05
  • I'm afraid I don't understand your use of a "delta function" in this context, Adam. That is partly because I see no need for it and partly because the notation is ambiguous: would that be a Kronecker delta or a Dirac delta, for instance? – whuber Sep 15 '18 at 12:40
  • It would be a Kronecker or a Dirac depending on the variables (discrete or continuous). These deltas could be part of the integration measure, e.g. I integrate over 2-by-2 matrices $M$ (so four real variables $X_1$, $X_2$, $X_3$ and $X_4$) with some weight (say $P=\exp(-\operatorname{tr}(M M^T))$), or I integrate over a sub-group. If it is symmetric matrices (implying for instance $X_2=X_3$), I can formally impose that by multiplying $P$ by $\delta(X_2-X_3)$. This would be a linear constraint. An example of a non-linear constraint is given in the comments below Martijn Weterings's answer. – Adam Sep 15 '18 at 17:29
  • (continued) The question is: what kind of non-linear constraints added on my variables can induce a 0 eigenvalue? By your answers, it seems to be: only non-linear constraints that also imply linear ones (as exemplified in the comments below Martijn Weterings's answer). Maybe the problem is that my way of thinking of the problem is from a physicist's point of view, and I struggle to explain it in a different language (I think here is the right place to ask this question, not physics.SE). – Adam Sep 15 '18 at 17:31
  • Do you have a good textbook reference discussing the covariance as a scalar product (I've found some notes on the internet, but they could not be used for citations)? Also, does the space $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R$ have a name (so I can look for it)? – Adam Sep 19 '18 at 11:29
  • Christensen's [Plane Answers to Complex Questions](https://www.springer.com/us/book/9781441929716) is a well-known book. I don't know of a standard name for the space: one might call it "square integrable random variables" up to location (or modulo translation). – whuber Sep 19 '18 at 14:43
4

Suppose $C$ has an eigenvector $v$ with corresponding eigenvalue $0$. Then $\operatorname{var}(v^T X) = v^T Cv = 0$. Thus, by Chebyshev's inequality, $v^TX$ is almost surely constant and equal to $v^T E[X]$. That is, every zero eigenvalue corresponds to a linear restriction, namely $v^T X = v^T E[X]$. There is no need to consider any special cases.

Thus, we conclude:

"are linear constraints the only way to induce zero eigenvalues [?]"

Yes.

"can non-linear constraints on the random variables also generate zero eigenvalues of C ?"

Yes, if they imply linear constraints.

ekvall
  • I agree. I was hoping that one could be more specific on the kind of non-linear constraints, but I guess that it is hard to do better if we do not specify the constraints. – Adam Sep 17 '18 at 14:34
3

The covariance matrix $C$ of $X$ is symmetric, so you can diagonalize it as $C=Q\Lambda Q^T$, with the eigenvalues in the diagonal matrix $\Lambda$. Rewriting this as $\Lambda=Q^TCQ$, the rhs is the covariance matrix of $Q^TX$, so zero eigenvalues on the lhs correspond to linear combinations of $X$ with degenerate distributions.
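A brief numerical sketch of this argument (assuming NumPy; the degenerate example $X_3 = X_1 + X_2$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Degenerate example: X3 = X1 + X2.
X1, X2 = rng.normal(size=(2, 100_000))
X = np.stack([X1, X2, X1 + X2])

C = np.cov(X)
lam, Q = np.linalg.eigh(C)        # C = Q Λ Qᵀ

Z = Q.T @ X                       # the linear combinations Qᵀ X
print(np.round(lam, 6))           # Λ: one entry ≈ 0
print(np.round(np.cov(Z), 6))     # ≈ diag(Λ), the covariance matrix of Qᵀ X
print(np.round(np.std(Z[0]), 6))  # ≈ 0: the zero-eigenvalue combination is degenerate
```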

Hasse1987