11

On page 9 of *Linear Regression Analysis*, 2nd edition, by Seber and Lee, there is a proof for the expected value of a quadratic form that I don't understand.

Let $X = (X_i)$ be an $n \times 1$ random vector and let $A$ be an $n \times n$ symmetric matrix. If $\mathbb{E}[X] = \mu$ and $\operatorname{Var}[X] = \Sigma = (\sigma_{ij})$, then $\mathbb{E}[X^T AX] = \operatorname{tr}(A\Sigma) + \mu^T A\mu$.

The problem I have is almost right out of the gate: I can't see how $\mathbb{E}[X^T AX] = \operatorname{tr}(\mathbb{E}[X^T AX])$. I think I get the rest of the proof, but if someone here can point me in the right direction on this part, I'd be forever grateful!

Michael Hardy
Kyle
  • It is **not** $\mathbb{E}[X^{T}AX] = \operatorname{tr}(\mathbb{E}[X^{T}AX])$ that you need to be concerned about, but rather the trace of $A\Sigma$, which _is_ an $n\times n$ matrix. See my answer for a complete derivation. Jonathan Christensen's remarks about $\operatorname{tr}([3])$ and scalars versus matrices are not applicable to this problem. – Dilip Sarwate Jan 19 '13 at 03:19
  • @DilipSarwate, you are obviously unfamiliar with the matrix algebra version of this proof ([Wikipedia](http://en.wikipedia.org/wiki/Quadratic_form_(statistics)#Derivation) has a very short outline), which is what Kyle is asking about here. You present an alternative proof, but your answer doesn't address the original question at all. – Jonathan Christensen Jan 19 '13 at 17:42
  • @JonathanChristensen The matrix algebra version of the proof requires knowing facts such as that $E$ and $\operatorname{tr}$ are linear operators that commute with each other, and is a bad example to learn from. The _expectation_ of a matrix $B$ (with random variables as entries) is denoted $E[B]$ and is simply the _matrix_ of expected values. _In general_, the result $E[B]= \operatorname{tr}(E[B])$ is false, since the left side is a matrix and the right side a scalar (or a $1\times 1$ matrix, if you will). The result holds exactly when $B$ is a $1\times 1$ matrix, in which case the trace operation on the right is the identity map. – Dilip Sarwate Jan 19 '13 at 20:19
  • @DilipSarwate Whatever you think of the pedagogical merits of the matrix algebra version of the proof, that is the question that Kyle asked. I answered it. You ignored it and made snide comments. – Jonathan Christensen Jan 19 '13 at 22:33
  • Did you like the book? I would like to learn more about the algebra of random matrices/vectors. – Joe Oct 07 '20 at 18:28

3 Answers

20

As Jonathan Christensen points out, $X^TAX$ is a $1\times 1$ matrix; in fact, it is the (univariate) random variable
$$X^TAX = \sum_{i=1}^n \sum_{j=1}^n a_{i,j}X_iX_j.$$
So what is its expectation? Clearly we have
$$\begin{align*}
E[X^TAX] &= E\left[\sum_{i=1}^n \sum_{j=1}^n a_{i,j}X_iX_j\right]\\
&= \sum_{i=1}^n \sum_{j=1}^n a_{i,j}E[X_iX_j] & \text{by linearity of expectation}\\
&= \sum_{i=1}^n \sum_{j=1}^n a_{i,j}(\sigma_{i,j}+\mu_i\mu_j) & \text{apply covariance formula}\\
&= \sum_{i=1}^n \sum_{j=1}^n a_{i,j}\sigma_{j,i} +\sum_{i=1}^n \sum_{j=1}^n a_{i,j}\mu_i\mu_j & \text{since}~\Sigma~\text{is a symmetric matrix}\\
&= \sum_{i=1}^n [A\Sigma]_{i,i} + \mu^TA\mu\\
&= \operatorname{tr}(A\Sigma) + \mu^TA\mu.
\end{align*}$$
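As a quick numerical sanity check of the final identity, here is a NumPy sketch comparing a Monte Carlo estimate of $E[X^TAX]$ with $\operatorname{tr}(A\Sigma) + \mu^TA\mu$. The particular $\mu$, $\Sigma$, $A$, and the Gaussian choice for $X$ are arbitrary; the identity depends only on the first two moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Arbitrary mean vector and a positive-definite covariance matrix
mu = np.array([1.0, -2.0, 0.5])
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)

# An arbitrary symmetric matrix A
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Monte Carlo estimate of E[X' A X] with X ~ N(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
mc = np.einsum('ij,jk,ik->i', X, A, X).mean()  # row-wise quadratic forms

# Closed form: tr(A Sigma) + mu' A mu
exact = np.trace(A @ Sigma) + mu @ A @ mu

print(mc, exact)  # the two should agree to a couple of decimal places
```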

Dilip Sarwate
10

Since $X$ is an $n\times1$ vector, $\mathbb E[X^{T}AX]$ is a $1\times1$ matrix. The trace is the sum of diagonal entries, but $\mathbb E[X^TAX]$ only has one entry, so its trace is simply equal to that one entry. If we consider a $1\times1$ matrix to be equivalent to a scalar, then the equality you're worried about follows.

Basically, what's $tr([3])$? It's obviously 3. Now, you might argue that strictly speaking $[3] \neq 3$, because one is a matrix and the other is a real number, but they're basically equivalent, and if we're a bit loose with notation then we can say they're equal.
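For completeness, here is a sketch of how the trace is then actually put to work in the matrix-algebra version of the proof (the one outlined on Wikipedia), using the cyclic property $\operatorname{tr}(BC)=\operatorname{tr}(CB)$:
$$\begin{align*}
\mathbb E[X^TAX] &= \mathbb E[\operatorname{tr}(X^TAX)] &&\text{since } X^TAX \text{ is } 1\times 1\\
&= \mathbb E[\operatorname{tr}(AXX^T)] &&\text{cyclic property of the trace}\\
&= \operatorname{tr}(A\,\mathbb E[XX^T]) &&\text{by linearity of } \mathbb E \text{ and } \operatorname{tr}\\
&= \operatorname{tr}\!\left(A(\Sigma + \mu\mu^T)\right) &&\text{since } \Sigma = \mathbb E[XX^T] - \mu\mu^T\\
&= \operatorname{tr}(A\Sigma) + \operatorname{tr}(A\mu\mu^T) = \operatorname{tr}(A\Sigma) + \mu^TA\mu.
\end{align*}$$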

Jonathan Christensen
  • Haha, Damn... I knew that the quadratic form was a scalar, but it seemed so weird to take the trace of a scalar that I thought that I missed something. Thanks a bunch. – Kyle Jan 19 '13 at 01:49
  • I am not sure what this answer means. The _trace_ that is being computed is the trace of $A\Sigma$ which is _not_ a scalar. – Dilip Sarwate Jan 19 '13 at 03:04
  • @DilipSarwate You didn't read the question carefully, did you? Quote: "I can't see how $\mathbb{E}[X^{T}AX] = tr(\mathbb{E}[X^{T}AX])$. I think I get the rest of the proof [...]" The part Kyle was confused about was exactly the part I addressed. – Jonathan Christensen Jan 19 '13 at 14:40
  • The final bit of manipulation that Jonathan might want to mention is that ${\rm tr}\,AB = {\rm tr}\,BA$ even though these matrices may be of different size. So ${\rm tr}\,X'AX = {\rm tr}\, AXX'$, at which point the constant $A$ can be pulled through the expectation sign. – StasK May 31 '13 at 14:20
  • @StasK: You have identified the entire point of taking the trace in the first place. With Jonathan's explanation, this is a full and complete answer (despite some contrary claims in comments to the question). – whuber May 31 '13 at 14:28
0

To get the intuition behind it, recall that if $X$ is a univariate random variable, then \begin{equation} E[a^{2}X^{2}]=a^{2}\operatorname{Var}(X)+E[aX]^{2}. \end{equation}

On the other hand, when $X$ is a random vector, write $A = C^\prime C$, so that $X^\prime AX= X^{\prime}C^{\prime}CX = (CX)^\prime(CX)$, where $C$ is a square root of $A$. This assumes that $A$ is positive definite; the assumption is imposed only for the sake of illustration. Applying the univariate identity to each coordinate of the vector $CX$ and summing gives \begin{equation} E[(CX)^{\prime}(CX)]=\operatorname{tr}(\operatorname{Var}(CX))+E[CX]^{\prime}E[CX], \end{equation} and since $\operatorname{Var}(CX)=C\operatorname{Var}(X)C^{\prime}$, it follows that \begin{equation} E[X^{\prime}AX]=\operatorname{tr}(C\Sigma C^{\prime})+E[X]^{\prime}AE[X]. \end{equation}

Finally, $\operatorname{tr}(C\Sigma C^{\prime}) = \operatorname{tr}(C^{\prime}C\Sigma) = \operatorname{tr}(A\Sigma)$ by the cyclic property of the trace, since $A=C^\prime C$. Obviously, the general result does not require knowing a square root of $A$; however, one can see where this property comes from.
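As a quick numerical illustration of this route (a sketch assuming a positive-definite $A$, so that a Cholesky-style square root exists; the matrices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Arbitrary positive-definite A and covariance matrix Sigma
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)

# Write A = C'C with C = L', where L is the Cholesky factor (A = L L')
L = np.linalg.cholesky(A)
C = L.T

# tr(C Sigma C') equals tr(A Sigma) by the cyclic property of the trace
print(np.trace(C @ Sigma @ C.T), np.trace(A @ Sigma))
```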