
Whilst looking at the answer to this question: Gaussian Discriminant Analysis and sigmoid function

I was wondering how they got from $$\log\left(\frac{\pi_1}{\pi_0}\right) -\frac{1}{2}(x - \mu_1)^T\Sigma^{-1}(x - \mu_1) + \frac{1}{2}(x - \mu_0)^T\Sigma^{-1}(x - \mu_0)$$

to

$$\log\left(\frac{\pi_1}{\pi_0}\right) + (\mu_1 - \mu_0)^T\Sigma^{-1}x + \frac{1}{2}\left(\mu_0^T\Sigma^{-1}\mu_0 - \mu_1^T\Sigma^{-1}\mu_1\right)$$

More specifically - I don't understand how they got this term: $$(\mu_1 - \mu_0)^T\Sigma^{-1}x$$

Any help would be much appreciated!

Karolis Koncevičius
clostar

1 Answer


Recall from matrix rules that:

(1) the matrix multiplication operator is distributive with respect to addition, meaning $A(B+C)=AB+AC$

(2) the transpose operator respects addition, meaning $(A+B)^T=A^T+B^T$.

Start with the last term, the one involving $\mu_0$ (ignore the factor $\frac{1}{2}$ for now). From (1) we get $$(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)=(x-\mu_0)^T\Sigma^{-1}x-(x-\mu_0)^T\Sigma^{-1}\mu_0$$

and from (2): $$(x-\mu_0)^T\Sigma^{-1}x=x^T\Sigma^{-1}x-\mu_0^T\Sigma^{-1}x, \qquad (x-\mu_0)^T\Sigma^{-1}\mu_0=x^T\Sigma^{-1}\mu_0-\mu_0^T\Sigma^{-1}\mu_0$$

and so the $\mu_0$ term can be written as

$$\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)=\frac{1}{2}\left(x^T\Sigma^{-1}x-\mu_0^T\Sigma^{-1}x-x^T\Sigma^{-1}\mu_0+\mu_0^T\Sigma^{-1}\mu_0\right)$$

The $\mu_1$ term expands in the same way, with the opposite sign. Adding up both terms:

$$\begin{aligned}
&-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)+\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0) \\
&\quad= \frac{1}{2}\left( -x^T\Sigma^{-1}x+\mu_1^T\Sigma^{-1}x+x^T\Sigma^{-1}\mu_1-\mu_1^T\Sigma^{-1}\mu_1+x^T\Sigma^{-1}x-\mu_0^T\Sigma^{-1}x-x^T\Sigma^{-1}\mu_0+\mu_0^T\Sigma^{-1}\mu_0 \right) \\
&\quad= \frac{1}{2}\left( \mu_1^T\Sigma^{-1}x-\mu_0^T\Sigma^{-1}x+x^T\Sigma^{-1}\mu_1-x^T\Sigma^{-1}\mu_0-\mu_1^T\Sigma^{-1}\mu_1+\mu_0^T\Sigma^{-1}\mu_0 \right) \\
&\quad= \frac{1}{2}\left( (\mu_1-\mu_0)^T\Sigma^{-1}x+x^T\Sigma^{-1}(\mu_1-\mu_0)-\mu_1^T\Sigma^{-1}\mu_1+\mu_0^T\Sigma^{-1}\mu_0 \right)
\end{aligned}$$

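Now, note three things: (a) the transpose of a scalar is the scalar itself; (b) since $\Sigma$ is a covariance matrix it is symmetric, hence so is $\Sigma^{-1}$, and therefore $\left((\mu_1-\mu_0)^T\Sigma^{-1}x\right)^T=x^T\Sigma^{-1}(\mu_1-\mu_0)$; and (c) $x^T\Sigma^{-1}(\mu_1-\mu_0)$ is a scalar. Together these mean the two cross terms are equal, so we can add them up: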

$$\begin{aligned}
&\frac{1}{2}\left( (\mu_1-\mu_0)^T\Sigma^{-1}x+x^T\Sigma^{-1}(\mu_1-\mu_0)-\mu_1^T\Sigma^{-1}\mu_1+\mu_0^T\Sigma^{-1}\mu_0 \right) \\
&\quad=\frac{1}{2}\left(2(\mu_1-\mu_0)^T\Sigma^{-1}x -\mu_1^T\Sigma^{-1}\mu_1+\mu_0^T\Sigma^{-1}\mu_0 \right) \\
&\quad=(\mu_1-\mu_0)^T\Sigma^{-1}x+\frac{1}{2}\left(\mu_0^T\Sigma^{-1}\mu_0 -\mu_1^T\Sigma^{-1}\mu_1 \right)
\end{aligned}$$
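If it helps to see why the two cross terms can be merged, here is a small numerical sketch in NumPy. It is only an illustration: the dimension, the random seed, the way `Sigma` is built, and the matrix `M` are all my own assumptions, not part of the original derivation. The first part checks that the cross terms agree for a symmetric $\Sigma^{-1}$, and the second shows that they need not agree for a non-symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3  # assumed dimension, chosen only for illustration

# Build a symmetric positive definite matrix to play the role of Sigma.
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)

x, mu0, mu1 = rng.normal(size=(3, d))

# The two cross terms from the derivation; both are scalars.
left = (mu1 - mu0) @ Sigma_inv @ x
right = x @ Sigma_inv @ (mu1 - mu0)
cross_terms_equal = bool(np.isclose(left, right))

# Hypothetical counter-example with a non-symmetric matrix M:
# v^T M u and u^T M v are no longer the same number.
M = np.array([[1.0, 2.0],
              [0.0, 1.0]])
v = np.array([1.0, 0.0])
u = np.array([0.0, 1.0])
left_m = v @ M @ u
right_m = u @ M @ v
asymmetric_terms_equal = bool(np.isclose(left_m, right_m))

print(cross_terms_equal, asymmetric_terms_equal)
```

So the symmetry of $\Sigma^{-1}$ is what lets the two cross terms collapse into a single term with a factor of 2.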

$\blacksquare$.
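As a sanity check on the whole identity, the sketch below evaluates the quadratic form and the linear form of the log-odds at a few random points and compares them. The function names `log_odds_quadratic` and `log_odds_linear`, the priors, and the random test data are hypothetical choices of mine, assumed purely for illustration.

```python
import numpy as np

def log_odds_quadratic(x, mu0, mu1, Sigma_inv, pi0, pi1):
    """Log-odds written with the two quadratic forms (the starting expression)."""
    d0 = x - mu0
    d1 = x - mu1
    return np.log(pi1 / pi0) - d1 @ Sigma_inv @ d1 / 2 + d0 @ Sigma_inv @ d0 / 2

def log_odds_linear(x, mu0, mu1, Sigma_inv, pi0, pi1):
    """Log-odds after simplification: linear in x plus a constant."""
    linear = (mu1 - mu0) @ Sigma_inv @ x
    const = 0.5 * (mu0 @ Sigma_inv @ mu0 - mu1 @ Sigma_inv @ mu1)
    return np.log(pi1 / pi0) + linear + const

rng = np.random.default_rng(1)
d = 4  # assumed dimension
A = rng.normal(size=(d, d))
Sigma_inv = np.linalg.inv(A @ A.T + d * np.eye(d))  # symmetric by construction
mu0, mu1 = rng.normal(size=(2, d))
pi0, pi1 = 0.3, 0.7  # assumed class priors

# Largest absolute difference between the two forms over a few random points.
max_gap = max(
    abs(log_odds_quadratic(x, mu0, mu1, Sigma_inv, pi0, pi1)
        - log_odds_linear(x, mu0, mu1, Sigma_inv, pi0, pi1))
    for x in rng.normal(size=(5, d))
)
print(max_gap)
```

The gap should be at the level of floating-point rounding, which is consistent with the $x^T\Sigma^{-1}x$ terms cancelling exactly.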

Spätzle
    Thanks! I didn't realise $\left((\mu_1-\mu_0)^T\Sigma^{-1}x\right)^T=x^T\Sigma^{-1}(\mu_1-\mu_0) $ was a scalar! – clostar Aug 16 '21 at 12:45