
I've been going over a lot of material on classification algorithms, and it seems that, under the constraint that the covariance matrices of the two classes are equal, classifying a vector $x$ into $C_1$ or $C_2$ depends on:

$(\mu_1-\mu_2)^T\Sigma^{-1}x-\frac{1}{2}(\mu_1-\mu_2)^T\Sigma^{-1}(\mu_1+\mu_2)>\ln\left(\frac{p(C_2)}{p(C_1)}\right)$

where $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix of $C_i$, and the assumption gives $\Sigma_1 = \Sigma_2 = \Sigma$. My first question is: by what rule should I compute $\Sigma$? In some texts the standard practice is a weighted average of $\Sigma_1$ and $\Sigma_2$. In other cases, $\Sigma$ is simply taken to be the identity matrix $I$, on the assumption that all the features in the vector are independent (with unit variances).
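For concreteness, here is one common way to form the pooled (weighted-average) estimate of $\Sigma$ and then evaluate the linear decision rule above — a minimal numpy sketch on made-up data, with the priors taken from class frequencies (an assumption, not the only choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(40, 3))   # class C1 samples (toy data)
X2 = rng.normal(1.0, 1.0, size=(60, 3))   # class C2 samples (toy data)

n1, n2 = len(X1), len(X2)
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Pooled covariance: weighted average of the two class covariances,
# with weights proportional to the class degrees of freedom.
S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)
Sigma = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Bayesian decision rule under Sigma_1 = Sigma_2 = Sigma:
# assign x to C1 when the discriminant exceeds ln(p(C2)/p(C1)).
p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)   # priors from class frequencies
Sigma_inv = np.linalg.inv(Sigma)

def assign_to_C1(x):
    lhs = (mu1 - mu2) @ Sigma_inv @ x \
          - 0.5 * (mu1 - mu2) @ Sigma_inv @ (mu1 + mu2)
    return lhs > np.log(p2 / p1)
```

As a sanity check, a point at $\mu_1$ lands in $C_1$ and a point at $\mu_2$ lands in $C_2$.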

My second question: computing the decision boundary for the Bayesian classifier from the previous equation seems straightforward. However, it does not look so clear for Fisher's LDA (unless I'm missing something).

Fisher's criterion asks us to spread the class means apart while reducing within-class variance, which leads us to maximize $J(w)$, where:

$J(w) = \frac{w^T S_B w}{w^T S_W w}$

leading us to $w \propto S_W^{-1}(\mu_1-\mu_2)$, for the classification rule:

$y=w^T x + w_0$, where $x\rightarrow C_1$ if $y\geq 0$, and $x\rightarrow C_2$ otherwise.

So how do I calculate $w_0$? I can try to find it through a "Bayesian = LDA" equivalence by setting $w = k S_W^{-1} (\mu_1-\mu_2)$:

$w_0 = -\frac{k}{2}(\mu_1-\mu_2)^T S_W^{-1}(\mu_1+\mu_2)$

but now $w_0$ depends on my auxiliary constant $k$. Or am I supposed to manually fine-tune $w_0$ for best performance on my validation data?
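Note that scaling $w$ by $k$ scales the matching $w_0$ by the same factor, so for any $k>0$ the sign of $y = w^T x + w_0$ — and hence every classification — is unchanged. A toy numpy sketch with equal priors and the threshold set so the boundary passes through the midpoint of the means (all data made up):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, size=(50, 2))   # class C1 (toy data)
X2 = rng.normal(2.0, 1.0, size=(50, 2))   # class C2 (toy data)
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter matrix (pooled, up to a constant factor).
Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

def fisher_rule(k):
    """Fisher direction scaled by k; w0 chosen so the decision
    boundary passes through the midpoint (mu1 + mu2) / 2."""
    w = k * np.linalg.solve(Sw, mu1 - mu2)
    w0 = -w @ (mu1 + mu2) / 2
    return w, w0

X = np.vstack([X1, X2])
w_a, w0_a = fisher_rule(k=1.0)
w_b, w0_b = fisher_rule(k=37.5)
labels_a = (X @ w_a + w0_a >= 0)   # True -> C1
labels_b = (X @ w_b + w0_b >= 0)
assert np.array_equal(labels_a, labels_b)  # k > 0 does not affect decisions
```

So under equal priors the midpoint choice of $w_0$ pins down the boundary uniquely, and $k$ drops out of the decisions entirely.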

  • Regarding Q2: If the two class covariances are assumed to be equal, then the midpoint of the line connecting the two class means, i.e. $(\mu_1 + \mu_2)/2$, should lie on the decision boundary. This should uniquely specify $w_0$, shouldn't it? Your $k$ is of no relevance here. – amoeba Jul 23 '15 at 18:18
  • Will this still be the case if both distributions have different priors? – Mecasickle Jul 23 '15 at 20:27
  • I don't think vanilla Fisher's LDA can deal with unequal priors, so if you need them, it's probably simpler to work directly with the Bayesian formulation. – amoeba Jul 23 '15 at 20:32
  • [This answer](http://stats.stackexchange.com/a/31384/3277) could be at least partly helpful. – ttnphns Jul 23 '15 at 23:41
