At one point in applying linear discriminant analysis (LDA), one has to find the vector $\mathbf{v}$ that maximizes the ratio $\mathbf{v}^{T}B\mathbf{v}/\mathbf{v}^{T}W\mathbf{v}$, where $B$ is the "between-class scatter" matrix and $W$ is the "within-class scatter" matrix.
We are given the following: $k$ sets of $N_{i}$ ($i=1,...,k$) vectors $\mathbf{x}_{ij}$ ($i=1,...,k$; $j=1,...,N_{i}$) from $k$ classes. The class sample means are $\mathbf{\bar{x}}_{i}=\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\mathbf{x}_{ij}$.
All sources I have looked at define $W$ as follows: $$W = \sum_{i=1}^{k}\sum_{j=1}^{N_{i}}(\mathbf{x}_{ij}-\mathbf{\bar{x}}_{i})(\mathbf{x}_{ij}-\mathbf{\bar{x}}_{i})^{T}$$
However, I have seen two different definitions for $B$. The first one, as described in Hardle et al., Applied Multivariate Statistical Analysis, 2003; Neil H. Timm, Applied Multivariate Analysis, 2002; and others, is: $$B = \sum_{i=1}^{k}N_{i}(\mathbf{\bar{x}}_{i}-\mathbf{\bar{x}})(\mathbf{\bar{x}}_{i}-\mathbf{\bar{x}})^{T}$$
Here, $\mathbf{\bar{x}}$ is the overall mean: $$\mathbf{\bar{x}}=\frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_{i}}\mathbf{x}_{ij}=\frac{1}{N}\sum_{i=1}^{k} N_{i}\mathbf{\bar{x}}_{i},$$ with $N=\sum_{i=1}^{k}N_{i}.$
The second one, as described in Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis, 6th Edition, 2007; the Wikipedia article on LDA; the Scholarpedia article; and others, is: $$B^{*} = \sum_{i=1}^{k}(\mathbf{\bar{x}}_{i}-\mathbf{\bar{x}^{*}})(\mathbf{\bar{x}}_{i}-\mathbf{\bar{x}^{*}})^{T}$$ This time, $\mathbf{\bar{x}^{*}}$ is the unweighted mean of the class means: $$\mathbf{\bar{x}^{*}} = \frac{1}{k}\sum_{i=1}^{k} \mathbf{\bar{x}}_{i}$$
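For concreteness, here is a minimal NumPy sketch of $W$, $B$ and $B^{*}$ as defined above. The function name `scatter_matrices`, the input layout (one array of shape $(N_i, p)$ per class) and the synthetic data are my own assumptions for illustration; they are not taken from any of the cited sources.

```python
import numpy as np

def scatter_matrices(classes):
    """Return (W, B, B_star) for a list of arrays, one per class.

    Each array has shape (N_i, p): N_i observations of a p-vector.
    Hypothetical helper, written only to mirror the formulas above.
    """
    means = [X.mean(axis=0) for X in classes]   # class means x̄_i
    sizes = [X.shape[0] for X in classes]       # class sizes N_i
    N = sum(sizes)

    # Within-class scatter: sum over classes of centered outer products.
    W = sum((X - m).T @ (X - m) for X, m in zip(classes, means))

    # Overall mean (size-weighted mean of the class means).
    xbar = sum(n * m for n, m in zip(sizes, means)) / N
    # Unweighted mean of the class means.
    xbar_star = sum(means) / len(means)

    # First definition: deviations from the overall mean, weighted by N_i.
    B = sum(n * np.outer(m - xbar, m - xbar) for n, m in zip(sizes, means))
    # Second definition: unweighted deviations from the mean of the means.
    B_star = sum(np.outer(m - xbar_star, m - xbar_star) for m in means)
    return W, B, B_star

# Synthetic data: three classes of 20 points each in 3 dimensions.
rng = np.random.default_rng(0)
balanced = [rng.normal(loc=c, size=(20, 3)) for c in (0.0, 1.0, 3.0)]
W, B, B_star = scatter_matrices(balanced)

# With equal class sizes n, the two means coincide and B = n * B_star.
print(np.allclose(B, 20 * B_star))
```

When all classes have the same size $n$, the overall mean equals the mean of the class means, so $B = nB^{*}$ and both definitions give the same maximizing direction. With unequal class sizes the two matrices are in general not proportional.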
I have worked out that both versions of $B$ are, up to a normalizing constant, formulas for the sample covariance of the class means ($B^{*}$ is the standard unweighted form; for $B$, see the Wikipedia article on weighted covariance). Now, I wonder:
Does anyone know the reason for the discrepancy between the formulas?
Which formula is "better"?
The two formulas should be "equivalent" in some sense; but in what sense precisely?