I am trying to understand how the coefficients of linear discriminants are calculated in lda().
Consider the following data set.
library(MASS)
# Shared within-class covariance for both simulated groups
S <- matrix(c(2, .5, .5, 1), 2, 2)
set.seed(1)
# 25 observations per class, centred at (0,0) and (3,2)
X <- data.frame(rbind(mvrnorm(25, c(0, 0), S), mvrnorm(25, c(3, 2), S)),
                Class = c(rep("First", 25), rep("Second", 25)))
lda.fit <- lda(Class ~ X1 + X2, data = X)
lda.fit contains the following output:
Call:
lda(Class ~ X1 + X2, data = X)

Prior probabilities of groups:
 First Second 
   0.5    0.5 

Group means:
               X1         X2
First  -0.2205177 -0.1224064
Second  2.7965638  1.8489960

Coefficients of linear discriminants:
         LD1
X1 0.3476010
X2 0.7330707
It seems that the vector of coefficients should be calculated using the formula $${\bf w} \propto {\bf S}_W^{-1}({\bf m}_2 - {\bf m}_1),$$ where ${\bf S}_W$ is the pooled within-class covariance matrix and ${\bf m}_1$, ${\bf m}_2$ are the sample means of the two groups (the formula is from page 189 of Pattern Recognition and Machine Learning by Christopher M. Bishop).
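To apply this here, I estimate the pooled within-class covariance from the two sample covariance matrices ${\bf S}_1$ and ${\bf S}_2$ (my reading of "pooled", with $n_1 = n_2 = 25$) as $${\bf S}_W = \frac{(n_1-1){\bf S}_1 + (n_2-1){\bf S}_2}{n_1 + n_2 - 2},$$ which is what Sh below is meant to compute.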
# Pooled within-class covariance (divisor n1 + n2 - 2 = 48)
Sh <- ((25-1)*cov(X[1:25, 1:2]) + (25-1)*cov(X[26:50, 1:2])) / (50-2)
# Unnormalised discriminant direction S_W^{-1} (m2 - m1)
w <- solve(Sh) %*% (lda.fit$means[2, ] - lda.fit$means[1, ])
w
which is equal to
        [,1]
X1 0.8668882
X2 1.8282180
and this does not coincide with the coefficients reported by lda.fit. However, the two vectors (w and coef(lda.fit)) point in the same direction: w is a scaled version of coef(lda.fit) and vice versa.
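To make the proportionality concrete, here is a quick check (just element-wise division of the two objects defined above):

# If w and coef(lda.fit) differ only by a scale factor,
# the element-wise ratio should be the same constant in both coordinates.
coef(lda.fit) / w

With the numbers printed above, this ratio is about 0.40 in both coordinates.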
Could someone explain how the coefficients of linear discriminants are calculated? How is the scaling factor chosen for coef(lda.fit)?
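In case it helps, one guess I would like to verify (purely an assumption on my part, not something I have found documented) is that the coefficients are normalised so that the pooled within-group variance of the discriminant scores equals 1:

# Guess to test, not a statement about lda()'s internals:
# is the coefficient vector a scaled so that t(a) %*% Sh %*% a = 1?
a <- coef(lda.fit)
t(a) %*% Sh %*% a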
Any help is much appreciated!