I'm using R to compare samples with varying chemical compositions, following on from an article I've read. In it, the authors used canonical discriminant analysis (CDA) to do something very similar to what I want to do, but I've been told by another researcher (without much of an explanation) that linear discriminant analysis (LDA) would be better suited. I could go into the specifics of why supervised learning is the avenue chosen, etc., but I won't post that unless someone asks.
After doing some background reading (which hasn't really cleared up the difference between the two), I figured I'd try to explore this myself and compare the results. The primary difference between my data and that in this article is that instead of just using the compositions, I've created 3 new variables (S-, F- and V-) for the CDA that are functions of the original compositional data (see code below).
However, when I run the two analyses I get EXACTLY the same results: the plots are identical. This doesn't seem possible, but I can't find an error in my code.
My two questions are:
Is it possible for LDA and CDA to return the exact same result?
What are the practical differences between LDA and CDA?
Data:
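library(MASS)     # lda()
library(candisc)  # candiscList()
library(ggplot2)
set.seed(1)  # arbitrary seed, added only so the random example data are reproducible
al2o3 <- runif(20, 5, 10)
sio2 <- runif(20, 10, 30)
feo <- runif(20, 40, 60)
country <- c(rep("England", 6), rep("Scotland", 6), rep("Wales", 4), rep("France", 4))
df <- data.frame(country, al2o3, sio2, feo)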
LDA:
# Fit under a separate name so the object does not mask MASS::lda()
lda_fit <- lda(country ~ feo + sio2 + al2o3, data = df)
plda <- predict(object = lda_fit, newdata = df)
dataset <- data.frame(country = df[, "country"], lda = plda$x)
ggplot(dataset) + geom_point(aes(lda.LD1, lda.LD2, colour = country))
CDA:
fvalue<-(df$al2o3/df$sio2)
svalue<-((2.39*df$feo)/(df$al2o3+df$sio2))
vvalue<-(df$sio2/df$feo)
mod <- lm(cbind(feo,sio2,al2o3) ~ country, data=df)
can2 <- candiscList(mod)
mod2 <- lm(cbind(fvalue,svalue,vvalue) ~ country, data=df)
can3 <- candiscList(mod2)
ggplot(can2$country$scores, aes(x=Can1,y=Can2)) + geom_point(aes(color=country))
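To check whether the two sets of scores really are the same, rather than judging it from the plots, I think something like the following would work. It is only a minimal sketch: the seed and the names lda_fit, lda_scores, cda_scores and agreement are mine and purely illustrative, and it uses only the original compositions (feo, sio2, al2o3), not the derived F-, S- and V- values. Absolute correlations very close to 1 would mean the LDA and CDA axes differ only in sign and scale.

```r
# Sketch: numerically compare LDA and CDA scores on the original compositions.
library(MASS)     # lda()
library(candisc)  # candisc()

set.seed(1)  # arbitrary seed, only for reproducibility (an assumption)
df <- data.frame(
  country = factor(c(rep("England", 6), rep("Scotland", 6),
                     rep("Wales", 4), rep("France", 4))),
  al2o3 = runif(20, 5, 10),
  sio2  = runif(20, 10, 30),
  feo   = runif(20, 40, 60)
)

# LDA scores (matrix with columns LD1, LD2, LD3)
lda_fit    <- lda(country ~ feo + sio2 + al2o3, data = df)
lda_scores <- predict(lda_fit, newdata = df)$x

# CDA scores (data frame with columns country, Can1, Can2, Can3)
mod        <- lm(cbind(feo, sio2, al2o3) ~ country, data = df)
cda_scores <- candisc(mod, term = "country")$scores

# Absolute correlation between matching axes; abs() ignores sign flips
agreement <- c(
  axis1 = abs(cor(lda_scores[, "LD1"], cda_scores$Can1)),
  axis2 = abs(cor(lda_scores[, "LD2"], cda_scores$Can2))
)
print(agreement)
```

Running the same comparison against the scores from mod2 (the derived variables) would show whether the transformation changes the separation at all.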