
I am currently studying discriminant analysis. I have encountered the phrases "likelihood-based LDA" (with some prior) and "likelihood-based QDA" (with some prior). I know what LDA and QDA (with some prior) are, and I have implemented them in R, but I don't understand what "likelihood-based LDA" and "likelihood-based QDA" are, how they relate to LDA and QDA (or whether they are the same concept), or how they are implemented in R (compared to the usual LDA and QDA).

When I implemented LDA and QDA (with uniform priors over 3 classes) in R, I did the following:

library(MASS)  # provides lda() and qda()

# LDA with a uniform prior over the 3 classes
lda.0 <- lda(x ~ a + b + c + d + e + f, data = data, prior = c(1/3, 1/3, 1/3))
preds.0 <- predict(lda.0)$class
xtabs(~ preds.0 + data$x)  # confusion table: predicted vs. actual classes

# QDA with the same uniform prior
qda.0 <- qda(x ~ a + b + c + d + e + f, data = data, prior = c(1/3, 1/3, 1/3))
preds.0 <- predict(qda.0)$class
xtabs(~ preds.0 + data$x)

The lda() and qda() functions are from the MASS package: https://www.rdocumentation.org/packages/MASS/versions/7.3-53/topics/lda and https://www.rdocumentation.org/packages/MASS/versions/7.3-53/topics/qda

What is "likelihood-based LDA" and "likelihood-based QDA", how do they relate to LDA and QDA, and how they are implemented in R (compared to the implementation above)? Links to good resources would also be appreciated.


This exercise statement is why I'm asking this question:

[image: exercise statement]

And here is the provided solution (no code was given):

[images: the provided solution, three screenshots]


Here are all of the relevant textbook mentions of "likelihood" in this context:

[images: six textbook excerpts mentioning "likelihood"]

The Pointer
  • Where did you get `lda()` and `qda()` from? – Gavin Simpson Jan 27 '21 at 05:49
  • @GavinSimpson It's from the R `MASS` package https://www.rdocumentation.org/packages/MASS/versions/7.3-53/topics/lda https://www.rdocumentation.org/packages/MASS/versions/7.3-53/topics/qda – The Pointer Jan 27 '21 at 12:22

2 Answers


The lda() and qda() functions in R perform classification based on a Gaussian likelihood: they assume that the distribution of the features (predictors) is multivariate normal within each class. The difference between lda() and qda() is that LDA assumes the covariance matrix is the same in each class, while QDA allows the covariance matrix to vary across classes.

Generally speaking, the philosophy of LDA and QDA does not require the data to be Gaussian. Still, this is how Ronald Fisher originally developed LDA, and it is how you get linear boundaries separating the classes.
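
For concreteness, here is a minimal sketch (not the MASS source code, and using the built-in iris data rather than the data in the question) of the Gaussian-likelihood calculation behind lda(): each observation is scored by its class-conditional Gaussian log-likelihood with a pooled covariance matrix, the log prior is added, and the class with the highest score is chosen. The hand-rolled assignments should match predict().

library(MASS)

# Fit LDA with a uniform prior on the built-in iris data
fit <- lda(Species ~ ., data = iris, prior = rep(1/3, 3))

X   <- as.matrix(iris[, 1:4])
cls <- levels(iris$Species)

# Class means and the pooled ("same in every class") covariance matrix
mus   <- lapply(cls, function(k) colMeans(X[iris$Species == k, ]))
Sigma <- Reduce(`+`, lapply(cls, function(k) {
  Xk <- X[iris$Species == k, ]
  (nrow(Xk) - 1) * cov(Xk)
})) / (nrow(X) - length(cls))

# Gaussian log-likelihood of each observation under each class,
# up to an additive constant that is identical across classes
loglik <- sapply(mus, function(mu) -0.5 * mahalanobis(X, mu, Sigma))
scores <- sweep(loglik, 2, log(rep(1/3, 3)), `+`)  # add the log prior

manual <- factor(cls[max.col(scores)], levels = cls)
table(manual, predict(fit)$class)  # off-diagonal counts should be zero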

stans
  • Are you saying that "likelihood-based LDA/QDA" and normal LDA/QDA (as per the `lda()` and `qda()` R functions) are the same thing? Or am I misunderstanding? – The Pointer Jan 29 '21 at 05:32
  • I have not encountered the term "likelihood-based LDA" in 20+ years of my statistics career. Maybe some relatively recent author is trying to push this phrase. It seems a bit off to me since, as I have explained, any LDA and QDA is based on likelihood calculations. This is well explained in [*Hastie, Tibshirani & Friedman (2008). The elements of statistical learning.*], among other references. – stans Jan 29 '21 at 05:40
  • Yes, I agree with you. I think seeing the terminology in an example confused me. I will edit my main post with the examples; I would appreciate it if you would please comment on what you think is going on. – The Pointer Jan 29 '21 at 05:42
  • Without seeing the author's whole exposition of LDA and QDA, it is hard to definitively comment on what he/she means. I would not be surprised if he/she just emphasized to the reader that the calculations involved likelihood functions... Is he/she talking about non-likelihood-based LDA or QDA in the book? – stans Jan 29 '21 at 05:52
  • I have added the relevant sections of the text where "likelihood" is mentioned in this context. It seems to me that the author is using "likelihood-based" to describe the *prior* that is based on the *sizes of the classes* ("estimated prior probabilities" in the example), rather than the uniform prior; at least, that's what I infer from the example, where the distinction is made between the "estimated prior probabilities" prior and the uniform prior. – The Pointer Jan 29 '21 at 05:58
  • The likelihood is separate from the prior. Still, in the text you have just attached, the author is talking about good old LDA, known for 80 years. – stans Jan 29 '21 at 06:03
  • So it seems that they're just referring to the usual LDA/QDA when they say "likelihood-based LDA/QDA"? – The Pointer Jan 29 '21 at 06:06
  • Yes, they are, in the pages you have shown me. I do not know what the rest of the book is. – stans Jan 29 '21 at 06:07
  • Looking at the discrimination rules here https://en.wikipedia.org/wiki/Linear_discriminant_analysis#Discrimination_rules , is what they're describing in the text the "maximum likelihood" rule? And is this what the `lda()` and `qda()` R functions use by default? – The Pointer Jan 29 '21 at 06:09
  • As I mentioned, lda() and qda() implement good old LDA and QDA, based on Gaussian likelihood functions. – stans Jan 29 '21 at 06:12
  • But is that the same thing as the "maximum likelihood" discrimination rule? – The Pointer Jan 29 '21 at 06:14
  • Again: you are using phrases that somebody somewhere preferred to use. They are not generally accepted definitions of anything but rather authors' attempts to describe concepts better... The references you have shown in your post describe classic LDA and QDA, as they are known to everybody, as they were invented almost a century ago, and as they are implemented in lda() and qda(). Have to hop off now. – stans Jan 29 '21 at 06:21

The terms likelihood-based LDA and likelihood-based QDA are not names for some special case or adaptation of LDA and QDA.

"Likelihood-based" is just an adjective describing LDA and QDA. It creates a pleonasm, because LDA and QDA are already likelihood-based (just like the algebra-based least squares method, the probability-based maximum a posteriori estimate, the iteratively-reweighted-least-squares-based generalized linear models, etc.).

LDA and QDA can be seen as likelihood-based when the distribution of the features is multivariate Gaussian within each class; LDA assumes equal covariance matrices across the classes, while QDA drops this assumption. These methods estimate the distribution of each class. To create a classifier, they then compute the line (LDA) or quadratic curve (QDA) that corresponds to the likelihood-based or posterior-based (when you include prior class probabilities) Bayes classifier. This line or curve passes through the points where both classes are equally probable. You can see this being demonstrated for QDA in the image from this question: LDA and Fisher LDA - are their weight vectors always equivalent?

[image: QDA decision boundary, from the linked question]
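
Here is also a small self-contained sketch (my own, not the code behind that linked image) that traces such a boundary: fit qda() on two iris features with equal priors, evaluate the two class posteriors on a grid, and draw the contour where they are equal.

library(MASS)

# Two classes, two features, equal priors
two <- droplevels(subset(iris, Species != "setosa"))
fit <- qda(Species ~ Sepal.Length + Petal.Length, data = two, prior = c(1/2, 1/2))

# Class posteriors evaluated on a grid of feature values
grid <- expand.grid(Sepal.Length = seq(4, 8, length.out = 200),
                    Petal.Length = seq(2, 7, length.out = 200))
post <- predict(fit, grid)$posterior

# The points where both classes are equally probable form the quadratic boundary
plot(two$Sepal.Length, two$Petal.Length, col = two$Species, pch = 19,
     xlab = "Sepal.Length", ylab = "Petal.Length")
contour(unique(grid$Sepal.Length), unique(grid$Petal.Length),
        matrix(post[, 1] - post[, 2], 200, 200),
        levels = 0, add = TRUE, drawlabels = FALSE)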

The reason why you encountered the terms likelihood-based LDA and likelihood-based QDA is not very clear. But based on your example, I imagine they might have been used to accentuate the difference from methods that are not based on likelihood.

In your example there is a question, "apply the likelihood-based discriminant approach", which is followed by an answer that ends up using the terms likelihood-based LDA and likelihood-based QDA.

The contrasting non-likelihood-based classifiers are difficult to imagine, since most estimation is in one way or another related to likelihood or posterior probability. But you could imagine methods that are approximate or heuristic, like the k-nearest neighbors algorithm.
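
Since the exercise also distinguishes between a uniform prior and "estimated prior probabilities", here is one more small sketch (my own, again on iris, with the classes made deliberately unbalanced) of how the two choices are passed to lda(). With a uniform prior, maximising the posterior over the discrete classes coincides with maximising the likelihood; a prior estimated from the class sizes can shift some borderline assignments.

library(MASS)

# Deliberately unbalanced classes: 50 / 20 / 10 observations
unbal <- iris[c(1:50, 51:70, 101:110), ]

fit.unif <- lda(Species ~ ., data = unbal, prior = rep(1/3, 3))  # uniform prior
fit.prop <- lda(Species ~ ., data = unbal)                       # default: observed class proportions

# Cross-tabulate the two sets of assignments to see where (if anywhere) they differ
table(uniform = predict(fit.unif)$class, proportional = predict(fit.prop)$class)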

Sextus Empiricus
  • So would the code in my implementation count as "posterior-based" instead of "likelihood-based", since I used a uniform prior ($1/3$ for all three classes)? – The Pointer Feb 12 '21 at 01:08
  • @ThePointer There [is a difference](https://stats.stackexchange.com/a/355164/) in using the posterior with uniform prior versus using the likelihood function. For instance when computing confidence intervals. But for the computation of the maximum likelihood versus maximum posterior of *discrete* classes there is mathematically no difference and it becomes ambiguous which of the two this maximisation is. So when you perform this maximisation, then whether or not you consider it likelihood-based or posterior-based will depend on interpretation. – Sextus Empiricus Feb 12 '21 at 07:18