Questions tagged [discriminant-analysis]

Linear Discriminant Analysis (LDA) is a dimensionality reduction and classification method. It finds the low-dimensional subspace with the strongest class separation and uses it to perform classification. Use this tag for quadratic DA (QDA) too.

Given multivariate data split into several subsamples (classes), LDA finds linear combinations of the variables, called discriminant functions, which discriminate between the classes and are mutually uncorrelated. These functions can then be used to assign old or new observations to the classes. Discriminant analysis is thus both a dimensionality reduction and a classification technique.
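As a quick illustration of this dual role, here is a minimal sketch using scikit-learn's `LinearDiscriminantAnalysis` on synthetic data (the dataset and variable names are illustrative assumptions, not taken from any question below):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic data: 300 observations, 20 variables, K = 3 classes
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)

# Dimensionality reduction: project onto the (at most K-1 = 2) discriminant axes
X_reduced = lda.transform(X)   # shape (300, 2)

# Classification: assign each observation to the most probable class
y_hat = lda.predict(X)         # hard class labels
post = lda.predict_proba(X)    # posterior probabilities P(Y=k | X=x)
```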

Suppose we are given a multivariate dataset split into $K$ classes. The objective is to find the posterior probability, $P(Y=k|X=x)$, that a data point belongs to class $k$. Let $f_{k}(x)$ be the class-conditional density of $X$ in class $k$ and let $\pi_k$ be the prior probability of class $k$. By Bayes' rule we have:

$$P(Y=k|X=x) = \frac{f_{k}(x)\pi_k}{\sum_{i=1}^K f_{i}(x)\pi_i}$$
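For concreteness, here is a small sketch of this rule with Gaussian class-conditional densities (the Gaussian choice anticipates the LDA assumptions listed next; the means, covariance, and priors are made-up toy values):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy two-class problem with assumed (known) parameters
mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]   # class means mu_k
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])                       # shared covariance
priors = np.array([0.6, 0.4])                        # class priors pi_k

def posterior(x):
    """P(Y=k | X=x) via Bayes' rule: f_k(x) * pi_k, normalized over classes."""
    f = np.array([multivariate_normal.pdf(x, mean=mu, cov=Sigma) for mu in mus])
    unnorm = f * priors
    return unnorm / unnorm.sum()

print(posterior(np.array([1.0, 0.5])))  # two posteriors summing to 1
```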

LDA makes the following assumptions:

  1. $f_{k}(x)$ follows a Gaussian density with mean $\mu_k$ and covariance $\Sigma_k$
  2. $\Sigma_k = \Sigma$ for all $k$

The second assumption, of constant covariance across classes, is what makes this a linear discriminant. The linearity in $x$ can be derived by taking the log-ratio of the posterior probabilities of two classes $k$ and $l$:

$$\log \big( \frac{P(Y=k|X=x)}{P(Y=l|X=x)} \big) = \log\frac{\pi_k}{\pi_l} - \frac12(\mu_k +\mu_l)^T\Sigma^{-1}(\mu_k - \mu_l) + x^T\Sigma^{-1}(\mu_k - \mu_l)$$

If the covariances are not assumed equal, the quadratic terms in $x$ no longer cancel, and the discriminant function becomes quadratic in $x$, leading to Quadratic Discriminant Analysis, QDA.
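To make the contrast concrete, here is a from-scratch sketch of the two per-class scores (these closed forms follow from taking logs of the Gaussian densities; parameter estimation from data is omitted and the names are illustrative):

```python
import numpy as np

def lda_score(x, mu_k, Sigma_inv, pi_k):
    """Linear discriminant score: linear in x because Sigma is shared."""
    return x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k + np.log(pi_k)

def qda_score(x, mu_k, Sigma_k, pi_k):
    """Quadratic discriminant score: a class-specific Sigma_k leaves a
    quadratic term (x - mu_k)^T Sigma_k^{-1} (x - mu_k) in x."""
    d = x - mu_k
    _, logdet = np.linalg.slogdet(Sigma_k)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(Sigma_k, d) + np.log(pi_k)
```

In either case one classifies $x$ to the class with the largest score; for LDA, the difference of the scores for classes $k$ and $l$ is exactly the log-ratio displayed above.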

467 questions
47 votes, 3 answers

Logistic regression vs. LDA as two-class classifiers

I am trying to wrap my head around the statistical difference between linear discriminant analysis and logistic regression. Is my understanding right that, for a two-class classification problem, LDA predicts two normal density functions (one for…
user1885116
32 votes, 2 answers

Three versions of discriminant analysis: differences and how to use them

Can anybody explain the differences and give specific examples of how to use these three analyses? LDA (Linear Discriminant Analysis), FDA (Fisher's Discriminant Analysis), QDA (Quadratic Discriminant Analysis). I searched everywhere, but couldn't find real…
Andrius
32 votes, 2 answers

Does it make sense to combine PCA and LDA?

Assume I have a dataset for a supervised statistical classification task, e.g., via a Bayes classifier. This dataset consists of 20 features and I want to boil it down to 2 features via dimensionality reduction techniques such as Principal…
user39663
30 votes, 1 answer

PCA, LDA, CCA, and PLS

How are PCA, LDA, CCA, and PLS related? They all seem "spectral" and linear algebraic and very well understood (say 50+ years of theory built around them). They are used for very different things (PCA for dimensionality reduction, LDA for…
29 votes, 4 answers

What is the relationship between regression and linear discriminant analysis (LDA)?

Is there a relationship between regression and linear discriminant analysis (LDA)? What are their similarities and differences? Does it make any difference if there are two classes or more than two classes?
26 votes, 1 answer

How LDA, a classification technique, also serves as a dimensionality reduction technique like PCA

In this article, the author links linear discriminant analysis (LDA) to principal component analysis (PCA). With my limited knowledge, I am not able to follow how LDA can be somewhat similar to PCA. I have always thought that LDA was a form of…
26 votes, 2 answers

Why is Python's scikit-learn LDA not working correctly and how does it compute LDA via SVD?

I was using the Linear Discriminant Analysis (LDA) from the scikit-learn machine learning library (Python) for dimensionality reduction and was a little bit curious about the results. I am wondering now what the LDA in scikit-learn is doing so that…
22 votes, 2 answers

How does linear discriminant analysis reduce the dimensions?

There is a passage in "The Elements of Statistical Learning" on page 91: "The K centroids in p-dimensional input space span at most K-1 dimensional subspace, and if p is much larger than K, this will be a considerable drop in dimension." I have…
jerry_sjtu
20 votes, 2 answers

Compute and graph the LDA decision boundary

I saw an LDA (linear discriminant analysis) plot with decision boundaries from The Elements of Statistical Learning: I understand that data are projected onto a lower-dimensional subspace. However, I would like to know how we get the decision…
mynameisJEFF
20 votes, 3 answers

Collinear variables in Multiclass LDA training

I'm training a multi-class LDA classifier with 8 classes of data. While performing training, I get a warning: "Variables are collinear". I'm getting a training accuracy of over 90%. I'm using the scikit-learn library in Python to train and test the…
19 votes, 3 answers

What are "coefficients of linear discriminants" in LDA?

In R, I use the lda function from the MASS library to do classification. As I understand LDA, input $x$ will be assigned the label $y$ that maximizes $p(y|x)$, right? But when I fit the model, in which $$x=(Lag1,Lag2)$$$$y=Direction,$$ I don't quite understand…
avocado
19 votes, 1 answer

How is MANOVA related to LDA?

In several places I saw a claim that MANOVA is like ANOVA plus linear discriminant analysis (LDA), but it was always made in a hand-waving sort of way. I would like to know what exactly it is supposed to mean. I found various textbooks describing…
amoeba
18 votes, 2 answers

Can we use a categorical independent variable in discriminant analysis?

In discriminant analysis, the dependent variable is categorical, but can I use a categorical variable (e.g., residential status: rural, urban) along with some other continuous variables as independent variables in linear discriminant analysis?
16 votes, 1 answer

Deriving total (within class + between class) scatter matrix

I was fiddling with the PCA and LDA methods and I am stuck at a point; I have a feeling that it is so simple that I can't see it. The within-class ($S_W$) and between-class ($S_B$) scatter matrices are defined as: $$ S_W = \sum_{i=1}^C\sum_{t=1}^N(x_t^i -…
nimcap
16 votes, 1 answer

Supervised dimensionality reduction

I have a data set consisting of 15K labeled samples (from 10 groups). I want to apply dimensionality reduction down to 2 dimensions in a way that takes the knowledge of the labels into consideration. When I use "standard" unsupervised dimensionality…