Where can I get an explanation of the procedure used when making a prediction using SVD?
Let me elaborate a bit more. Suppose you have data in a matrix $A$ containing two classes: $m$ attributes (rows) and $n$ objects (columns), where the first half of the objects belongs to the first class and the other half to the second class. Using the SVD we obtain the matrices $U$, $\Sigma$, $V^{*}$ with $A = U\Sigma V^{*}$. From the factor interpretation, we can regard the rows of $U$ as a different view of the $m$ attributes of $A$, and the columns of $V^{*}$ as a different view of the $n$ objects of $A$. Therefore, it should be possible to separate the objects by class using $V^{*}$. For example, suppose $V^{*}$ looks like this:
$$ V^{*} = \begin{pmatrix} v_{1,1} & v_{1,2} & \cdots & v_{1,n} \\ v_{2,1} & v_{2,2} & \cdots & v_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{n,1} & v_{n,2} & \cdots & v_{n,n} \end{pmatrix} $$
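To make the setup concrete, here is a small NumPy sketch of how I understand this first step. The synthetic data, the variable names, and the use of the economy SVD (`full_matrices=False`, which keeps only the first $\min(m,n)$ columns of $U$ and rows of $V^{*}$) are my own choices, not part of the procedure I saw.

```python
import numpy as np

# Hypothetical data: m attributes (rows) by n objects (columns); the first
# n/2 columns are class-1 objects and the last n/2 columns are class-2 objects.
m, n = 10, 40
rng = np.random.default_rng(0)
A = np.hstack([rng.normal(0.0, 1.0, size=(m, n // 2)),   # class 1
               rng.normal(2.0, 1.0, size=(m, n // 2))])  # class 2

# Economy SVD: A = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Row i of U        : attribute i expressed in the latent factors.
# Column j of Vt=V* : object j expressed in the latent factors.
print(U.shape, s.shape, Vt.shape)   # (10, 10) (10,) (10, 40)
```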
Then we can take the first $n/2$ columns and regard them as an alternative view of the objects from the first class, and do the same with the last $n/2$ columns for the objects from the second class. In practice, however, I have seen that instead of $V^{*}$ alone we take the matrix $\Sigma V^{*}$ from the decomposition, in which row $i$ of $V^{*}$ is scaled by the corresponding singular value $\sigma_{i}$:
$$ \Sigma V^{*} = \begin{pmatrix} \sigma_{1} v_{1,1} & \sigma_{1}v_{1,2} & \cdots & \sigma_{1} v_{1,n} \\ \sigma_{2} v_{2,1} & \sigma_{2}v_{2,2} & \cdots & \sigma_{2}v_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m}v_{m,1} & \sigma_{m}v_{m,2} & \cdots & \sigma_{m}v_{m,n} \end{pmatrix} $$
Now that I think about it, I don't know why.
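For what it's worth, one thing I did notice: since the columns of $U$ are orthonormal, $\Sigma V^{*} = U^{T}A$, i.e. these are just the training objects expressed in the same factor coordinates that $U^{T}\vec{b}$ gives for a new object further down. Continuing the sketch above (the names `SVt`, `class1`, `mu1`, etc. are mine):

```python
# Scale the factor coordinates by the singular values: Sigma @ Vt.
# Note this equals U.T @ A, because U has orthonormal columns.
SVt = s[:, None] * Vt                 # shape: (m, n)

# Split the columns by class, in the same order as the columns of A.
class1 = SVt[:, : n // 2]             # factor-space view of class-1 objects
class2 = SVt[:, n // 2 :]             # factor-space view of class-2 objects

# Class means in factor space, needed for Fisher's discriminant below.
mu1 = class1.mean(axis=1)
mu2 = class2.mean(axis=1)
```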
Now we have enough information to calculate the mean of each class in this rather unusual view of our objects. So we can use Fisher's discriminant analysis to find a projection that minimizes the variance within each class while maximizing the separation between the class means. This is succinctly described by the following equation:
$$\vec{w} = \underset{\vec{w}}{\operatorname{arg\,max}} \; \frac{\vec{w}^{T}S_{b}\vec{w}}{\vec{w}^{T}S_{w}\vec{w}}$$
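Here is how I would actually compute $\vec{w}$ in the running sketch. Building the scatter matrices from the two class halves and using `scipy.linalg.eigh` for the generalized eigenvalue problem mentioned in the next paragraph are my own choices, not something given to me.

```python
import scipy.linalg

# Within-class scatter: scatter of each class around its own mean.
d1 = class1 - mu1[:, None]
d2 = class2 - mu2[:, None]
Sw = d1 @ d1.T + d2 @ d2.T

# Between-class scatter: outer product of the difference of the class means.
diff = (mu1 - mu2)[:, None]
Sb = diff @ diff.T

# Generalized eigenvalue problem Sb w = lambda Sw w; eigh returns eigenvalues
# in ascending order, so the last column is the eigenvector with the largest
# eigenvalue, which is the one the procedure prescribes.
eigvals, eigvecs = scipy.linalg.eigh(Sb, Sw)
w = eigvecs[:, -1]
```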
The solution is found by solving the associated generalized eigenvalue problem, which produces a set of eigenvectors and eigenvalues. Apparently, we need to choose the eigenvector corresponding to the largest eigenvalue. Why? After getting this $\vec{w}$ we have everything we need to start making predictions. For simplicity, consider a new object $\vec{b}$ with the same $m$ attributes as the rows of $A$. Supposedly, a prediction starts by multiplying $U^{T}$ and $\vec{b}$:
$$ U^{T}\vec{b} = \begin{pmatrix} u_{1,1} & u_{2,1} & \cdots & u_{m,1} \\ u_{1,2} & u_{2,2} & \cdots & u_{m,2} \\ \vdots & \vdots & \ddots & \vdots \\ u_{1,m} & u_{2,m} & \cdots & u_{m,m} \end{pmatrix} \begin{pmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{m} \end{pmatrix} $$
Here the columns of $U^{T}$ correspond to attributes and its rows correspond to hidden factors, while the vector $\vec{b}$ is a single object with one attribute per row. As a result of this multiplication, we have the new object described in terms of $m$ factors.
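Again as a sketch (the new object `b` below is synthetic and only for illustration):

```python
# A hypothetical new object with the same m attributes as the rows of A.
b = rng.normal(1.0, 1.0, size=m)

# Express it in the same factor coordinates as the columns of Sigma @ Vt.
b_factors = U.T @ b                   # shape: (m,)
```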
Finally, to predict a class for our new object $\vec{b}$, we multiply the previous result by $\vec{w}^{T}$:
$$\vec{w}^{T}U^{T}\vec{b} = \begin{pmatrix} w_{1} & w_{2} & \cdots & w_{m} \end{pmatrix} \begin{pmatrix} u_{1,1} & u_{2,1} & \cdots & u_{m,1} \\ u_{1,2} & u_{2,2} & \cdots & u_{m,2} \\ \vdots & \vdots & \ddots & \vdots \\ u_{1,m} & u_{2,m} & \cdots & u_{m,m} \end{pmatrix} \begin{pmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{m} \end{pmatrix} $$
Since $\vec{w}$ works as a projection that produces a decision threshold, I think it should live in the same space as the points taken from $V^{*}$ (or rather $\Sigma V^{*}$) above, so its entries should correspond to factors. The product $\vec{w}^{T}U^{T}\vec{b}$ is a single number, and comparing this number with the threshold allows us to make a decision about the predicted class.
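Putting this last step into the same sketch: the procedure I saw never told me how the threshold is chosen, so below I simply assign the nearest projected class mean, which is equivalent to thresholding at the midpoint between the two projected means; that choice is my own guess.

```python
# The score w^T (U^T b): a single number on Fisher's discriminant axis.
score = w @ b_factors

# Project the two class means onto w and assign the nearest one; this is the
# same as comparing the score against the midpoint of the projected means.
proj_mu1, proj_mu2 = w @ mu1, w @ mu2
predicted_class = 1 if abs(score - proj_mu1) <= abs(score - proj_mu2) else 2
print(score, predicted_class)
```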
All of this seems rather reasonable: the dimensions match and the operations can be carried out, and in practice they appear to work. However, I want to have a better grasp of this procedure.