Where can I get an explanation of the procedure used when making a prediction using SVD?
Let me elaborate a bit more. Suppose you have data in a matrix $A$ containing two classes: $m$ attributes (rows) and $n$ objects (columns), where the first half of the objects belongs to the first class and the other half to the second class. Using the SVD we obtain the matrices $U$, $\Sigma$, $V^{*}$ with $A = U\Sigma V^{*}$. From the factor interpretation, we can regard the rows of $U$ as a different view of the $m$ attributes of $A$, and the columns of $V^{*}$ as a different view of the $n$ objects of $A$. Therefore, it should be possible to separate the objects by class using $V^{*}$. For example, suppose $V^{*}$ looks like this:
$$ V^{*} = \begin{pmatrix} v_{1,1} & v_{1,2} & \cdots & v_{1,n} \\ v_{2,1} & v_{2,2} & \cdots & v_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{n,1} & v_{n,2} & \cdots & v_{n,n} \end{pmatrix} $$
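To make the setup concrete, here is a small NumPy sketch of how I understand this first step. The synthetic data, the variable names, and the use of the economy SVD (`full_matrices=False`, which keeps only the first $\min(m,n)$ columns of $U$ and rows of $V^{*}$) are my own choices, not part of the procedure I saw.

```python
import numpy as np

# Hypothetical data: m attributes (rows) by n objects (columns); the first
# n/2 columns are class-1 objects and the last n/2 columns are class-2 objects.
m, n = 10, 40
rng = np.random.default_rng(0)
A = np.hstack([rng.normal(0.0, 1.0, size=(m, n // 2)),   # class 1
               rng.normal(2.0, 1.0, size=(m, n // 2))])  # class 2

# Economy SVD: A = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Row i of U        : attribute i expressed in the latent factors.
# Column j of Vt=V* : object j expressed in the latent factors.
print(U.shape, s.shape, Vt.shape)   # (10, 10) (10,) (10, 40)
```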
Then we can take the first $n/2$ columns and regard them as an alternative view of the objects from the first class, and do the same with the last $n/2$ columns for the objects from the second class. In practice, however, I have seen that instead of $V^{*}$ alone we take the matrix $\Sigma V^{*}$ from the decomposition, in which row $i$ of $V^{*}$ is scaled by the corresponding singular value $\sigma_{i}$:
$$ \Sigma V^{*} = \begin{pmatrix} \sigma_{1} v_{1,1} & \sigma_{1}v_{1,2} & \cdots & \sigma_{1} v_{1,n} \\ \sigma_{2} v_{2,1} & \sigma_{2}v_{2,2} & \cdots & \sigma_{2}v_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m}v_{m,1} & \sigma_{m}v_{m,2} & \cdots & \sigma_{m}v_{m,n} \end{pmatrix} $$
Now that I think about it, I don't know why.
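For what it's worth, one thing I did notice: since the columns of $U$ are orthonormal, $\Sigma V^{*} = U^{T}A$, i.e. these are just the training objects expressed in the same factor coordinates that $U^{T}\vec{b}$ gives for a new object further down. Continuing the sketch above (the names `SVt`, `class1`, `mu1`, etc. are mine):

```python
# Scale the factor coordinates by the singular values: Sigma @ Vt.
# Note this equals U.T @ A, because U has orthonormal columns.
SVt = s[:, None] * Vt                 # shape: (m, n)

# Split the columns by class, in the same order as the columns of A.
class1 = SVt[:, : n // 2]             # factor-space view of class-1 objects
class2 = SVt[:, n // 2 :]             # factor-space view of class-2 objects

# Class means in factor space, needed for Fisher's discriminant below.
mu1 = class1.mean(axis=1)
mu2 = class2.mean(axis=1)
```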
Now we have enough information to calculate the mean of each class in this rather unusual view of our objects. So we can use Fisher's discriminant analysis to find a projection that minimizes the variance within each class while maximizing the separation between the class means. This is succinctly described by the following equation:
$$\vec{w} = \underset{\vec{w}}{\operatorname{arg\,max}} \; \frac{\vec{w}^{T}S_{b}\vec{w}}{\vec{w}^{T}S_{w}\vec{w}}$$
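Here is how I would actually compute $\vec{w}$ in the running sketch. Building the scatter matrices from the two class halves and using `scipy.linalg.eigh` for the generalized eigenvalue problem mentioned in the next paragraph are my own choices, not something given to me.

```python
import scipy.linalg

# Within-class scatter: scatter of each class around its own mean.
d1 = class1 - mu1[:, None]
d2 = class2 - mu2[:, None]
Sw = d1 @ d1.T + d2 @ d2.T

# Between-class scatter: outer product of the difference of the class means.
diff = (mu1 - mu2)[:, None]
Sb = diff @ diff.T

# Generalized eigenvalue problem Sb w = lambda Sw w; eigh returns eigenvalues
# in ascending order, so the last column is the eigenvector with the largest
# eigenvalue, which is the one the procedure prescribes.
eigvals, eigvecs = scipy.linalg.eigh(Sb, Sw)
w = eigvecs[:, -1]
```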
The solution is found by solving the associated generalized eigenvalue problem, which produces a set of eigenvectors and eigenvalues. Apparently, we need to choose the eigenvector corresponding to the largest eigenvalue. Why? After getting this $\vec{w}$ we have everything we need to start making predictions. For simplicity, consider a new object $\vec{b}$ with the same $m$ attributes as the rows of $A$. Supposedly, a prediction starts by multiplying $U^{T}$ and $\vec{b}$:
$$ U^{T}\vec{b} = \begin{pmatrix} u_{1,1} & u_{2,1} & \cdots & u_{m,1} \\ u_{1,2} & u_{2,2} & \cdots & u_{m,2} \\ \vdots & \vdots & \ddots & \vdots \\ u_{1,m} & u_{2,m} & \cdots & u_{m,m} \end{pmatrix} \begin{pmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{m} \end{pmatrix} $$
Here the columns of $U^{T}$ correspond to attributes and its rows correspond to hidden factors, while the vector $\vec{b}$ is a single object with one attribute per row. As a result of this multiplication, we have the new object described in terms of $m$ factors.
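Again as a sketch (the new object `b` below is synthetic and only for illustration):

```python
# A hypothetical new object with the same m attributes as the rows of A.
b = rng.normal(1.0, 1.0, size=m)

# Express it in the same factor coordinates as the columns of Sigma @ Vt.
b_factors = U.T @ b                   # shape: (m,)
```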
Finally, to predict a class for our new object $\vec{b}$, we multiply the previous result by $\vec{w}^{T}$:
$$\vec{w}^{T}U^{T}\vec{b} = \begin{pmatrix} w_{1} & w_{2} & \cdots & w_{m} \end{pmatrix} \begin{pmatrix} u_{1,1} & u_{2,1} & \cdots & u_{m,1} \\ u_{1,2} & u_{2,2} & \cdots & u_{m,2} \\ \vdots & \vdots & \ddots & \vdots \\ u_{1,m} & u_{2,m} & \cdots & u_{m,m} \end{pmatrix} \begin{pmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{m} \end{pmatrix} $$
Since $\vec{w}$ works as a projection that produces a decision threshold, I think it should live in the same space as the points taken from $V^{*}$ (or rather $\Sigma V^{*}$) above, so its entries should correspond to factors. The product $\vec{w}^{T}U^{T}\vec{b}$ is a single number, and comparing this number with the threshold allows us to make a decision about the predicted class.
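Putting this last step into the same sketch: the procedure I saw never told me how the threshold is chosen, so below I simply assign the nearest projected class mean, which is equivalent to thresholding at the midpoint between the two projected means; that choice is my own guess.

```python
# The score w^T (U^T b): a single number on Fisher's discriminant axis.
score = w @ b_factors

# Project the two class means onto w and assign the nearest one; this is the
# same as comparing the score against the midpoint of the projected means.
proj_mu1, proj_mu2 = w @ mu1, w @ mu2
predicted_class = 1 if abs(score - proj_mu1) <= abs(score - proj_mu2) else 2
print(score, predicted_class)
```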
All of this seems rather reasonable: the dimensions match and the operations can be carried out, and in practice they appear to work. However, I want to have a better grasp of this procedure.