
I'm trying to do picture recognition. There are three methods in the OpenCV library: Eigenfaces, Fisherfaces and Local Binary Pattern Histograms. These are all good, but in practice Fisherfaces is the best, because Eigenfaces is good if you have a noisy picture, while the Local Binary Pattern Histogram cannot handle noise but can handle linear illumination differences.

Fisherfaces can handle both.

To use Fisherfaces we need a collection $X$ that contains $c$ classes of pictures, where each picture is stored as a vector.

This is our collection of pictures, grouped into classes: $$X = \{X_1, X_2, X_3, \ldots, X_c\}$$

Each class $X_i$ is a set of pictures in the form of vectors: $$X_i = \{x_1, x_2, x_3, \ldots, x_n\}$$
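For concreteness, this is roughly how such a collection could be laid out in MATLAB. This is only a minimal sketch: the picture size, the number of classes and the picture counts are made-up values, and the pictures themselves are placeholder random data.

% Minimal sketch of one possible data layout (all sizes are illustrative assumptions).
% Each 32x32 grayscale picture is reshaped into a 1024x1 column vector, and
% each class X{i} holds the pictures of one subject as columns.
d = 32*32;             % pixels per picture after vectorization
X = cell(1, 3);        % c = 3 classes
X{1} = rand(d, 8);     % 8 pictures of subject 1 (placeholder random data)
X{2} = rand(d, 10);    % 10 pictures of subject 2
X{3} = rand(d, 7);     % 7 pictures of subject 3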

Then we need to create scatter matrices.

Here $N_i$ is the number of picture vectors in class $X_i$, $\mu_i$ is the mean of class $X_i$, and $\mu$ is the total mean: $$S_B = \sum_{i=1}^c N_i(\mu_i - \mu)(\mu_i - \mu)^T$$

$$S_W = \sum_{i=1}^c\sum_{x_j \in X_i}(x_j - \mu_i)(x_j - \mu_i)^T$$

We can find the total mean from: $$\mu = \frac{1}{N} \sum_{i=1}^N x_i$$ where $N$ is the total number of picture vectors. And this is the mean of a single class $X_i$: $$\mu_i = \frac{1}{N_i}\sum_{x_j \in X_i} x_j$$
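Written directly from these definitions, computing the means and the two scatter matrices could look like this in MATLAB. This is a sketch only, assuming the cell-array layout from the sketch above.

% Class means, total mean, and the scatter matrices S_B and S_W, written
% directly from the definitions above (assumes the cell array X sketched earlier).
c  = numel(X);
d  = size(X{1}, 1);
Ni = cellfun(@(Xi) size(Xi, 2), X);     % number of samples per class
mu_i = zeros(d, c);
for i = 1:c
    mu_i(:, i) = mean(X{i}, 2);         % class mean mu_i
end
mu = mean([X{:}], 2);                   % total mean over all samples

Sb = zeros(d); Sw = zeros(d);
for i = 1:c
    dm = mu_i(:, i) - mu;
    Sb = Sb + Ni(i) * (dm * dm');       % between-class scatter S_B
    Xc = X{i} - mu_i(:, i);             % center the samples of class i
    Sw = Sw + Xc * Xc';                 % within-class scatter S_W
end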

We then have an optimization problem: we look for the projection matrix $W_{opt}$ that maximizes the following criterion.

$$W_{opt} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}$$

We can solve this optimization problem as a generalized eigenvalue problem, i.e. by finding the values $\lambda$ for which $\det(S_W^{-1}S_B - \lambda I) = 0$:

$$S_W^{-1}S_B W = \lambda W$$

If $S_W$ is singular, we can add a small constant $\epsilon$ to its diagonal so that it becomes invertible. That is the recommended approach. $$(S_W + \epsilon I)^{-1}S_B W = \lambda W$$

Or we can skip the $\epsilon$ and compute a (pseudo-)inverse of $S_W$ using Householder QR factorization or the Singular Value Decomposition, which can handle singular matrices.
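In MATLAB the options above could look roughly like this (a sketch; Sb and Sw are the scatter matrices from before, and the value of epsilon is just an assumed placeholder):

% Three ways to get the discriminant directions (sketch).

% Option 1: generalized eigenvalue problem, no explicit inverse needed.
[W, D] = eig(Sb, Sw);

% Option 2: regularize S_W with a small epsilon so it becomes invertible.
epsilon = 1e-6;                                     % assumed value, would need tuning
[W, D] = eig((Sw + epsilon*eye(size(Sw,1))) \ Sb);

% Option 3: pseudo-inverse via SVD, which tolerates a singular S_W.
[W, D] = eig(pinv(Sw) * Sb);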

(These formulas are copied from the lecture notes here, here and here. This is the best lecture note I have found.)

Questions:

Let's assume that my collection $X$ contains only pictures of my cat, and I want to compare it with another sample $Y$, which is another cat, just to see how close $X$ is to $Y$. Here $Y$ is a single sample and $X$ is a whole collection.

Or is $X$ a collection of different classes, like $X_1$ being all pictures of my cat, $X_2$ all pictures of my dog, and so on? Or is every class in $X$ just an average picture?

So how do I use this linear discriminant analysis algorithm to compare? What should I use $W_{opt}$ for? Is it a matrix?

Edit:

Here is my MATLAB code to find $S_W$ and $S_B$:

%% Our data
data = load('iris'); % Iris data set: 150 samples, 4 features, 3 classes
X = data.Inputs';    % rows are samples, columns are features
N = size(X, 2);      % number of features (columns)

%% Settings
S = 50; % Samples per class - you can edit this
C = 3;  % Number of classes - you can edit this

%% Create ui and u
ui = zeros(C, N); % class means, one row per class
u = zeros(1, N);  % total mean
for i = 1:C
  for j = 1:N
    for k = 1:S
      ui(i, j) = ui(i, j) + X(k+S*(i-1), j)/S; % average the S samples of class i into the class mean
    end
    u(1, j) = u(1, j) + ui(i, j)/C; % average the class means into the total mean (classes are balanced)
  end
end


%% Create Sw and Sb
Sb = zeros(N,N);
Sw = zeros(N,N);
H = zeros(N,1); % difference between a class mean and the total mean
L = zeros(N,S); % centered samples of one class
for i = 1:C
  for j = 1:N
    H(j) = ui(i, j) - u(1, j);
  end
  Sb = Sb + S*H*H'; % multiply by the number of samples per class (N_i = S)

  for k = 1:S
    for j = 1:N
      L(j, k) = X(k+S*(i-1), j) - ui(i, j);
    end
  end
  Sw = Sw + L*L';
end

%% Print Sb and Sw
Sb
Sw

%% Compute by hand (vectorized check)
W1 = X(1:50, :);
W2 = X(51:100, :);
W3 = X(101:150, :);
u1 = mean(W1)';
u2 = mean(W2)';
u3 = mean(W3)';
u = mean([u1';u2';u3'])';

SB = 50*(u1 - u)*(u1-u)' + 50*(u2 - u)*(u2-u)' + 50*(u3 - u)*(u3-u)'
SW = (W1' - u1)*(W1' -u1)' + (W2' - u2)*(W2' - u2)' + (W3' - u3)*(W3' - u3)'


%% Compute
[W, LAMBDA] = eig(inv(SW)*SB)  % columns of W are the eigenvectors, diag(LAMBDA) the eigenvalues
J = det(W'*SB*W)/det(W'*SW*W)  % value of the Fisher criterion |W'*SB*W| / |W'*SW*W|
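Since $\operatorname{rank}(S_B) \le C-1$, only $C-1 = 2$ of these eigenvalues are nonzero, so $W_{opt}$ would be the matrix whose columns are the eigenvectors with the largest eigenvalues, and the data can be projected onto it. A rough sketch continuing the code above:

%% Form Wopt and project (sketch)
[~, idx] = sort(real(diag(LAMBDA)), 'descend'); % sort by eigenvalue (real part guards against round-off)
Wopt = real(W(:, idx(1:C-1)));                  % keep the C-1 = 2 leading eigenvectors
Y = X * Wopt;                                   % project the 150x4 iris data onto the 2 discriminants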

Answer:

Your question is basically a classification question, not so much a question about LDA. The "cats" issue is commonly posed as a cat vs. dog (or other animals, cars, trucks, airplanes) problem, where one image from each of many objects of the same class is used. But the question can also focus on your cat vs. another cat, based on multiple images of each object.

Regarding LDA, however, assigning class predictions to objects is based on the Mahalanobis distance. Let $\boldsymbol{\Omega}$ represent the number of classes, and $\omega$ $(1,2,\ldots,\boldsymbol{\Omega})$ represent a single class. For a given object $\bf y$ represented by a $p \times 1$ vector of feature values (this corresponds to one of your picture vectors $x_j$), the Mahalanobis distance from the object to the centroid of class $\omega$ is defined as \begin{equation} \underset{(1 \times 1)}{D_{\omega}({\bf y})} = \underset{(1 \times p)}{({\bf y} - \bar{\bf y}_{\omega})}' \underset{(p \times p)}{{\bf S}_{pl}^{-1}} \underset{(p \times 1)}{({\bf y} - \bar{\bf y}_{\omega})} , \end{equation} where $\bar{\bf y}_{\omega}$ is a $p \times 1$ vector of mean feature values for objects in class $\omega$, for which the individual elements are \begin{equation} \bar{y}_j= \frac{1}{n_{\omega} } \sum_{i=1}^{n_{\omega}} y_{ij} \quad \quad j=1,2,\ldots,p \quad y_{ij} \in \omega, \end{equation}

and ${\bf S}_{pl}^{-1}$ is the inverse of the pooled covariance matrix.

The decision rule $D({\bf y}\rightsquigarrow \omega)$ is to assign object $\bf y$ to the class for which $D_{\omega}({\bf y})$ is the smallest.
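A minimal MATLAB sketch of this rule (the variable names Xtrain, labels and y are assumptions for illustration, not notation from above):

% Mahalanobis-distance classifier (sketch; Xtrain, labels and y are assumed names).
% Xtrain: n-by-p training features, labels: n-by-1 class indices 1..Omega, y: p-by-1 test vector.
Omega = max(labels);
p = size(Xtrain, 2);

Spl = zeros(p);                                  % pooled covariance matrix
for w = 1:Omega
    Xw  = Xtrain(labels == w, :);
    Spl = Spl + (size(Xw, 1) - 1) * cov(Xw);
end
Spl = Spl / (size(Xtrain, 1) - Omega);

D = zeros(Omega, 1);
for w = 1:Omega
    ybar = mean(Xtrain(labels == w, :))';        % class centroid, p-by-1
    D(w) = (y - ybar)' * (Spl \ (y - ybar));     % squared Mahalanobis distance to class w
end
[~, predictedClass] = min(D)                     % assign y to the nearest class centroid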

Fisher's Discriminant Analysis (FDA)

Using your notation, the easiest way to classify a new test object (not in the training set) is to first calculate

\begin{equation} \mathbf{w} = \mathbf{S}_{W}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 ) , \end{equation}

and estimate

\begin{equation} w_0 = -\frac{1}{2} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 )^\top \mathbf{S}_{W}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 ) + \log \left(\frac{P_1}{P_2} \right), \end{equation}

where $P_1$ and $P_2$ are the proportion of training objects in each class.

Next, for a test object $\mathbf{x}_i$ with unknown class label, assign the object to class 1 if

\begin{equation} \mathbf{w}^\top \mathbf{x}_i + w_0 >0 \end{equation}

and assign to class 2 otherwise.
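A minimal MATLAB sketch of this two-class rule, assuming X1 and X2 hold the class-1 and class-2 training vectors as rows and x is a single test vector (these names are illustrative, not fixed notation):

% Two-class FDA rule (sketch; X1, X2 and x are assumed names).
% X1: n1-by-p class-1 training data, X2: n2-by-p class-2 data, x: p-by-1 test vector.
mu1 = mean(X1)';  mu2 = mean(X2)';
n1  = size(X1, 1);  n2 = size(X2, 1);
Sw  = (X1 - mu1')' * (X1 - mu1') + (X2 - mu2')' * (X2 - mu2');  % within-class scatter
P1  = n1/(n1 + n2);  P2 = n2/(n1 + n2);                         % class proportions

w  = Sw \ (mu1 - mu2);                                          % discriminant direction
w0 = -0.5*(mu1 + mu2)' * w + log(P1/P2);                        % decision threshold

if w'*x + w0 > 0
    predicted = 1;      % assign to class 1
else
    predicted = 2;      % assign to class 2
end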

Obtaining $\mathbf{x}_i$ for each image

Note: I am using $\mathbf{x}_i$ for a single one of your picture vectors. For a single image $\mathbf{x}_i$, let $j$ represent each pixel $(j=1,2,\ldots,p)$. To begin, you need to "gray-scale" each $j$th pixel's R,G,B values (which range from 0-255) into a single gray color value, and call this $x_{ij}$. When done, there will be $p$ values of $x_{ij}$ represented by $\mathbf{x}_i$.
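In MATLAB, producing $\mathbf{x}_i$ from one image could look like this (a sketch; 'myimage.png' is a placeholder filename, and all images must have the same size so that $p$ matches across images):

% Turn one RGB image into a p-by-1 gray-level feature vector (sketch).
rgb  = imread('myimage.png');   % placeholder filename; H-by-W-by-3 array with values 0-255
gray = rgb2gray(rgb);           % weighted combination of the R,G,B channels per pixel
xi   = double(gray(:));         % stack the p = H*W gray values into a column vector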

Cross-validation (training and testing)

If you have e.g. 100 images for each of 2 cats ($n=200$ total images) and want to call each cat a class (i.e., classes 1 and 2), here's what I would do for 10-fold cross-validation (a MATLAB sketch of the whole procedure follows after the note below the list).

  1. Create a 2x2 confusion matrix $\mathbf{C}$, which has two rows and two columns. Rows represent the true class of each test image, and columns represent the predicted class.

  2. Randomly permute the order of the 200 images (for both cats); this is called shuffling.

  3. Using the randomly sorted images, assign the first 20 images to fold 1. Then assign images 21-40 to fold 2, do this for each set of 20 images, up to the 10th fold with images 181-200.

  4. Next, train the FDA algorithm by using the 180 images in folds 1-9, and leave out images 181-200 in fold 10 for testing. This will result in new values of $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$, $\mathbf{S}_{W}$, $\mathbf{w}$, and $w_0$. Note that $\boldsymbol{\mu}_1$ will contain $p$ averages ($p=\#pixels$) based on true class 1 images within the 180 training images. The same applies to $\boldsymbol{\mu}_2$, which is derived from true class 2 images within the same 180 training images. The sample size for images in class 1 vs. class 2 does not have to be the same -- in fact it won't be most of the time.

  5. Using $\mathbf{w}$ and $w_0$, if $\mathbf{w}^\top \mathbf{x}_{181} + w_0 >0$, assign to class 1, and 2 otherwise. Do this for images 182-200. Each image (181-200) will have a true class and predicted class, since you know which cat each image is from (class 1 = your cat, class 2=other cat). For each test image, add a 1 to element $c_{true,predicted}$ of the confusion matrix $\mathbf{C}$ based on each image's true class and predicted class.

  6. The sum of entries in matrix $\mathbf{C}$ will now be equal to 20, since there were 20 test images in fold 10.

  7. Repeat steps 4 and 5 using images in folds 1-8 and fold 10 for training, while using images in fold 9 for testing. This will result in a sum of 40 in the confusion matrix elements.

  8. Repeat steps 4 and 5 using images in folds 1-7, and folds 9-10 for training, while using images in fold 8 for testing. By now, you should get the picture, i.e., you are using the 20 images in single folds 1-10 for testing with images in the 9 remaining folds for training.

  9. When done, the predictive accuracy of FDA based on 10-fold CV will be

\begin{equation} acc = \frac{c_{1,1} + c_{2,2}}{c_{1,1} + c_{1,2} + c_{2,1} + c_{2,2}} \end{equation}

Note: you will be calculating values of $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$, $\mathbf{S}_{W}$, $\mathbf{w}$, and $w_0$ ten times, since there are ten folds used during 10-fold CV. In other words, when the 20 images in each "test" fold are left out of training and are used for testing, the values for $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$, $\mathbf{S}_{W}$, $\mathbf{w}$, and $w_0$ will always be derived from the 180 images which are in the remaining 9 "training folds."
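Putting steps 1-9 together, a minimal MATLAB sketch of the procedure might look like the following. The variable names `images` and `trueClass`, and the use of `pinv` to cope with a singular $\mathbf{S}_W$ when $p$ is large, are my assumptions rather than part of the recipe above.

% 10-fold cross-validation of the FDA rule above (sketch; assumes a 200-by-p
% matrix "images" of gray-level feature vectors and a 200-by-1 vector
% "trueClass" with entries 1 or 2 -- both names are placeholders).
n = 200;  K = 10;  foldSize = n/K;
order = randperm(n);                                      % step 2: shuffle the images
C = zeros(2, 2);                                          % step 1: confusion matrix

for fold = 1:K
    testIdx  = order((fold-1)*foldSize + (1:foldSize));   % step 3: the 20 test images
    trainIdx = setdiff(order, testIdx);                   % the remaining 180 training images

    X1 = images(trainIdx(trueClass(trainIdx) == 1), :);   % step 4: class-1 training images
    X2 = images(trainIdx(trueClass(trainIdx) == 2), :);   % class-2 training images
    mu1 = mean(X1)';  mu2 = mean(X2)';
    Sw  = (X1 - mu1')'*(X1 - mu1') + (X2 - mu2')'*(X2 - mu2');
    w   = pinv(Sw) * (mu1 - mu2);                         % pinv tolerates a singular Sw
    w0  = -0.5*(mu1 + mu2)'*w + log(size(X1,1)/size(X2,1));

    for t = testIdx                                       % step 5: classify each test image
        pred = 2 - (w'*images(t, :)' + w0 > 0);           % 1 if positive, otherwise 2
        C(trueClass(t), pred) = C(trueClass(t), pred) + 1;
    end
end
acc = (C(1,1) + C(2,2)) / sum(C(:))                       % step 9: 10-fold CV accuracy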
