
I want to classify histograms/distributions using K-Nearest-Neighbors. I can measure distances/dissimilarities between the distributions (using Euclidean distance, Kullback-Leibler divergence, ...), so I can obtain distance matrices. Since Nearest Neighbors measures distances anyway, I was wondering: can I feed distance matrices directly into the algorithm?

Also, if you know an existing function in R or Python, I'm interested. Thank you.

More details on my dataset: I have more than 100 observations that I want to classify in 2 classes (I have the labels) and all the features (4 features) are histograms (1 feature = 1 histogram).


UPDATE:

Using R: function "knn_dist" from "evclust" package
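In Python, one option is scikit-learn's `KNeighborsClassifier`, which accepts a precomputed distance matrix via `metric="precomputed"`. A minimal sketch (the random histograms and labels below are placeholder data, not the asker's dataset):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Toy stand-in: 100 observations, each a histogram over 10 bins
X = rng.dirichlet(np.ones(10), size=100)
y = np.array([0] * 50 + [1] * 50)

# Pairwise Euclidean distance matrix between the histograms
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# metric="precomputed" tells the classifier that D already contains distances
knn = KNeighborsClassifier(n_neighbors=5, metric="precomputed")
knn.fit(D, y)  # D must be square: training-to-training distances

# To predict, pass distances from the query points to the training points
# (here we simply reuse the first 3 rows of D as queries)
pred = knn.predict(D[:3])
```

Note that at prediction time you supply an `(n_queries, n_train)` matrix of distances from the new points to the training points, not the raw histograms.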

learneRS
  • 457
  • 3
  • 16
  • Are you looking for something like K-means? – user2974951 Nov 30 '18 at 10:34
  • Not really, I know that you can use similarity matrix as an input in K-medoid algorithm but since I have class labels I want to use a supervised learning for the classification task (k-means and K-medoids are unsupervised learning but KNN is supervised learning) – learneRS Nov 30 '18 at 11:39
  • 1
    Read this thread about k-means https://stats.stackexchange.com/q/32925/3277. Same is true for other analyses. To embed a distance matrix into feature space by means of MDS with sufficient number of dimensions giving good fit (stress value). Then proceed with knn or k-means etc. as usual. Of course, your matrix cannot be huge, or MDS won't cope. – ttnphns Dec 01 '18 at 09:42
  • Thank you @ttnphns. So if I understand correctly, you can't just use a distance matrix as input to the knn algorithm; you have to do MDS first and then use knn as usual? And a second question: does MDS work only with Euclidean distance? – learneRS Dec 02 '18 at 18:12
  • 1
    MDS is for any distance. I recommend you to read something about MDS before using it. – ttnphns Dec 03 '18 at 07:41
  • @ttnphns I did. I found that if it's a non-Euclidean distance/dissimilarity matrix, I have 3 possibilities: (1) use only the positive eigenvalues provided; (2) transform (by addition of constants) the matrix into a Euclidean matrix; (3) use non-metric MDS. Does that seem accurate to you? I'm not sure. – learneRS Dec 03 '18 at 15:37
  • Finally I managed to find an R function: "knn_dist" from the "evclust" package. Hopefully it works well. – learneRS Dec 03 '18 at 16:41
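The MDS route suggested in the comments can also be sketched in Python with scikit-learn: `sklearn.manifold.MDS` accepts a precomputed dissimilarity matrix and embeds it into a Euclidean space, after which an ordinary KNN classifier applies. The data below is synthetic, purely for illustration:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Synthetic features and labels standing in for the real data
X = rng.normal(size=(30, 4))
y = (X[:, 0] > 0).astype(int)

# Precomputed pairwise Euclidean distance matrix
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Embed the distance matrix into a 2-D Euclidean feature space
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)

# Proceed with KNN on the embedded coordinates as usual
knn = KNeighborsClassifier(n_neighbors=3).fit(coords, y)
acc = knn.score(coords, y)  # training accuracy, just a sanity check
```

As ttnphns notes, this only scales to moderately sized matrices; the number of MDS dimensions should be chosen so the stress value indicates a good fit.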

1 Answer


Yes, it's possible, because KNN only needs distances to find the nearest neighbors. Since you already have a distance/similarity matrix, the next step is to fix a value of k, find the k nearest neighbors of each point from the matrix, take a majority vote among those neighbors, and assign the class label that wins the vote.
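The procedure described above can be sketched directly as a small function that takes a precomputed distance matrix (rows = test points, columns = training points); the tiny matrix below is made up for illustration:

```python
import numpy as np

def knn_predict(D_test_train, y_train, k=3):
    """Classify each test point by majority vote among its k nearest
    training points, given a precomputed distance matrix."""
    preds = []
    for row in D_test_train:
        nearest = np.argsort(row)[:k]          # indices of the k smallest distances
        votes = np.bincount(y_train[nearest])  # count class labels among neighbors
        preds.append(int(np.argmax(votes)))    # majority class wins
    return np.array(preds)

# Tiny example: 4 training points, 2 test points
y_train = np.array([0, 0, 1, 1])
D = np.array([[0.1, 0.2, 0.9, 0.8],   # test point 1: closest to class 0
              [0.9, 0.8, 0.1, 0.2]])  # test point 2: closest to class 1
print(knn_predict(D, y_train, k=3))  # → [0 1]
```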