Questions tagged [k-nearest-neighbour]

A non-parametric method for classification and regression. The input consists of the $k$ closest training examples in the feature space. The output is either the mode of the neighbors' classes (in classification) or the mean of their values (in regression).

The k-nearest neighbors algorithm (k-NN) is a non-parametric method proposed by Thomas Cover used for classification and regression. In both cases, the input consists of the $k$ closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

  • In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its $k$ nearest neighbors ($k$ is a positive integer, typically small). If $k=1$, then the object is simply assigned to the class of that single nearest neighbor.
  • In k-NN regression, the output is the property value for the object: the average of the values of its $k$ nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance for classification, normalizing the training data can improve its accuracy dramatically.

Source: Wikipedia.
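
The plurality vote (classification) and neighbor averaging (regression) described above can be sketched in a few lines. This is a minimal illustration using Euclidean distance; `knn_predict` is a hypothetical helper name, not a function from any library:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, regression=False):
    """Plain k-NN prediction for a single query point x.

    Classification: plurality vote among the k nearest neighbors.
    Regression: mean of the k nearest neighbors' target values.
    """
    # Euclidean distance from x to every training example
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    if regression:
        return y_train[nearest].mean()
    # Plurality vote: most common class among the k neighbors
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]
```

For example, with two well-separated clusters labeled 0 and 1, a query point near the first cluster is assigned class 0 by the vote; with numeric targets and `regression=True`, the same neighbors are averaged instead. Because `dists` is computed from raw feature values, features on large scales dominate the distance, which is why normalization matters.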

541 questions
120 votes · 5 answers

What are the main differences between K-means and K-nearest neighbours?

I know that k-means is unsupervised and used for clustering, and that k-NN is supervised. But what are the concrete differences between the two?
nsc010
34 votes · 2 answers

How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?

I want to generate the plot described in the book ElemStatLearn, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" (Second Edition) by Trevor Hastie, Robert Tibshirani & Jerome Friedman. The plot is: I am wondering how I…
littleEinstein
32 votes · 5 answers

When should I apply feature scaling to my data?

I had a discussion with a colleague and we started to wonder, when should one apply feature normalization / scaling to the data? Let's say we have a set of features with some of the features having a very broad range of values and some features…
30 votes · 5 answers

Why would anyone use KNN for regression?

From what I understand, we can only build a regression function that lies within the interval of the training data. For example (only one of the panels is necessary): How would I predict into the future using a KNN regressor? Again, it appears to…
user46925
29 votes · 1 answer

k-NN computational complexity

What is the time complexity of the k-NN algorithm with a naive search approach (no k-d tree or similar structures)? I am interested in its time complexity considering also the hyperparameter k. I have found contradictory answers: O(nd + kn), where n is the…
Daniel López
24 votes · 4 answers

Why do you need to scale data in KNN?

Could someone please explain why you need to normalize data when using k-nearest neighbors? I've tried to look this up, but I still can't seem to understand it. I found the following…
bugsyb
22 votes · 2 answers

When is "Nearest Neighbor" meaningful, today?

In 1999, Beyer et al. asked, When is "Nearest Neighbor" meaningful? Are there better ways of analyzing and visualizing the effect of distance flatness on NN search since 1999? Does [a given] data set provide meaningful answers to the 1-NN problem? …
denis
22 votes · 1 answer

Does the curse of dimensionality affect some models more than others?

The places where I have been reading about the curse of dimensionality explain it mostly in conjunction with kNN, and with linear models in general. I regularly see top rankers on Kaggle using thousands of features on datasets that have barely 100k data points. They…
21 votes · 4 answers

Is KNN a discriminative learning algorithm?

It seems that KNN is a discriminative learning algorithm but I can't seem to find any online sources confirming this. Is KNN a discriminative learning algorithm?
user46925
20 votes · 4 answers

Combining machine learning models

I'm kind of new to data mining/machine learning/etc. and have been reading about a couple of ways to combine multiple models and runs of the same model to improve predictions. My impression from reading a couple of papers (which are often interesting and…
screechOwl
19 votes · 4 answers

Dealing with ties, weights and voting in kNN

I am programming a kNN algorithm and would like to know the following: Tie-breaks: What happens if there is no clear winner in the majority voting? E.g. all k nearest neighbors are from different classes, or for k=4 there are 2 neighbors from class…
Fletcher Duran
17 votes · 3 answers

Choosing optimal K for KNN

I performed 5-fold CV to select the optimal K for KNN, and it seems that the bigger K gets, the smaller the error… Sorry, I don't have a legend, but the different colors represent different trials. There are 5 in total and it seems like there's…
Adrian
16 votes · 5 answers

KNN imputation R packages

I am looking for a KNN imputation package. I have been looking at the imputation package (http://cran.r-project.org/web/packages/imputation/imputation.pdf), but for some reason the KNN impute function (even when following the example from the…
Wouter
16 votes · 3 answers

Explanation of formula for median closest point to origin of N samples from unit ball

In Elements of Statistical Learning, a problem is introduced to highlight issues with k-nn in high dimensional spaces. There are $N$ data points that are uniformly distributed in a $p$-dimensional unit ball. The median distance from the origin to…
user64773
15 votes · 3 answers

Training error in KNN classifier when K=1

I got this question in a quiz; it asked what the training error would be for a KNN classifier when K=1. What does training mean for a KNN classifier? My understanding of the KNN classifier was that it considers the entire data-set and assigns any…