Determine the best k when performing knn

Asked Jan 28 '17 at 12:17

I am using the knn means algorithm to distinguish groups. Take the iris dataset as an example:

from sklearn import datasets

# Load the iris dataset and keep the feature matrix
iris = datasets.load_iris()
X = iris.data

I understand that different values of k produce different clusters. What I am looking for, however, is a way to determine the best k: which of 2, 3, 4, etc. creates the best and most homogeneous groups?

Any thoughts on what is considered best practice in this case?

asked Jan 28 '17 at 12:17

Frits Verstraten

Training and testing, like K-fold cross validation. – Richard Hardy Jan 28 '17 at 12:19
@RichardHardy, thanks. Can you elaborate a little on this? And could you suggest some tutorials? – Frits Verstraten Jan 28 '17 at 12:38
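
The K-fold cross-validation idea from the comment above could look roughly like the sketch below. It assumes that "knn" means a k-nearest-neighbours classifier trained on the iris labels (`iris.target`, which the question's snippet does not load), and the search range of 1 to 15 is an arbitrary choice for illustration. If the goal is unsupervised clustering instead, there are no labels to score against, so this accuracy-based approach would not apply as written.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
# Assumption: a supervised setting, so the class labels are used as well
X, y = iris.data, iris.target

candidate_ks = range(1, 16)  # hypothetical search range, not from the question
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation; the default score for a classifier is accuracy
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Pick the k with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
print(best_k, mean_scores[best_k])
```

Each candidate k is trained on four folds and scored on the held-out fifth, so the comparison reflects performance on data the model has not seen rather than on the training set.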
