I am working on a clustering algorithm and would like to validate its performance against a well-known, widely used dataset: the KDD-CUP 99 dataset (http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html). This dataset provides both unlabeled and labeled test data. My question is: how should I validate my clustering algorithm's performance?
Let's say the results of my algorithm are as follows:
x1 -> cluster A
x2 -> cluster A
x3 -> cluster B
x4 -> cluster A
And let's say the labels provided are as follows:
x1 -> cluster 1
x2 -> cluster 1
x3 -> cluster 1
x4 -> cluster 2
Given that the two sets of cluster labels use completely different names, how should I compare them? In this case, the obvious assumption is that cluster A probably corresponds to cluster 1, but the correspondence may not always be this clear. Is there a standard way to evaluate such situations?
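To make the toy example concrete, here is a minimal Python sketch of two comparisons that do not depend on what the clusters are called. The first cross-tabulates my assignments against the provided labels. The second counts, over all pairs of points, how often the two labelings agree on whether the pair belongs together; as far as I know this pair-counting score is the Rand index. The names `predicted`, `reference`, `contingency`, and `pair_agreement` are my own, made up for illustration, and I am not claiming this is the accepted way to do it.

```python
from collections import Counter
from itertools import combinations

# Toy data from the example above (x1..x4, in order).
predicted = ["A", "A", "B", "A"]   # output of my algorithm
reference = ["1", "1", "1", "2"]   # labels provided with the dataset


def contingency(pred, ref):
    """Count how many points fall into each (predicted, reference) pair."""
    return Counter(zip(pred, ref))


def pair_agreement(pred, ref):
    """Fraction of point pairs on which both labelings agree
    (same cluster in both, or different clusters in both)."""
    pairs = list(combinations(range(len(pred)), 2))
    agree = sum(
        (pred[i] == pred[j]) == (ref[i] == ref[j])
        for i, j in pairs
    )
    return agree / len(pairs)


table = contingency(predicted, reference)
score = pair_agreement(predicted, reference)

for (p, r), n in sorted(table.items()):
    print(f"cluster {p} vs. label {r}: {n} point(s)")
print(f"pair agreement: {score:.3f}")
```

For the four points above, the table shows that cluster A overlaps label 1 on two points, and the two labelings agree on 2 of the 6 possible pairs. Neither number requires me to decide in advance which cluster matches which label, which is the part I am unsure how to handle properly.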