I'm testing the clara algorithm on a dataset, and the resulting cluster plot is shown in the figure: [image]

I got the message "These two components explain 1.06% of the point variability"

What can I conclude from this? The clustering separates the classes with good accuracy:

      a   b
  1 150   0
  2   0  50

But that message makes me unsure whether something is wrong.

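library(cluster)
# Cluster on every column except the known label STATUS
cl <- clara(a[, colnames(a) != "STATUS"], 3)
# Compare the cluster assignments with the known labels
table(cl$clustering, a$STATUS)
clusplot(cl, color = TRUE, shade = TRUE, lines = 0)
plot(cl)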

Also, the silhouette plot is empty, with an average silhouette width of 0.

JMR

1 Answer

Although the two clusters are well separated, that separation is all on component 1. On component 2, there's lots of variation within each group and very little across the two groups.
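
To check where the reported percentage comes from and how the separation splits across the two components, a minimal sketch follows. It assumes the axes of the cluster plot are principal components of the clustered variables, which is how the help page for clusplot describes them. The built-in `iris` data is only a stand-in for the data frame `a`, which is not shown, and `Species` plays the role of `STATUS`.

```r
# Stand-in data (assumption): iris replaces the data frame `a` from the question
x <- iris[, 1:4]
groups <- iris$Species

# Principal components of the standardized variables
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Percentage of the total variance carried by each component
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)
pct_first_two <- sum(pct[1:2])

# For each of the first two components, the fraction of its variance that
# lies between the known groups (R-squared of a one-way ANOVA on the scores)
between_share <- sapply(1:2, function(j) {
  summary(lm(pca$x[, j] ~ groups))$r.squared
})

print(round(pct, 2))
print(round(pct_first_two, 2))
print(round(between_share, 3))
```

A between-group share close to 1 on component 1 and close to 0 on component 2 would correspond to the pattern described above: clean separation along one axis and mostly within-group spread along the other.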

Questions about code are off topic here, but you could try clustering on fewer variables (it's hard to tell just what you did).
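
For what it is worth, clustering on fewer variables could look roughly like the sketch below. Everything specific in it is an assumption: `iris` stands in for `a`, `Species` for `STATUS`, the two chosen columns are a hypothetical subset, and `k = 2` simply mirrors the two clusters in the table from the question.

```r
library(cluster)

# Stand-in data (assumption): iris replaces `a`, Species replaces STATUS
a <- iris
vars <- c("Petal.Length", "Petal.Width")  # hypothetical subset of variables

# Cluster on the reduced set of variables
cl <- clara(a[, vars], k = 2)

# Cross-tabulate the cluster assignments against the known labels
tab <- table(cluster = cl$clustering, label = a$Species)
print(tab)

# Average silhouette width computed over all observations
sil <- silhouette(cl$clustering, dist(a[, vars]))
avg_width <- summary(sil)$avg.width
print(avg_width)
```

Comparing the cross-tabulation and the average silhouette width before and after dropping variables gives a rough sense of whether the extra variables were helping the clustering or only adding noise.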

Peter Flom
  • I have a dataset with 300 samples, and I created a new one with only 150. That increased the variability to 49%, but the clustering can't distinguish the data. – JMR May 22 '17 at 17:40
  • This comment doesn't really make sense to me. – Peter Flom May 23 '17 at 11:17
  • Can you explain? My variability is too low, and I don't know the real impact of this variable. Can you give any help? – JMR May 23 '17 at 11:56
  • I have no idea what you are talking about at this point. Clustering is not about the impact of variables. Variability of what? Percent of what? The two clusters vary on component 1, but not component 2, as I said in my answer. Your comment on my answer makes no sense. – Peter Flom May 23 '17 at 21:08