
I performed text clustering using the k-means method.

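library(tm)         # Corpus, tm_map, DocumentTermMatrix, weightTfIdf
library(SnowballC)  # stemming backend used by stemDocument

mydat <- read.csv("C:/kr_csv.csv", sep = ";", dec = ",")

# Build the corpus and clean the text
tw.corpus <- Corpus(VectorSource(mydat$name))
tw.corpus <- tm_map(tw.corpus, removePunctuation)
tw.corpus <- tm_map(tw.corpus, removeNumbers)
tw.corpus <- tm_map(tw.corpus, content_transformer(tolower))
tw.corpus <- tm_map(tw.corpus, stemDocument)

# TF-IDF weighted document-term matrix
doc.m <- DocumentTermMatrix(tw.corpus)
dtm_tfxidf <- weightTfIdf(doc.m)

m <- as.matrix(dtm_tfxidf)
rownames(m) <- 1:nrow(m)

# Scale each row (document) to unit Euclidean length
norm_eucl <- function(m) m / apply(m, 1, function(x) sum(x^2)^.5)
m_norm <- norm_eucl(m)
dim(m_norm)
# [1] 399 860

# k-means with 3 centers and at most 30 iterations
res <- kmeans(m_norm, 3, 30)
clusters <- 1:3
for (i in clusters) {
  # cat() cannot print a data frame, so print the matching rows separately
  cat("cluster", i, ":\n")
  print(mydat[res$cluster == i, ])
  cat("\n")
}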

I set the number of clusters to 3 manually. Is it possible to select the appropriate number of clusters automatically?

D.Joe
  • If you are doing text mining, you almost certainly should not be using k-means (in fact, a good rule of thumb is: if you are doing *clustering*, you almost certainly should not be using k-means; just because it's the only clustering method most people have ever heard of doesn't mean it's typically a good choice). At any rate, this looks like a request for code, which is off topic here. To the extent there's a substantive question here, it appears to be covered by the duplicate. – gung - Reinstate Monica Mar 14 '18 at 16:22
  • @gung, I just gave a simple example (I could use hclust). Anyway, I'm working in R. You marked this post as a duplicate and gave a link in which I did not find the answer. If you think I was not attentive, then tell me where in that topic it shows how to solve the problem in R. – D.Joe Mar 14 '18 at 19:46
  • If you are asking 'how do I do this in R', that is off topic here. If you have a *substantive* question about how the number of clusters is determined, that is addressed in the duplicate. – gung - Reinstate Monica Mar 14 '18 at 20:05
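
On the question of choosing the number of clusters automatically: one common heuristic (not the only one, and not necessarily the one the linked duplicate recommends) is to run k-means for a range of k and keep the k with the largest average silhouette width. The sketch below assumes the `cluster` package is installed; `choose_k` and the `toy` matrix are hypothetical names introduced only for illustration, with `toy` standing in for `m_norm`.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: pick k by the largest average silhouette width.
choose_k <- function(x, k_range = 2:10, nstart = 10, iter.max = 30) {
  d <- dist(x)  # Euclidean distances between rows, computed once
  avg_sil <- sapply(k_range, function(k) {
    km <- kmeans(x, centers = k, nstart = nstart, iter.max = iter.max)
    mean(silhouette(km$cluster, d)[, "sil_width"])
  })
  names(avg_sil) <- k_range
  list(best_k = k_range[which.max(avg_sil)], avg_sil = avg_sil)
}

# Toy stand-in for m_norm: three well-separated groups of 2-D points
set.seed(1)
toy <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.3), ncol = 2)
)

sel <- choose_k(toy, k_range = 2:6)
sel$best_k   # k with the highest average silhouette width
sel$avg_sil  # average silhouette width for each candidate k
```

With the data from the question, the call would be `choose_k(m_norm)`, followed by `kmeans(m_norm, sel$best_k, 30)`. The result is only a heuristic: on sparse, high-dimensional text data the silhouette curve can be flat, in which case no value of k is clearly supported.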
