I performed text clustering using the k-means method.
library(tm)         # Corpus, tm_map, DocumentTermMatrix, weightTfIdf
library(SnowballC)  # stemming backend used by stemDocument
mydat <- read.csv("C:/kr_csv.csv", sep = ";", dec = ",")
tw.corpus <- Corpus(VectorSource(mydat$name))
tw.corpus <- tm_map(tw.corpus, removePunctuation)
tw.corpus <- tm_map(tw.corpus, removeNumbers)
tw.corpus <- tm_map(tw.corpus, content_transformer(tolower))
tw.corpus <- tm_map(tw.corpus, stemDocument)
doc.m <- DocumentTermMatrix(tw.corpus)
dtm_tfxidf <- weightTfIdf(doc.m)
m <- as.matrix(dtm_tfxidf)
rownames(m) <- 1:nrow(m)
# Scale each row (document) to unit Euclidean length
norm_eucl <- function(m) {
  m / apply(m, 1, function(x) sum(x^2)^.5)
}
m_norm <- norm_eucl(m)
> dim(m_norm)
[1] 399 860
res <- kmeans(m_norm, centers = 3, iter.max = 30)
clusters <- 1:3
for (i in clusters) {
  # cat() cannot print data frame rows, so print them explicitly
  cat("cluster", i, ":\n")
  print(mydat[res$cluster == i, ])
  cat("\n")
}
I set the number of clusters to 3 manually. Is it possible to select an appropriate number of clusters automatically?
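
One common heuristic for this kind of selection is the average silhouette width: run k-means for a range of candidate k values and keep the k with the highest average. The following is only a minimal sketch, assuming the cluster package is installed. The helper avg_silhouette and the toy data are hypothetical stand-ins, not part of the code above; m_norm would take the place of toy.

```r
library(cluster)  # assumption: the 'cluster' package is available for silhouette()

# Hypothetical helper: run k-means for each candidate k and return the
# average silhouette width obtained for that k.
avg_silhouette <- function(x, k_range = 2:10, iter.max = 30, nstart = 10) {
  d <- dist(x)  # Euclidean distances between rows
  sapply(k_range, function(k) {
    km <- kmeans(x, centers = k, iter.max = iter.max, nstart = nstart)
    mean(silhouette(km$cluster, d)[, "sil_width"])
  })
}

# Toy data standing in for m_norm: three well-separated 2-D groups.
set.seed(42)
toy <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.3), ncol = 2)
)

k_range <- 2:6
sil <- avg_silhouette(toy, k_range)

# Pick the candidate k with the largest average silhouette width.
best_k <- k_range[which.max(sil)]
print(best_k)
```

With real data the call would look like avg_silhouette(m_norm, 2:10). On sparse, high-dimensional tf-idf matrices the silhouette curve is often flat, so the suggested k is a starting point to inspect rather than a definitive answer.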