
I am running hierarchical clustering in R, using `daisy` to compute a dissimilarity matrix and `agnes` for the hierarchical clustering, as described in Clustering of mixed type data with R.

With 8 GB of RAM, I constantly run into this error:

Error: cannot allocate vector of size 1.8 Gb

My data has 21836 rows and only 2 variables. I would like to use more variables, but I already run out of memory with just these 2.

  • Are there any alternative algorithms for a mixed data set of continuous and categorical variables?

  • Are there any alternative tools (I am currently using R) which would require less memory?

rmuc8
  • You are clustering rows, so the size of the matrix is `21836^2`. Multiply by 8 (bytes per double) and you get roughly 4 GB of RAM needed _only to store_ the input matrix. The procedure (and the computer) surely needs more free memory than that to run. I'm not an R user, though, so please wait for someone who knows R well to advise you. – ttnphns May 12 '15 at 14:18
  • Besides, I'm sceptical about the idea itself: clustering so many objects with hierarchical cluster analysis (see the last point [here](http://stats.stackexchange.com/a/63549/3277)). – ttnphns May 12 '15 at 14:22
  • Thanks for sharing your thoughts. Do you have anything in mind for how to solve this issue? – rmuc8 May 12 '15 at 15:54
  • When I have to cluster very many objects by categorical (nominal) variables, I use the Two-step clustering in SPSS (I'm an SPSS user). – ttnphns May 12 '15 at 16:08
  • What algorithm do you apply there? – rmuc8 May 12 '15 at 17:36
  • http://stats.stackexchange.com/q/81603/3277 – ttnphns May 12 '15 at 19:41
  • @ttnphns Judging from the error message, the full distance matrix is "only" 1.8 GB (probably exploiting symmetry: 21836 * 21835 / 2 * 8 bytes). But R may make too many copies and thus run out of memory even with 8 GB of RAM. – Has QUIT--Anony-Mousse May 13 '15 at 22:03
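The two size estimates in the comments above can be checked with a few lines of arithmetic. This is a minimal sketch, not R code: it assumes each dissimilarity is stored as an 8-byte double, that a symmetric dissimilarity object keeps only the n(n-1)/2 off-diagonal entries, and that R's "Gb" in its error message means binary gigabytes (1024^3 bytes). The helper names are hypothetical.

```python
# Back-of-the-envelope memory estimate for a dissimilarity matrix of doubles.
# Assumptions: 8 bytes per entry; a symmetric "dist"-style object keeps only
# the n*(n-1)/2 entries below the diagonal; "Gb" means 1024**3 bytes.

BYTES_PER_DOUBLE = 8


def full_matrix_bytes(n: int) -> int:
    """Bytes for a dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


def lower_triangle_bytes(n: int) -> int:
    """Bytes for the n*(n-1)/2 pairwise entries (diagonal and upper half omitted)."""
    return n * (n - 1) // 2 * BYTES_PER_DOUBLE


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gigabytes (1024**3 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    n = 21836  # number of rows in the question
    print(f"full matrix:    {to_gib(full_matrix_bytes(n)):.2f} GiB")
    print(f"lower triangle: {to_gib(lower_triangle_bytes(n)):.2f} GiB")
```

For n = 21836 this gives about 3.55 GiB for the dense matrix and about 1.78 GiB for the triangle-only version. The second figure matches the "1.8 Gb" in the error message, which supports the guess that the failing allocation is a single triangle-only dissimilarity vector. Because the size grows with n², adding variables does not change it, but doubling the number of rows roughly quadruples it.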
