
I'm trying to find relationships or patterns among a large number of rows in a dataset (~2000), and I'm thinking of using a correlation heatmap. However, after transforming the DataFrame with `df = df.T.corr()` and plotting only the first 100 rows with seaborn, the result already looks unreadable: [image: correlation heatmap of the first 100 rows]

Is there a clearer way to do this with a larger number of rows?

kjetil b halvorsen
  • Sorting the correlation matrix may provide clusters of variables, see [here](http://stats.stackexchange.com/q/26920/1036) for one description of how to sort them. – Andy W Jul 11 '16 at 12:17
  • Any Python based solutions? – user3508494 Jul 11 '16 at 13:18
  • I found `sns.clustermap(df.T.corr(), metric='correlation', method='centroid')` which might do the trick. – tmrlvi Nov 22 '17 at 15:26
  • Try some basic clustering first (with the kernel trick if necessary), then order your dataset by the resulting classes. In Python, use scikit-learn's k-means, PCA, or whatever clustering technique works with your data. – Romain Reboulleau Oct 04 '18 at 11:06
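
The reordering idea from the comments can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes NumPy and SciPy are available, uses synthetic data in place of the asker's `df`, and the helper name `cluster_order` and the choice of average linkage on the distance `1 - r` are assumptions made for the example. `np.corrcoef(data)` correlates rows, which should match `df.T.corr()` for a numeric DataFrame without missing values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform


def cluster_order(corr, method="average"):
    """Return a row ordering that puts highly correlated rows side by side.

    Hypothetical helper for illustration; `corr` is a square correlation matrix.
    """
    # Turn correlation into a dissimilarity: 0 when r = 1, 2 when r = -1.
    dist = 1.0 - corr
    # Remove tiny floating-point asymmetries and force an exact zero diagonal.
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    dist = np.clip(dist, 0.0, None)
    # linkage() expects the condensed (upper-triangle) form of the distances.
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method=method)
    # Leaf order of the dendrogram, read left to right.
    return leaves_list(tree)


# Synthetic stand-in for the real data: 60 rows drawn from 3 hidden groups,
# shuffled so that the groups are not visible in the original row order.
rng = np.random.default_rng(0)
signals = rng.normal(size=(3, 50))
labels = rng.permutation(np.repeat(np.arange(3), 20))
data = signals[labels] + 0.3 * rng.normal(size=(60, 50))

corr = np.corrcoef(data)                   # row-by-row correlation matrix
order = cluster_order(corr)
sorted_corr = corr[np.ix_(order, order)]   # reorder rows and columns together
sorted_labels = labels[order]              # hidden groups in the new order
```

Plotting `sorted_corr` instead of `corr` (for example with `sns.heatmap(sorted_corr, xticklabels=False, yticklabels=False)`) should show the groups as blocks along the diagonal; dropping the tick labels helps once there are hundreds of rows. `sns.clustermap`, mentioned in the comment above, performs a similar reordering and draws the dendrograms in one call.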
