Is there a JackStraw equivalent for clustering?

Question

JackStraw is a method for getting honest p-values for correlations between variables and the principal components derived from those same variables.

JackStraw paper

There is a close relationship between PCA and clustering: K clusters give you a signal in the top K-1 principal components, as outlined in this paper on PCA for detecting population substructure in genetic data. There is a more thorough and less sloppy discussion in this CV thread.

Does anyone know of a JackStraw equivalent that gives honest p-values for correlations between variables and cluster indicators? Preferably for an arbitrary clustering method and an arbitrary two-sample test?
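To make the circularity concrete, here is a minimal sketch (standard-library Python, not taken from any of the linked sources) of why the naive approach is not honest: a single pure-noise variable is clustered, then tested against its own cluster labels. The helper names `two_means_1d` and `welch_t` are hypothetical and exist only for this illustration.

```python
import math
import random
import statistics


def two_means_1d(x, iters=50):
    """Minimal 1-D 2-means (illustrative only); returns a 0/1 label per value."""
    lo, hi = min(x), max(x)  # start the two centers at the extremes
    for _ in range(iters):
        cut = (lo + hi) / 2
        lo = statistics.fmean([v for v in x if v <= cut])
        hi = statistics.fmean([v for v in x if v > cut])
    cut = (lo + hi) / 2
    return [int(v > cut) for v in x]


def welch_t(x, labels):
    """Welch two-sample t statistic of x between label 1 and label 0."""
    a = [v for v, g in zip(x, labels) if g == 0]
    b = [v for v, g in zip(x, labels) if g == 1]
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(b) - statistics.fmean(a)) / se


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]      # pure noise, no real groups

cluster_labels = two_means_1d(x)               # labels derived from x itself
random_labels = [rng.randrange(2) for _ in x]  # labels independent of x

t_clustered = welch_t(x, cluster_labels)
t_random = welch_t(x, random_labels)
print(f"t vs. own cluster labels: {t_clustered:.1f}")
print(f"t vs. random labels:      {t_random:.1f}")
```

The statistic against the data-derived labels is very large even though there is no structure in the data, so a p-value read off the usual t distribution would be badly anti-conservative. That is the problem a JackStraw-like procedure would have to correct.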

score 2 · Accepted Answer · edited Apr 02 '18 at 15:11


This new preprint, "Statistical Significance of Cluster Membership with Applications to High-Throughput Genomic Data", answers your question. It tests the association between variables and their computed cluster centers: https://www.biorxiv.org/content/early/2018/02/23/248633

The development version of the jackstraw R package includes this new functionality. There is a dedicated, refined function for k-means clustering, as well as one that should work with a range of arbitrary clustering methods: https://github.com/ncchung/jackstraw
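For readers who want a feel for the general idea without opening the links, below is a rough sketch of a jackstraw-style permutation scheme carried over to clustering. It is not the package's code and is not taken from the preprint. It assumes the setup from the question (cluster the observations, then run a two-sample test of each variable against the cluster labels), uses a toy 2-means and a Welch t statistic as stand-ins for "an arbitrary clustering method and an arbitrary two-sample test", and all function names (`two_means`, `abs_welch_t`, `jackstraw_style_pvalues`) are hypothetical.

```python
import math
import random
import statistics


def two_means(obs, iters=10):
    """Toy 2-means on observations (lists of floats); returns 0/1 labels."""
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    # Deterministic start: first observation and the one farthest from it.
    centers = [list(obs[0]), list(max(obs, key=lambda o: d2(o, obs[0])))]
    labels = None
    for _ in range(iters):
        new = [int(d2(o, centers[1]) < d2(o, centers[0])) for o in obs]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [o for o, g in zip(obs, labels) if g == k]
            if members:
                centers[k] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def abs_welch_t(values, labels):
    """Absolute Welch t statistic of one variable between the two clusters."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(statistics.fmean(a) - statistics.fmean(b)) / se if se > 0 else 0.0


def jackstraw_style_pvalues(data, s=1, B=100, seed=0):
    """data[j] is variable j across observations (variables x observations)."""
    rng = random.Random(seed)

    def cluster(rows):
        return two_means([list(o) for o in zip(*rows)])

    labels = cluster(data)
    observed = [abs_welch_t(row, labels) for row in data]
    null = []
    for _ in range(B):
        picked = rng.sample(range(len(data)), s)  # s variables become synthetic nulls
        perm = [row[:] for row in data]
        for j in picked:
            rng.shuffle(perm[j])                  # break any link to the observations
        perm_labels = cluster(perm)               # re-cluster with the nulls included
        null.extend(abs_welch_t(perm[j], perm_labels) for j in picked)
    # Empirical p-value: share of null statistics at least as large as observed.
    return [(1 + sum(t0 >= t for t0 in null)) / (1 + len(null)) for t in observed]


# Simulated data: variables 0-2 differ between two groups, variables 3-9 are noise.
rng = random.Random(1)
group = [i % 2 for i in range(60)]
data = [[rng.gauss(4.0 * g if j < 3 else 0.0, 1.0) for g in group] for j in range(10)]
pvals = jackstraw_style_pvalues(data)
print([round(p, 3) for p in pvals])
```

The point of re-clustering after each permutation is that the synthetic null variables go through the same "cluster, then test" pipeline as the real ones, so the null statistics carry the same circularity as the observed ones. Keeping `s` small relative to the number of variables is meant to leave the overall cluster structure largely intact.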

edited Apr 02 '18 at 15:11

eric_kernfeld


answered Apr 01 '18 at 13:13

user3385084


We are trying to build a permanent repository of high-quality statistical information in the form of questions & answers. Thus, we're wary of link-only answers, due to linkrot. Can you post a full citation & a summary of the information at the link, in case it goes dead? – T.E.G. Apr 01 '18 at 13:46
