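The JackStraw is a method for obtaining honest p-values for associations between variables and principal components derived from those same variables. "Honest" here means the test accounts for the circularity of estimating the components from the very data being tested.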

JackStraw paper

There is a close relationship between PCA and clustering: K clusters give you a signal in the top K-1 principal components, as outlined in this paper on PCA for detecting population substructure in genetic data. There is a more thorough and less sloppy discussion in this CV thread.

Does anyone know of a JackStraw equivalent that gives honest p-values for correlations between variables and cluster indicators? Preferably for an arbitrary clustering method and an arbitrary two-sample test?

eric_kernfeld

1 Answer

This new preprint, "Statistical Significance of Cluster Membership with Applications to High-Throughput Genomic Data" answers your question. It tests association between variables and their computed cluster centers: https://www.biorxiv.org/content/early/2018/02/23/248633

The development version of the jackstraw R package includes this new functionality. There is a dedicated, refined function for k-means clustering, as well as one that should work with a range of other clustering methods: https://github.com/ncchung/jackstraw
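To make the general idea concrete, here is a minimal Python sketch of a jackstraw-style test for cluster membership. It is not the jackstraw package's code or API. The function names (`kmeans`, `r_squared`, `cluster_jackstraw`), the parameter names, and the choice of statistic (the fraction of a variable's variance explained by the cluster labels) are all assumptions made for illustration. The resampling scheme follows the jackstraw idea as I understand it: permute a small number `s` of variables, re-run the clustering, and use the statistics of the permuted variables as an empirical null.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave empty clusters alone.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def r_squared(values, labels):
    """Fraction of one variable's variance explained by cluster membership."""
    grand = sum(values) / len(values)
    sst = sum((v - grand) ** 2 for v in values)
    if sst == 0:
        return 0.0
    ssb = 0.0
    for c in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == c]
        ssb += len(group) * (sum(group) / len(group) - grand) ** 2
    return ssb / sst


def cluster_jackstraw(data, k, s=1, n_rounds=200, seed=0):
    """data: list of variables, each a list of values over the observations.

    Returns one empirical p-value per variable (hypothetical helper).
    """
    rng = random.Random(seed)

    def cluster(rows):
        # Observations are the columns of the variables-by-observations matrix.
        return kmeans(list(zip(*rows)), k, rng)

    labels = cluster(data)
    observed = [r_squared(row, labels) for row in data]

    null = []
    for _ in range(n_rounds):
        perturbed = [list(row) for row in data]
        chosen = rng.sample(range(len(data)), s)
        for i in chosen:
            # Shuffling breaks any true link between variable i and the clusters.
            rng.shuffle(perturbed[i])
        # Re-cluster so the null statistics include the overfitting effect.
        null_labels = cluster(perturbed)
        null.extend(r_squared(perturbed[i], null_labels) for i in chosen)

    # Empirical p-value: share of null statistics at least as large as observed.
    return [(1 + sum(z >= obs for z in null)) / (1 + len(null))
            for obs in observed]
```

Swapping in a different clustering routine or test statistic only requires replacing `kmeans` or `r_squared`. The important part of the design is that the clustering is recomputed on the partially permuted data, and that `s` stays small relative to the number of variables so the cluster structure itself is largely preserved.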

eric_kernfeld
  • We are trying to build a permanent repository of high-quality statistical information in the form of questions & answers. Thus, we're wary of link-only answers, due to linkrot. Can you post a full citation & a summary of the information at the link, in case it goes dead? – T.E.G. Apr 01 '18 at 13:46