I have noticed that when applying PCA to large datasets, people often first subset the data considerably. Sometimes they simply take a random subset of the features/variables, but more often they have a reason, usually related to removing variables they consider likely to be noise. A prototypical example is the analysis of the Drop-Seq single-cell sequencing data of retina cells, where the authors subset their expression matrix from 25,000 genes to the 384 most highly variable genes and then proceed to apply various unsupervised dimensionality reduction techniques such as PCA and t-SNE.
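To be concrete, the kind of pre-processing I mean looks roughly like the following (a minimal sketch in Python with scikit-learn; the matrix sizes, variable names, and the cutoff of 384 genes are just placeholders for illustration, not the authors' actual code):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in expression matrix: cells x genes (random values here,
# just to represent a real counts/log-expression matrix).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5000))

# Feature pre-selection: keep only the k most variable genes.
k = 384
gene_var = X.var(axis=0)
top_genes = np.argsort(gene_var)[-k:]
X_subset = X[:, top_genes]

# Dimensionality reduction is then run on the subset only.
pca = PCA(n_components=50)
X_pca = pca.fit_transform(X_subset)
```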
I have seen this sort of pre-processing in several other places as well. However, I don't understand why this subsetting (feature pre-selection) is necessary. PCA reduces the dimensionality in a way that maximizes the retained variance, so genes that barely vary should be largely ignored anyway. Why subset the data so dramatically when the non-varying genes should have little effect on the result of PCA?
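Here is a toy check of that intuition (again just a sketch under my own assumptions: `signal` and `noise` are made-up data, with the excluded genes modeled as near-constant noise, which may or may not match real expression data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_cells = 500

# Toy data: a few genuinely varying "genes" with distinct variances,
# plus many near-constant noise genes.
signal = rng.normal(size=(n_cells, 5)) * np.array([10.0, 8.0, 6.0, 4.0, 2.0])
noise = rng.normal(scale=0.1, size=(n_cells, 2000))
X_full = np.hstack([signal, noise])

pcs_full = PCA(n_components=2).fit_transform(X_full)
pcs_signal = PCA(n_components=2).fit_transform(signal)

# Compare the leading PCs with and without the noise genes
# (absolute correlation, since PC signs are arbitrary).
for i in range(2):
    r = np.corrcoef(pcs_full[:, i], pcs_signal[:, i])[0, 1]
    print(f"PC{i + 1}: |corr| with vs. without noise genes = {abs(r):.3f}")
```

On toy data like this the leading PCs are essentially unchanged by the low-variance genes, which is exactly why I don't see what the pre-selection buys.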
This is not a question specific to that paper; the subsetting seems to be something of a standard approach to large datasets, so I assume there is something I am missing.