I (think I) understand the process of PCA and the advantages it offers in pre-processing data for classification models and lower-dimensional visualisation. I also understand that you can look at each Principal Component and see the loadings of each feature.
Let's say I have a live data set in which I record 100 or so features (i.e. columns) for each sample, and let's say each feature takes about the same time/effort/cost to measure.
I do a PCA and find that 99% of the variance is explained by the first 50 Principal Components. Great, now I can trim my data before running classification models, which saves me time and effort.
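For concreteness, this is roughly the step I mean, as a minimal sketch assuming scikit-learn and a samples-by-features array `X` (the data here is a random stand-in for my ~100-feature set):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))              # placeholder for my real data

X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive
pca = PCA(n_components=0.99)                  # keep enough PCs for 99% of variance
X_reduced = pca.fit_transform(X_scaled)

print(pca.n_components_)                      # e.g. ~50 components in my case
print(pca.explained_variance_ratio_.sum())    # >= 0.99
```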
Now it may be the case that some features have loadings near 0 on all of the first 50 Principal Components, so they are near useless and a waste of my time to measure.
Is there a practical way of detecting these 'useless' features in the PCA? Are there any cut-offs that are usually used? Is this using a sledge-hammer to crack a nut? Should I just use univariate analysis to find useless features?
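To make the question concrete, below is the kind of detection I'm imagining: score each original feature by its squared loadings on the retained components, weighted by each component's explained variance ratio, and flag the lowest scorers. The cut-off here is an arbitrary placeholder, which is exactly the rule of thumb I'm missing (again assuming scikit-learn; the data and the threshold are stand-ins):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))               # placeholder data as above

pca = PCA(n_components=0.99).fit(StandardScaler().fit_transform(X))

loadings = pca.components_                    # shape (n_retained_pcs, n_features)
weights = pca.explained_variance_ratio_       # shape (n_retained_pcs,)

# per-feature score: variance-weighted sum of squared loadings across retained PCs
feature_score = (weights[:, None] * loadings ** 2).sum(axis=0)

# arbitrary placeholder cut-off; this is the sort of rule of thumb I'm asking about
low_threshold = 0.2 * feature_score.mean()
useless = np.flatnonzero(feature_score < low_threshold)
print("candidate 'useless' feature indices:", useless)
```

Is something along these lines sensible, or is there a more principled criterion?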
I understand it is similar to this question: Using principal component analysis (PCA) for feature selection. But the answers there do not give any pragmatic rules of thumb or methods for defining a non-informative feature, nor do they compare such methods to other ways of removing non-informative features.
This Stack Overflow answer explains some methods of comparing feature importance using the Iris dataset, but does not show how one would choose a feature to drop: https://stackoverflow.com/a/50845697/3562522
This example of PCA uses graphical methods to look at the top 40 features in each principal component to gain insights, but again does not attempt to find useless features.