I'm tempted not to remove any features before running the data through a feature selection algorithm. But does the need to do so depend on the algorithm? For example, a filter method such as mRMR should automatically deal with correlation via mutual information or some other measure of "redundancy" between features, which is minimized as features are added to the selected set. Or am I missing something? And what about embedded methods, such as boosting and random forests? I realize these can be used for prediction, but here I'm mainly interested in "feature importance". A rough sketch of the mRMR behaviour I have in mind is below.
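To make the mRMR point concrete, here is a minimal sketch of the kind of greedy "relevance minus redundancy" selection I mean, assuming a pandas DataFrame `X` of continuous features and a continuous target `y`; the function name and the use of scikit-learn's `mutual_info_regression` are just for illustration, not the reference mRMR implementation:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

def mrmr_select(X: pd.DataFrame, y, k: int = 5):
    """Greedy mRMR-style selection: maximize relevance minus redundancy."""
    # Relevance: mutual information between each feature and the target.
    relevance = pd.Series(mutual_info_regression(X, y), index=X.columns)
    selected, remaining = [], list(X.columns)
    for _ in range(min(k, len(remaining))):
        best, best_score = None, -np.inf
        for f in remaining:
            # Redundancy: mean MI between the candidate and already-selected features.
            redundancy = (
                np.mean([mutual_info_regression(X[[f]], X[s])[0] for s in selected])
                if selected else 0.0
            )
            score = relevance[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

If that is roughly what mRMR does internally, then a near-duplicate of an already-selected feature incurs a large redundancy penalty and should rarely be picked, which is why I'm unsure whether pre-filtering correlated features buys anything.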
UPDATE
To clarify, my features have physical meaning, and I don't want to project them into a reduced space and lose that physical interpretation. I'm primarily interested in identifying the features that are most informative with respect to a target variable. The concern is: if there are highly correlated features (which I could detect with Spearman or Pearson correlation coefficients, for example, as in the sketch below), should they be removed before running a feature selection algorithm, or not? I suspect that removing features I judge to be highly correlated may affect the results, but perhaps, depending on the algorithm, there's no need to do that beforehand.
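For reference, the pre-screening step I would otherwise apply looks roughly like this; `X` is again assumed to be a pandas DataFrame, and the helper name and the 0.9 cutoff are arbitrary choices for illustration:

```python
import pandas as pd

def correlated_pairs(X: pd.DataFrame, threshold: float = 0.9, method: str = "spearman"):
    """List feature pairs whose absolute pairwise correlation meets the threshold."""
    corr = X.corr(method=method).abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= threshold:
                pairs.append((cols[i], cols[j], float(corr.iloc[i, j])))
    return pairs
```

The question is whether, for each flagged pair, I should drop one feature before running the selection algorithm, or let the algorithm handle the redundancy itself.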