I have a cross-sectional dataset with around 1000 features and 5000 observations. Many of the features (none are categorical) are highly correlated (correlation higher than 0.85). I want to reduce my feature set before modelling. I know that LASSO can be used to shrink the feature set, since it can set coefficients exactly to zero depending on the penalty weight. However, in the presence of highly correlated features it can select an irrelevant one.
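To make the correlation problem concrete, here is a minimal sketch of how I count highly correlated pairs. The data below is a synthetic stand-in (10 latent factors, each copied into 5 noisy features), not my real dataset, and taking the absolute value of the correlation is an assumption on my part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 10 latent factors,
# each repeated as 5 noisy features (50 features, 500 rows).
latent = rng.normal(size=(500, 10))
X = np.repeat(latent, 5, axis=1) + 0.2 * rng.normal(size=(500, 50))

# Pairwise Pearson correlation between columns (features).
corr = np.corrcoef(X, rowvar=False)

# Look at the upper triangle only, so each pair is counted once
# and the diagonal (self-correlation) is skipped.
i, j = np.triu_indices_from(corr, k=1)
high = np.abs(corr[i, j]) > 0.85  # 0.85 is the cutoff mentioned above

n_high_pairs = int(high.sum())
high_pairs = list(zip(i[high], j[high]))
print(f"{n_high_pairs} feature pairs with |corr| > 0.85")
```

On the real data, `high_pairs` gives the index pairs that could be inspected or grouped before any selection step.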
On the other hand, as far as I know, if I use a random forest (RF) with H2O, the effect of correlated features is diluted. In sklearn this is not an issue, as explained here. However, RF results are unstable because my data is quite noisy (i.e. every run, without changing anything, results in a different feature set).
Considering that LASSO gives stable results for the same dataset, I am planning to use it first to shrink the feature set (from 1000 to 100) and then apply RF to get the variable importance.
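A minimal sketch of the two-step approach I have in mind, written with scikit-learn rather than H2O for brevity. Everything specific here is an assumption for illustration: the synthetic data from `make_regression` (which also assumes a regression target), the penalty `alpha=1.0`, keeping 20 features instead of 100, and the forest settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (smaller than 5000 x 1000 so it runs quickly).
X, y = make_regression(
    n_samples=500, n_features=200, n_informative=20, noise=10.0, random_state=0
)

# LASSO is scale-sensitive, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

# Step 1: LASSO as a filter. With threshold=-np.inf, SelectFromModel
# keeps exactly `max_features` features, ranked by |coefficient|.
selector = SelectFromModel(
    Lasso(alpha=1.0, max_iter=10_000),  # alpha is a placeholder, not a tuned value
    max_features=20,
    threshold=-np.inf,
)
selector.fit(X_scaled, y)
kept = np.flatnonzero(selector.get_support())

# Step 2: random forest on the reduced set, used only for variable importance.
# Fixing random_state makes repeated runs reproducible.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_scaled[:, kept], y)

# Original column indices, ordered from most to least important.
ranking = kept[np.argsort(rf.feature_importances_)[::-1]]
print(ranking[:10])
```

In practice I would tune the penalty (for example with `LassoCV`) rather than fix it, and compare the ranking across several seeds to check how stable it is.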
Does this approach make sense? If not, what would you suggest? Lastly, I don't want to apply PCA since I need interpretation of variable importance.