Suppose one has about 500 points of 50-dimensional data that one knows a priori are generated by a parametric model (perhaps with some outliers). Does using this knowledge help with feature selection? I am interested in both the continuous-response and discrete-response cases, if they differ.
A related, more general question is whether linear models and criteria such as AIC are generally useful for feature selection, or whether their usefulness depends on strong model assumptions. I have heard that it can be wise to use "non-parametric" methods of feature selection. Are there any guidelines here?
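To make the linear-model approach concrete, here is a minimal sketch of the kind of AIC-based selection I have in mind: greedy forward selection with ordinary least squares. Everything in it is an assumption for illustration only: the synthetic data (500 points, 50 features, of which features 0, 3 and 7 actually matter), the Gaussian-error form of AIC, and the hypothetical helper names `gaussian_aic` and `forward_select_aic`. It is not meant as the recommended procedure.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC, up to an additive constant, of an OLS fit with an intercept.

    Assumes Gaussian errors, so AIC = n * log(RSS / n) + 2 * k.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the noise variance
    return n * np.log(rss / n) + 2 * k


def forward_select_aic(X, y):
    """Greedily add the feature that lowers AIC most; stop when none does."""
    n, p = X.shape
    selected = []
    best_aic = gaussian_aic(np.empty((n, 0)), y)  # intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [(gaussian_aic(X[:, selected + [j]], y), j) for j in candidates]
        aic, j = min(scores)
        if aic >= best_aic:
            break
        selected.append(j)
        best_aic = aic
    return selected, best_aic


# Synthetic data for illustration only (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 1.5 * X[:, 7] + rng.normal(size=500)

selected, best_aic = forward_select_aic(X, y)
print("selected features:", selected)
```

In this sketch the linear model is correctly specified, so the three informative features are picked up first, although AIC may also admit a few noise features. My question is what happens to a procedure like this when the true model is parametric but not linear, or when outliers are present.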