I want to use linear regression with a very large design matrix to discover the governing equations of, e.g., physical systems. The design matrix contains candidate terms that could be part of the equation. This procedure is called SINDy, and it usually selects a parsimonious set of active terms through penalized regression (such as LASSO) or sequential thresholding. In both cases, LASSO and thresholding, there is a hyperparameter that controls how sparse the solution is, and changing it leads to different candidate equations. To pick the correct one among these candidates, should I rely on the data by using cross-validation, or should I use an information criterion? What are the benefits of each?
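To make the setup concrete, here is a minimal sketch of the two selection routes on a toy problem. It is not my actual pipeline: the data, the candidate library, the threshold grid, and the helper names (`stlsq`, `bic`, `cv_mse`) are all made up for illustration, and the BIC formula assumes i.i.d. Gaussian errors.

```python
import numpy as np

def stlsq(Theta, y, threshold, n_iter=10):
    """Sequentially thresholded least squares (illustrative helper)."""
    xi = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # drop small coefficients
        active = ~small
        if not active.any():
            break
        # refit by ordinary least squares on the surviving columns only
        xi[active] = np.linalg.lstsq(Theta[:, active], y, rcond=None)[0]
    return xi

def bic(Theta, y, xi):
    """BIC up to a constant, assuming i.i.d. Gaussian errors."""
    n = len(y)
    rss = np.sum((y - Theta @ xi) ** 2)
    k = np.count_nonzero(xi)
    return float(n * np.log(rss / n) + k * np.log(n))

def cv_mse(Theta, y, threshold, n_folds=5, seed=0):
    """Mean held-out squared error over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        xi = stlsq(Theta[train], y[train], threshold)
        errs.append(np.mean((y[test] - Theta[test] @ xi) ** 2))
    return float(np.mean(errs))

# Toy data (assumption): true model y = 1.5*x1 - 0.8*x1^3 + small noise
rng = np.random.default_rng(1)
x1 = rng.standard_normal(400)
y = 1.5 * x1 - 0.8 * x1**3 + 0.05 * rng.standard_normal(400)

# Candidate library: 1, x1, x1^2, x1^3, cos(x1)
Theta = np.column_stack([np.ones_like(x1), x1, x1**2, x1**3, np.cos(x1)])

# Sweep the sparsity hyperparameter and score each candidate both ways
for thr in [0.01, 0.05, 0.1, 0.5, 1.0]:
    xi = stlsq(Theta, y, thr)
    print(f"thr={thr:<5} support={np.flatnonzero(xi).tolist()} "
          f"BIC={bic(Theta, y, xi):.1f} CV-MSE={cv_mse(Theta, y, thr):.4f}")
```

Each threshold gives one candidate equation; the question is whether to pick the row with the lowest BIC or the lowest cross-validated error when the two disagree.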
A second question: do I actually have a maximum likelihood estimate (MLE) if I refit with least squares (LS) after the terms have been chosen by LASSO? The reason I ask is that the design matrix will include many transformations of the same measured variables, e.g. x1, x1^2, x1^3, ..., cos(x1), ..., everything that could possibly play a role. If I assume x1 is normally distributed and independent, the transformed columns are no longer normally distributed, right?
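As a quick check of that last point, the following sketch simulates a standard normal x1 and compares the sample skewness and excess kurtosis of the transformed columns (both are 0 for a normal distribution). The sample size, seed, and helper names are arbitrary choices for illustration, and this only looks at the marginal distribution of each column, not at the regression errors.

```python
import numpy as np

def skewness(v):
    """Sample skewness (0 for a normal distribution)."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(v):
    """Sample excess kurtosis (0 for a normal distribution)."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(100_000)   # assumption: x1 ~ N(0, 1)

# Columns as they would appear in the design matrix
columns = {
    "x1": x1,
    "x1^2": x1 ** 2,
    "x1^3": x1 ** 3,
    "cos(x1)": np.cos(x1),
}

for name, col in columns.items():
    print(f"{name:>8}: skewness={skewness(col):7.3f}  "
          f"excess kurtosis={excess_kurtosis(col):8.3f}")
```

For example, x1^2 follows a chi-squared distribution with one degree of freedom when x1 is standard normal, so it is strongly right-skewed.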