I would like to understand whether we should remove variables with high p-values after a regression with regularization.
If a p-value is high, doesn't that mean the estimated beta may just be due to chance and could actually be zero?
If so, that variable should ideally be removed. Are there cases where I should be careful about this conclusion? When can p-values after regularization be misleading, and why?
I followed the answer here: https://stats.stackexchange.com/a/171462/172758
It says:
With even just 2 collinear predictors, their individual regression coefficients are likely to vary widely among bootstrap samples so that their individual p-values may appear insignificant. Nevertheless, their joint contributions to the regression might be much more stable and thus their combination very significant in practical terms.
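To see this concretely, I put together a small sketch (my own construction, using numpy and statsmodels; the sample size and noise levels are arbitrary) with two nearly identical predictors. Across bootstrap resamples the individual slope estimates swing widely, while their sum stays stable:

    # Bootstrap instability of individual slopes under near-collinearity
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)      # almost perfectly collinear with x1
    y = x1 + x2 + rng.normal(size=n)         # true slopes are both 1

    X = sm.add_constant(np.column_stack([x1, x2]))

    slopes = []
    for _ in range(1000):
        idx = rng.integers(0, n, size=n)     # resample rows with replacement
        slopes.append(sm.OLS(y[idx], X[idx]).fit().params[1:])
    slopes = np.array(slopes)

    print("sd of each slope across resamples:", slopes.std(axis=0))   # large
    print("sd of their sum:", slopes.sum(axis=1).std())               # much smaller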
If the combination of two collinear variables is jointly more significant than either one alone, wouldn't this be just as true of p-values from plain least-squares regression? Or is it only p-values after regularization that are misleading?
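For context, here is the same toy setup fit once with plain OLS (again my own sketch, not code from the linked answer). The individual t-test p-values typically come out large, yet a joint F-test that both slopes are zero is highly significant:

    # Individual t-tests vs. joint F-test under near-collinearity (plain OLS)
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)
    y = x1 + x2 + rng.normal(size=n)

    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print("individual p-values:", fit.pvalues[1:])   # each one usually well above 0.05

    # Joint F-test of H0: beta1 = beta2 = 0 (each row picks out one slope parameter)
    R = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    print(fit.f_test(R))                             # p-value essentially zero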