
I understand that LASSO and AIC both strike a balance between model fit and model size. However, how do they respectively measure the size/complexity of the model?

Does AIC measure the number of parameters, while LASSO measures the coefficients?

Richard Hardy
rabito
  • See @FrankHarrell's answer here https://stats.stackexchange.com/questions/48200/lasso-vs-aic-for-feature-selection-with-the-cox-model?rq=1 – user23658 May 06 '21 at 19:41

2 Answers


Comparing AIC and LASSO is comparing apples and oranges.

LASSO is an estimator of the regression coefficients. It is the minimizer of a penalized/constrained least squares criterion. Which LASSO estimator to use (defined by the tuning parameter scaling the penalty or constraint set) is most commonly decided by cross-validation.

AIC, on the other hand, is an estimator of prediction error for a given estimator. Given a set of candidate estimators, one could compare these by comparing their respective AIC values. As you suggest, AIC does consider model complexity as it is partly a function of an estimator's degrees of freedom.

In principle, you could compare the set of LASSO estimators using a version of AIC rather than cross-validation (but this is potentially problematic, and is not advised in general). See "Is it possible to calculate AIC and BIC for lasso regression models?".
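For concreteness, here is a minimal sketch of both approaches in Python with scikit-learn on simulated data (the data and settings are purely illustrative, not something from the question): the tuning parameter is chosen once by cross-validation with `LassoCV` and once by an AIC-type criterion with `LassoLarsIC`.

```python
# Minimal sketch: two ways of choosing the LASSO tuning parameter.
# Simulated data, purely for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LassoLarsIC

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Tuning parameter chosen by 5-fold cross-validation
cv_lasso = LassoCV(cv=5).fit(X, y)

# Tuning parameter chosen by minimizing an AIC-type criterion
aic_lasso = LassoLarsIC(criterion="aic").fit(X, y)

print("lambda (CV): ", cv_lasso.alpha_)
print("lambda (AIC):", aic_lasso.alpha_)
print("nonzero coefficients (CV): ", np.sum(cv_lasso.coef_ != 0))
print("nonzero coefficients (AIC):", np.sum(aic_lasso.coef_ != 0))
```

The two criteria need not select the same tuning parameter, which is exactly where the caveat above (and the linked question) comes in.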

user23658

So let's make the assumption of normality (so that the MLE coincides with OLS) and take a look at the AIC formula from the wiki:

$$\mathrm{AIC} = 2k + n \ln(\mathrm{RSS}),$$

where $k$ is the number of parameters (variables), $n$ is the sample size, and $\mathrm{RSS}$ is the residual sum of squares. For a given $k$ and $n$, the AIC is minimized simply by fitting the standard OLS coefficients. In other words, it has nothing to do with our fitting procedure/coefficients; it is simply a cost function for comparing across the thing we can change: $k$. So the AIC is used to compare the same model with differing features, essentially to find the model with the most bang for your buck. If $k$ increases but the RSS does not decrease enough, the AIC increases and you have a 'worse' model. It really is about the balance of $k$ and $\mathrm{RSS}$.
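As a rough illustration (a sketch in Python/NumPy on made-up data; the helper `aic` below is hypothetical, and $k$ here counts only the regression coefficients), one can compute $\mathrm{AIC} = 2k + n\ln(\mathrm{RSS})$ for OLS fits on nested feature subsets and pick the subset with the smallest value:

```python
# Sketch: compare OLS fits on different feature subsets via AIC = 2k + n*ln(RSS).
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
# Only the first two features actually matter in this toy example.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

def aic(X_sub, y):
    """AIC = 2k + n*ln(RSS) for the OLS fit on the columns of X_sub."""
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = np.sum((y - X_sub @ beta) ** 2)
    k = X_sub.shape[1]  # number of fitted coefficients
    return 2 * k + len(y) * np.log(rss)

# Nested subsets: first feature only, first two, ..., all five.
for k in range(1, p + 1):
    print(f"k={k}: AIC = {aic(X[:, :k], y):.1f}")
# Adding features beyond the two that matter barely reduces the RSS,
# so the 2k penalty typically makes the AIC increase again.
```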

LASSO, on the other hand, is a cost function that we optimize directly, so it does have to do with our coefficients. Specifically, we choose the coefficients that minimize the sum of squared errors plus the sum of the absolute values of the coefficients times a tuning parameter $\lambda$ that we select:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|.$$

Basically, (assuming normality) the AIC will be minimized for a given $k$ at the OLS solution, so the only real change we can make is varying $k$ over different subsets of features. For LASSO, we make changes by changing the coefficients themselves, and we can naturally set a coefficient to exactly 0 to drop it.
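To see the "coefficient of exactly 0" behavior concretely, here is a small sketch in Python with scikit-learn on simulated data (note that scikit-learn's `Lasso` scales the squared-error term by $1/(2n)$, so its `alpha` is a rescaled version of the $\lambda$ above): as the penalty grows, more coefficients are set exactly to zero and those features drop out of the model.

```python
# Sketch: larger LASSO penalties set more coefficients exactly to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=150, n_features=10, n_informative=3,
                       noise=5.0, random_state=1)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    fit = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    n_zero = np.sum(fit.coef_ == 0)
    print(f"alpha={alpha:>5}: {n_zero} of {fit.coef_.size} coefficients are exactly 0")
```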

Tylerr