
I understand that LASSO and AIC both strike a balance between model fit and model size. However, how do they respectively measure the size/complexity of the model?

Does AIC measure the number of parameters, while LASSO measures the coefficients?

Richard Hardy
rabito
  • See @FrankHarrell's answer here https://stats.stackexchange.com/questions/48200/lasso-vs-aic-for-feature-selection-with-the-cox-model?rq=1 – user23658 May 06 '21 at 19:41

2 Answers


Comparing AIC and LASSO is comparing apples and oranges.

LASSO is an estimator of the regression coefficients. It is the minimizer of a penalized/constrained least squares criterion. Which LASSO estimator to use (defined by the tuning parameter scaling the penalty or constraint set) is most commonly decided by cross-validation.

AIC, on the other hand, is an estimator of prediction error for a given estimator. Given a set of candidate estimators, one could compare these by comparing their respective AIC values. As you suggest, AIC does consider model complexity as it is partly a function of an estimator's degrees of freedom.

In principle, you could compare the set of LASSO estimators using a version of AIC rather than cross-validation (but this is potentially problematic, and is not advised in general). See "Is it possible to calculate AIC and BIC for lasso regression models?".
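For concreteness, here is a minimal sketch of both approaches in Python with scikit-learn on simulated data (the data and settings are purely illustrative, not something from the question): the tuning parameter is chosen once by cross-validation with `LassoCV` and once by an AIC-type criterion with `LassoLarsIC`.

```python
# Minimal sketch: two ways of choosing the LASSO tuning parameter.
# Simulated data, purely for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LassoLarsIC

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Tuning parameter chosen by 5-fold cross-validation
cv_lasso = LassoCV(cv=5).fit(X, y)

# Tuning parameter chosen by minimizing an AIC-type criterion
aic_lasso = LassoLarsIC(criterion="aic").fit(X, y)

print("lambda (CV): ", cv_lasso.alpha_)
print("lambda (AIC):", aic_lasso.alpha_)
print("nonzero coefficients (CV): ", np.sum(cv_lasso.coef_ != 0))
print("nonzero coefficients (AIC):", np.sum(aic_lasso.coef_ != 0))
```

The two criteria need not select the same tuning parameter, which is exactly where the caveat above (and the linked question) comes in.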

user23658

So let's make the assumption of normality (so that the MLE coincides with OLS) and take a look at the AIC formula from the wiki:

$$\mathrm{AIC} = 2k + n \ln(\mathrm{RSS}),$$

where $k$ is the number of parameters (variables), $n$ is the sample size, and $\mathrm{RSS}$ is the residual sum of squares. For a given $k$ and $n$, the AIC is minimized simply by fitting the standard OLS coefficients. In other words, it has nothing to do with our fitting procedure/coefficients; it is simply a cost function for comparing across the thing we can change: $k$. So the AIC is used to compare the same model with differing features, essentially to find the model with the most bang for your buck. If $k$ increases but the RSS does not decrease enough, the AIC increases and you have a 'worse' model. It really is about the balance of $k$ and $\mathrm{RSS}$.
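As a rough illustration (a sketch in Python/NumPy on made-up data; the helper `aic` below is hypothetical, and $k$ here counts only the regression coefficients), one can compute $\mathrm{AIC} = 2k + n\ln(\mathrm{RSS})$ for OLS fits on nested feature subsets and pick the subset with the smallest value:

```python
# Sketch: compare OLS fits on different feature subsets via AIC = 2k + n*ln(RSS).
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
# Only the first two features actually matter in this toy example.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

def aic(X_sub, y):
    """AIC = 2k + n*ln(RSS) for the OLS fit on the columns of X_sub."""
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = np.sum((y - X_sub @ beta) ** 2)
    k = X_sub.shape[1]  # number of fitted coefficients
    return 2 * k + len(y) * np.log(rss)

# Nested subsets: first feature only, first two, ..., all five.
for k in range(1, p + 1):
    print(f"k={k}: AIC = {aic(X[:, :k], y):.1f}")
# Adding features beyond the two that matter barely reduces the RSS,
# so the 2k penalty typically makes the AIC increase again.
```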

LASSO, on the other hand, is a cost function that we optimize directly, so it does have to do with our coefficients. Specifically, we choose the coefficients that minimize the sum of squared errors plus the sum of the absolute values of the coefficients times a tuning parameter $\lambda$ that we select:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|.$$

Basically, (assuming normality) the AIC will be minimized for a given $k$ at the OLS solution, so the only real change we can make is varying $k$ over different subsets of features. For LASSO, we make changes by changing the coefficients themselves, and we can naturally set a coefficient to exactly 0 to drop it.
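To see the "coefficient of exactly 0" behavior concretely, here is a small sketch in Python with scikit-learn on simulated data (note that scikit-learn's `Lasso` scales the squared-error term by $1/(2n)$, so its `alpha` is a rescaled version of the $\lambda$ above): as the penalty grows, more coefficients are set exactly to zero and those features drop out of the model.

```python
# Sketch: larger LASSO penalties set more coefficients exactly to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=150, n_features=10, n_informative=3,
                       noise=5.0, random_state=1)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    fit = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    n_zero = np.sum(fit.coef_ == 0)
    print(f"alpha={alpha:>5}: {n_zero} of {fit.coef_.size} coefficients are exactly 0")
```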

Tylerr