My LASSO test MSE is greater than my OLS test MSE. Why is that?

Question

My Description:

I'm learning about the LASSO and how to select the best lambda for it with the cv.glmnet() function. I had to split the College data set from the ISLR package in R into training and test sets and then predict the number of applications a college receives.

Part of my assignment was to compare the test MSE of the LASSO model at the best lambda against the test MSE of the ordinary least squares (OLS, or LSE) fit, i.e. lambda equal to zero. Sometimes my program gives OLS a lower test MSE than the best-lambda model, and I would like to know why that happens. Please try other values in set.seed() to see this result. My code is below in case it's helpful. Thank you very much for your knowledge and advice.

library(ISLR)

data("College")

attach(College)

dim(College)

sum(is.na(College)) ## No missing data

# a.    Split the data set randomly into training and test data set.

set.seed(5) 

train <- sample(1:nrow(College), nrow(College)/2)

test <- (-train)


college_train = College[train,]

college_test = College[test,]

###################################################

# b.    Fit Lasso model using glmnet() function on the training data set.

# install.packages("glmnet")  # run once if glmnet is not installed yet

library(glmnet)

y_train <- college_train$Apps

x_train <- subset(college_train, select = -Apps)

class(x_train)

x_train <- data.matrix(x_train)

class(x_train)



y_test <- college_test$Apps

x_test <- subset(college_test, select = -Apps)

x_test <- data.matrix(x_test)

grid <- 10^(seq(10, -2, length = 100))

lasso_mod1 <- glmnet(x_train, y_train, alpha = 1, lambda = grid, thresh = 1e-12)

###################################################

# c.    Perform cross-validation on the training data set to choose the best lambda

cv.lasso <- cv.glmnet(x_train, y_train, alpha =1)

best_lambda <- cv.lasso$lambda.min

###################################################

# d.    Estimate the predicted values using the best lambda obtained in part (c) on the test data using the predict() function and compute test MSE. (10 points)

lasso.pred <- predict(lasso_mod1, s= best_lambda, newx = x_test)

test_MSE <- mean((lasso.pred - y_test)^2)

test_MSE

###################################################

# e.    Compare the Lasso predicted test MSE with the null model (lambda=infinity) test MSE and least square regression model (lambda=0) test MSE.  

lasso.pred_null <- predict(lasso_mod1, s= 1e12, newx = x_test)

test_MSE_null <- mean((lasso.pred_null- y_test)^2)

test_MSE_null

isTRUE(test_MSE<test_MSE_null) ### The Best Lambda had a smaller test MSE than did the null model


lasso.pred_LSR <- predict(lasso_mod1, s= 0, newx = x_test)

test_MSE_LSR <- mean((lasso.pred_LSR- y_test)^2)

test_MSE_LSR

isTRUE(test_MSE < test_MSE_LSR)

It's really hard to follow your question; can you organise it better (for example, by separating code from text)? — gunes, Jul 26 '20 at 20:00
@gunes I just edited it. The top is the description of my problem; the bottom is the code. The bolded parts near the letters in the bottom part of my post explain what I had to do in each section. I'm new to StackExchange, so thank you for your patience. — user11953813, Jul 26 '20 at 20:11

score 1 · Answer 1 · answered Jul 26 '20 at 21:00

The bias-variance trade-off decomposes the expected test MSE into contributions from bias (squared), variance of the estimates, and irreducible error. Note how that is defined:

... the expected test MSE ... refers to the average test MSE that we would obtain if we repeatedly estimated [the unknown function] $f$ using a large number of training sets, and tested each at [a particular predictor value] $x_0$. The overall expected test MSE can be computed by averaging ... over all possible values of $x_0$ in the test set. (Emphasis added.)
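
For reference, the decomposition described above is commonly written as follows. The notation is an assumption here, since the answer does not spell it out: $\hat f$ is the fitted model, $y_0$ the response at $x_0$, and $\epsilon$ the irreducible noise.

$$E\left[\left(y_0 - \hat f(x_0)\right)^2\right] = \mathrm{Var}\left(\hat f(x_0)\right) + \left[\mathrm{Bias}\left(\hat f(x_0)\right)\right]^2 + \mathrm{Var}(\epsilon)$$

The expectation is taken over training sets, so a single train/test split gives only one noisy draw from the quantity on the left.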

So it shouldn't be too surprising that in any single test/train split your results don't agree with your preconceptions about how LASSO and OLS should perform. See what happens when you do this on many different initial train/test splits and average over the splits, in the spirit of the definition of the "expected test MSE."
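
A minimal sketch of that repeated-split experiment, assuming the College data and the glmnet package from the question. The helper `one_split()`, the number of splits, and the use of `model.matrix()` and `lm()` are illustrative choices, not taken from the original code.

```r
library(ISLR)    # College data
library(glmnet)  # cv.glmnet()

# Design matrix; model.matrix() turns the factor Private into a 0/1 dummy.
# Drop the intercept column because glmnet adds its own.
x <- model.matrix(Apps ~ ., data = College)[, -1]
y <- College$Apps

# Test MSE of LASSO (lambda chosen by CV on the training half) and of OLS
# for one random 50/50 split. Hypothetical helper, for illustration only.
one_split <- function(x, y) {
  train <- sample(nrow(x), floor(nrow(x) / 2))
  cv_fit <- cv.glmnet(x[train, ], y[train], alpha = 1)
  lasso_pred <- predict(cv_fit, newx = x[-train, ], s = "lambda.min")
  ols_fit <- lm(y ~ ., data = data.frame(y = y[train], x[train, ]))
  ols_pred <- predict(ols_fit, newdata = data.frame(x[-train, ]))
  c(lasso = mean((lasso_pred - y[-train])^2),
    ols   = mean((ols_pred - y[-train])^2))
}

set.seed(1)      # arbitrary seed, only for reproducibility
n_splits <- 100  # assumed number of splits; more gives a steadier average
mse <- t(replicate(n_splits, one_split(x, y)))

colMeans(mse)                        # average test MSE per method
mean(mse[, "lasso"] < mse[, "ols"])  # share of splits where LASSO beats OLS
```

The last line shows how often LASSO wins across splits, which is more informative than the outcome under any single seed.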

These vagaries of splitting small data sets into separate training and test sets are expected. Thus many recommend against such single splits unless you have thousands of data points; with small data sets, use resampling by cross-validation or bootstrapping instead.
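
One way to follow that advice is K-fold cross-validation over the whole data set, with lambda re-selected inside each training portion. The sketch below assumes the same College data; the fold count `k` and the variable names are hypothetical choices.

```r
library(ISLR)    # College data
library(glmnet)  # cv.glmnet()

x <- model.matrix(Apps ~ ., data = College)[, -1]
y <- College$Apps

set.seed(1)  # arbitrary seed, only for reproducibility
k <- 10      # assumed number of outer folds
fold <- sample(rep(seq_len(k), length.out = nrow(x)))

# For each fold: fit on the other k - 1 folds, evaluate on the held-out fold.
cv_mse <- sapply(seq_len(k), function(i) {
  held_out <- fold == i
  # Inner CV picks lambda using the training portion only.
  cv_fit <- cv.glmnet(x[!held_out, ], y[!held_out], alpha = 1)
  lasso_pred <- predict(cv_fit, newx = x[held_out, ], s = "lambda.min")
  ols_fit <- lm(y ~ ., data = data.frame(y = y[!held_out], x[!held_out, ]))
  ols_pred <- predict(ols_fit, newdata = data.frame(x[held_out, ]))
  c(lasso = mean((lasso_pred - y[held_out])^2),
    ols   = mean((ols_pred - y[held_out])^2))
})

rowMeans(cv_mse)  # cross-validated MSE per method
```

Every observation is used for testing exactly once, so the estimate depends less on the luck of one particular split.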

Also, consider how well the unpenalized model works on the whole data set; if it's good enough, LASSO might not provide any improvement in the first place.
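
A quick way to check this, assuming the College data from the question, is to fit OLS on all rows and look at the fit quality and at how many observations there are per coefficient. Adjusted R-squared is used here as one possible measure of "good enough", which is an assumption on my part.

```r
library(ISLR)  # College data

# Unpenalized fit on the whole data set.
ols_full <- lm(Apps ~ ., data = College)

adj_r2 <- summary(ols_full)$adj.r.squared
obs_per_coef <- nrow(College) / length(coef(ols_full))

adj_r2        # close to 1 means OLS already explains most of the variance
obs_per_coef  # many observations per coefficient leaves little for a penalty to gain
```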
