
Here is the data set I'm working with:

[data set link]

I'm trying to find the best possible multiple regression with R as the dependent variable and the rest as independent variables.

Here's what I did in R:

> trainX <- as.matrix(spxdata[4:11])
> trainY <- spxdata[[3]]
> CV = cv.glmnet(x = trainX, y = trainY, alpha = 1, nlambda = 100)
Warning message:
Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
> plot(CV)
> fit = glmnet(x = trainX, y = trainY, alpha = 1, lambda = CV$lambda.1se)
> fit$beta[,1]
    RE VOL260 VOL360     PE     PX   FCFY   GADY    NDE 
     0      0      0      0      0      0      0      0 

And here's the CV plot:

[CV plot from plot(CV)]

Why is there a warning message and why are all the fitted coefficients zero?

1 Answer

  1. The warning appears because you have fewer than 30 observations. cv.glmnet defaults to 10 folds, which leaves fewer than 3 observations per fold, so grouped = FALSE is enforced. The warning is not the cause of your problem.
  2. The simplest explanation for why all fitted coefficients are zero is that the data do not support a more complex model: the cross-validation error is minimized (within one standard error) at a shrinkage strong enough to zero out every coefficient.
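
As a side note, with so few observations you can control the fold count yourself rather than relying on the default of 10. A minimal sketch, reusing trainX and trainY from the question (the seed is an illustrative choice, since fold assignment is random):

```r
library(glmnet)

set.seed(1)  # fold assignment is random; fix it for reproducibility

# nfolds = nrow(trainX) gives leave-one-out CV, which avoids having
# fewer than 3 observations per fold; grouped = FALSE is still required
# in that case, so we pass it explicitly to suppress the warning.
CV <- cv.glmnet(x = trainX, y = trainY, alpha = 1,
                nfolds = nrow(trainX), grouped = FALSE)

CV$lambda.min  # lambda minimizing cross-validation error
CV$lambda.1se  # largest lambda within 1 SE of that minimum
```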

If you believe that some coefficients shouldn't be zero in the fitted model, you might consider:

  • A ridge regression, which shrinks coefficients toward zero but does not set them exactly to zero (it may still shrink them substantially, however)

  • A Bayesian approach, where you set informative priors for coefficients you believe to be non-zero
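
The ridge alternative is a one-argument change in glmnet. A sketch, again reusing trainX and trainY from the question:

```r
library(glmnet)

set.seed(1)

# alpha = 0 selects ridge regression (alpha = 1, the default, is the lasso);
# grouped = FALSE matches the small-sample situation from the question.
CV_ridge <- cv.glmnet(x = trainX, y = trainY, alpha = 0, grouped = FALSE)

fit_ridge <- glmnet(x = trainX, y = trainY, alpha = 0,
                    lambda = CV_ridge$lambda.1se)

fit_ridge$beta[, 1]  # shrunken, but typically non-zero, coefficients
```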

khol
  • I tried to use lambda.min instead of lambda.1se, and now there are a couple of coefficients that are non-zero. Is there a downside to using lambda.min instead? If I want to use ridge regression, do I just set alpha to 0, or are there more adjustments that need to be made? – Lumberjack88 Apr 30 '18 at 23:46
  • @Lumberjack88 https://stats.stackexchange.com/a/70268/187294 is a good answer to read up on. In short, lambda.min corresponds to the model minimizing cross-validation error, while lambda.1se always corresponds to a simpler model (i.e. more coefficient shrinkage) with reasonably similar error to the lambda.min model. This means that the lambda.min model will fit the training set more closely but has a larger chance of being overfit compared to the lambda.1se model. – khol Apr 30 '18 at 23:53
  • alpha = 0 gives a ridge regression, intermediate values 0 < alpha < 1 give an elastic net, and glmnet's default, alpha = 1, is the lasso – khol Apr 30 '18 at 23:56