
Screenshot of the R console: this is how I am getting the coefficients of each feature for each class (uplands and wood), which makes it harder to decide which features to select.

I am trying to use LASSO regression to select important features. I have 27 numeric features and one categorical class variable with 3 classes. I used the following code:

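library(glmnet)

# Predictors: every column except the first; response: the class label in column 1
x <- as.matrix(data[, -1])
y <- as.factor(data[, 1])

# Fit the multinomial model (glmnet's default alpha = 1 is the LASSO penalty)
fplasso <- glmnet(x, y, family = "multinomial")

# Perform cross-validation, scored by misclassification error
cvfp <- cv.glmnet(x, y, family = "multinomial", type.measure = "class")

# Select features (those with coefficients not shrunk to zero)
coef(cvfp, s = "lambda.min")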

This gives me a separate set of feature coefficients for each of the 3 classes. Since I am using LASSO for the first time, I am wondering whether this is a correct way to do feature selection. Also, should the coefficients of all features be reported per class, or should there be a single overall coefficient for each feature? In other words, will the coefficient of a given feature differ from class to class?
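
To make the per-class output concrete, here is a minimal sketch on simulated data (the data, the feature names `f1` to `f27`, and the object names are assumptions for illustration, not the asker's data). It stacks the per-class coefficient vectors returned by `coef()` into one matrix and lists the features that are non-zero in at least one class. It also fits the same model with glmnet's `type.multinomial = "grouped"` option, which applies a grouped LASSO penalty so that a feature's coefficients enter or leave the model for all classes together. Whether either summary is a sensible basis for "selection" is a separate question.

```r
library(glmnet)

# Simulated stand-in for the real data (an assumption): 27 numeric features, 3 classes
set.seed(1)
n <- 150
x <- matrix(rnorm(n * 27), n, 27, dimnames = list(NULL, paste0("f", 1:27)))
scores <- cbind(2 * x[, 1], 2 * x[, 2], 0) + matrix(rnorm(n * 3), n, 3)
y <- factor(c("uplands", "wood", "streambanks")[max.col(scores)])

cvfit <- cv.glmnet(x, y, family = "multinomial", type.measure = "class")

# coef() returns a list with one sparse single-column matrix per class
cf <- coef(cvfit, s = "lambda.min")
coef_mat <- do.call(cbind, lapply(cf, as.matrix))
colnames(coef_mat) <- names(cf)

# Drop the intercept (first row) and label the rows with the feature names
coef_mat <- coef_mat[-1, , drop = FALSE]
rownames(coef_mat) <- colnames(x)

# Features with a non-zero coefficient in at least one class
selected_any <- rownames(coef_mat)[rowSums(coef_mat != 0) > 0]
print(selected_any)

# Grouped penalty: a feature is kept or dropped for all classes together
cvgrp <- cv.glmnet(x, y, family = "multinomial", type.measure = "class",
                   type.multinomial = "grouped")
cf_grp <- coef(cvgrp, s = "lambda.min")
```

With the default (ungrouped) penalty, each class has its own coefficient vector, so a feature can be zero for one class and non-zero for another; that is expected behaviour rather than a coding error.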

kjetil b halvorsen
kritika
  • Don't trust any variable selection method unless you can show that it is stable. Draw 100 bootstrap samples of size N (with replacement) from your data of size N and repeat the process on each. Look at the consistency of the set of selected variables. You'll almost always be disappointed. The data just don't contain enough information to make correct choices of which variables are important. – Frank Harrell Jul 22 '21 at 11:20
  • I have edited your code to try to make it clearer. You might need to provide a reproducible example for us to see what is going on. – Ben Jul 22 '21 at 13:12
  • Thanks @Ben, I just added a screenshot of the console displaying the results. I am getting different coefficients for each feature in each of my classes (namely uplands, wood, and streambanks). This makes it harder to decide which features to select, because the features shrunk to '0' are not the same in each class. – kritika Jul 22 '21 at 13:54
  • @kritika I think you're starting to "be disappointed" in feature selection, as Frank Harrell speculated would happen. – Dave Jul 22 '21 at 13:59
  • Yes. For example, feature 'S' was shrunk to zero in class 'uplands' but not in class 'wood' (as in the attached picture). So should I consider it an important feature or not? – kritika Jul 22 '21 at 14:40
  • Please say more about the purpose of your modeling. Feature importance can be a [slippery concept](https://stats.stackexchange.com/q/202277/28500). LASSO can be useful for predictive models, but it's not going to tell you which are the "most important features" in any fundamental sense. Also, your choice of "class" as the CV measure probably isn't the best. Edit your question to say more about the underlying scientific problem you are addressing, and you might get suggestions for better ways to proceed. Comments can be deleted, so editing the question is best for providing more information. – EdM Jul 22 '21 at 20:35
  • If you have overly correlated features, Lasso won't be a very stable variable selector. You should assess whether the features are moderately uncorrelated. –  Jul 22 '21 at 23:11
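
The stability check suggested in the comments can be sketched as follows. This is a minimal illustration on simulated data, reading "100 samples of size N from your data of size N" as bootstrap resampling with replacement. The simulated data, the feature names, and the rule "non-zero in at least one class counts as selected" are assumptions made for the example, not part of the original question.

```r
library(glmnet)

# Simulated stand-in for the real data (an assumption): 27 numeric features, 3 classes
set.seed(2)
n <- 150
x <- matrix(rnorm(n * 27), n, 27, dimnames = list(NULL, paste0("f", 1:27)))
scores <- cbind(2 * x[, 1], 2 * x[, 2], 0) + matrix(rnorm(n * 3), n, 3)
y <- factor(c("uplands", "wood", "streambanks")[max.col(scores)])

B <- 100  # number of bootstrap resamples, as suggested in the comment
sel <- matrix(0, B, ncol(x), dimnames = list(NULL, colnames(x)))

for (b in seq_len(B)) {
  # Resample N rows with replacement and repeat the whole selection procedure
  idx <- sample(n, n, replace = TRUE)
  fit <- cv.glmnet(x[idx, ], y[idx], family = "multinomial", type.measure = "class")
  cf <- coef(fit, s = "lambda.min")

  # A feature counts as selected if it is non-zero for at least one class
  # (the first row of each coefficient vector is the intercept, so it is dropped)
  nz <- Reduce(`|`, lapply(cf, function(m) as.matrix(m)[-1, 1] != 0))
  sel[b, ] <- nz
}

# Proportion of resamples in which each feature was selected
freq <- sort(colMeans(sel), decreasing = TRUE)
print(round(freq, 2))
```

Features whose selection frequency sits far from both 0 and 1 are the ones the procedure cannot decide about. With strongly correlated features, expect the selected set to change noticeably from one resample to the next.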

0 Answers