I am using scikit-learn to shrink a data set with around 800 features. The data is very noisy (market and economic data). To the best of my knowledge, Lasso should return the same features for the same data set; however, I don't observe this across my runs. Here is my function:
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

def select_lasso_feat(self, train_data, features, target):
    if len(features) <= 60:
        print('LASSO feature selection step skipped. Too few features on your dataset!')
        return features
    print('Performing LASSO feature selection...')
    X_train = self._standardize(train_data[features])
    y = train_data[target]
    alpha = 0.0003
    feat_len = 0
    while feat_len < 60:
        estimator = Lasso(alpha=alpha, random_state=23)
        feature_selection = SelectFromModel(estimator, threshold=0.1)
        feature_selection.fit(X_train, y)
        selected_features = list(pd.DataFrame(X_train).columns[feature_selection.get_support()])
        feat_len = len(selected_features)
        alpha -= 0.00003
    return list(set(selected_features))
As you can see, I keep fitting Lasso with a decreasing alpha until I reach the desired number of features (60 in this case). I run my trials in Jupyter, and whenever I shut down my server and rerun the code with exactly the same data, I end up with a different feature list returned by Lasso. What might be the reason?
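For what it's worth, I checked that the Lasso solver itself is deterministic on identical inputs (a minimal sketch with synthetic data, not my actual data set; `random_state` only matters when `selection='random'`, and the default cyclic coordinate descent has no randomness):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data with some signal in the first column.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 0.5 * X[:, 0] + 0.1 * rng.normal(size=100)

# Fit the same model twice on the same arrays within one process.
coefs = []
for _ in range(2):
    model = Lasso(alpha=0.1, random_state=23)
    model.fit(X, y)
    coefs.append(model.coef_.copy())

# Identical inputs yield bit-identical coefficients.
print(np.array_equal(coefs[0], coefs[1]))  # → True
```

So the nondeterminism I see apparently does not come from the optimizer itself.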
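One thing I may need to rule out: the `list(set(selected_features))` at the end does not have a stable order across interpreter sessions, because Python randomizes string hashing per process (`PYTHONHASHSEED`), so two runs can return the same features in a different order. A small sketch with hypothetical feature names, simulating two fresh kernels via subprocesses with different hash seeds:

```python
import os
import subprocess
import sys

# The same set literal, printed as a list -- mirrors list(set(...)) above.
code = "print(list({'feat_a', 'feat_b', 'feat_c', 'feat_d'}))"

# Run the snippet under two different hash seeds, simulating two
# separate interpreter sessions (a restarted Jupyter kernel gets a
# fresh random hash seed).
outputs = []
for seed in ("1", "2"):
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, env=env,
    )
    outputs.append(result.stdout.strip())

print(outputs[0])
print(outputs[1])
# The raw ordering typically differs between the two runs, but the
# underlying sets are equal -- comparing sorted lists removes the noise.
```

If that is the issue, `sorted(set(selected_features))` would make the comparison deterministic.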