I am using scikit-learn to shrink a data set with around 800 features. The data is very noisy (market and economic data). To the best of my knowledge, Lasso should return the same features for the same data set; however, I don't observe this across my runs. Here is my function:
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

def select_lasso_feat(self, train_data, features, target):
    if len(features) <= 60:
        print('LASSO feature selection step skipped. Too few features on your dataset!')
        return features
    print('Performing LASSO feature selection...')
    X_train = self._standardize(train_data[features])
    y = train_data[target]
    alpha = 0.0003
    feat_len = 0
    while feat_len < 60:
        estimator = Lasso(alpha=alpha, random_state=23)
        feature_selection = SelectFromModel(estimator, threshold=0.1)
        feature_selection.fit(X_train, y)
        selected_features = list(pd.DataFrame(X_train).columns[feature_selection.get_support()])
        feat_len = len(selected_features)
        alpha -= 0.00003
    return list(set(selected_features))
As you can see, I keep fitting Lasso with a decreasing alpha until I reach the desired number of features (60 in this case). I run my trials in Jupyter, and whenever I shut down my server and rerun the code with exactly the same data, I end up with a different feature list returned by Lasso. What might be the reason?
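For what it's worth, I checked that the Lasso solver itself is deterministic on identical inputs (a minimal sketch with synthetic data, not my actual data set; `random_state` only matters when `selection='random'`, and the default cyclic coordinate descent has no randomness):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data with some signal in the first column.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 0.5 * X[:, 0] + 0.1 * rng.normal(size=100)

# Fit the same model twice on the same arrays within one process.
coefs = []
for _ in range(2):
    model = Lasso(alpha=0.1, random_state=23)
    model.fit(X, y)
    coefs.append(model.coef_.copy())

# Identical inputs yield bit-identical coefficients.
print(np.array_equal(coefs[0], coefs[1]))  # → True
```

So the nondeterminism I see apparently does not come from the optimizer itself.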
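One thing I may need to rule out: the `list(set(selected_features))` at the end does not have a stable order across interpreter sessions, because Python randomizes string hashing per process (`PYTHONHASHSEED`), so two runs can return the same features in a different order. A small sketch with hypothetical feature names, simulating two fresh kernels via subprocesses with different hash seeds:

```python
import os
import subprocess
import sys

# The same set literal, printed as a list -- mirrors list(set(...)) above.
code = "print(list({'feat_a', 'feat_b', 'feat_c', 'feat_d'}))"

# Run the snippet under two different hash seeds, simulating two
# separate interpreter sessions (a restarted Jupyter kernel gets a
# fresh random hash seed).
outputs = []
for seed in ("1", "2"):
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, env=env,
    )
    outputs.append(result.stdout.strip())

print(outputs[0])
print(outputs[1])
# The raw ordering typically differs between the two runs, but the
# underlying sets are equal -- comparing sorted lists removes the noise.
```

If that is the issue, `sorted(set(selected_features))` would make the comparison deterministic.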