I have training data in which each feature is a probability from a different source. All of the features are probabilities (between 0 and 1, obviously). This is a binary classification problem.
Note that I can use just one of the features on its own, simply by selecting a cutoff; a minimal illustration of this is sketched below. When I use just one of the features, it sometimes performs better than when I use all of the features (all of the probabilities). So, I decided to add a feature. I compute the performance (AUC) of each feature alone. Then, I create a new feature that is the weighted sum of each row's probabilities, with each feature weighted by its standalone performance.
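Here is a minimal sketch of what I mean by using a single feature with a cutoff (the names `X` and `y` and the synthetic data are just placeholders for my real data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder data: X holds the three probability features, y the binary labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
X = np.clip(y[:, None] * 0.2 + rng.uniform(0.0, 0.8, size=(1000, 3)), 0.0, 1.0)

# Using col A alone as a classifier: pick a cutoff and threshold it.
cutoff = 0.5
y_pred = (X[:, 0] > cutoff).astype(int)

# AUC of the raw probability column (a threshold-free ranking metric).
print(roc_auc_score(y, X[:, 0]))
```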
Example of original data:
col A   col B   col C
0.3     0.2     0.13
0.4     0.1     0.5
...with col A alone having an AUC score of 0.6, col B 0.5, and col C 0.55. Note that these sum to 0.6 + 0.5 + 0.55 = 1.65.
So, I add this feature:
col A   col B   col C   new_feature
0.3     0.2     0.13    (0.3*0.6 + 0.2*0.5 + 0.13*0.55) / 1.65
0.4     0.1     0.5     (0.4*0.6 + 0.1*0.5 + 0.5*0.55) / 1.65
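In code, the construction looks roughly like this (reusing the placeholder `X` and `y` from the sketch above; in my real data the weights come from each feature's measured AUC):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Standalone AUC of each probability column, used as its weight.
aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])

# Weighted average of each row's probabilities, normalized by the weight sum
# (the 1.65 in the worked example above).
new_feature = X @ aucs / aucs.sum()

# Append the engineered column to the original features.
X_aug = np.column_stack([X, new_feature])
```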
I add this feature, and the performance degrades instead of improving. I tried Logistic Regression, Random Forest, and other similar classifiers, comparing the two feature sets as sketched below. I cannot think of a good reason why. How could adding this feature decrease the performance of a model trained on all of the features (including the one that I created)?
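For reference, this is roughly how I compare the two feature sets (a minimal sketch continuing from the code above; the estimator, fold count, and scoring are illustrative, not my exact pipeline):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Cross-validated AUC with and without the engineered column.
# Swapping in RandomForestClassifier shows the same pattern for me.
clf = LogisticRegression()
for name, data in [("original features", X), ("with new_feature", X_aug)]:
    scores = cross_val_score(clf, data, y, cv=5, scoring="roc_auc")
    print(name, scores.mean())
```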