With help from the discussions here, I have successfully trained various models for classification.
As an example, say I trained a stochastic gradient boosted model (gbm) and an extreme gradient boosted tree model (xgboost). Both are trained using cross-validation on a training set and then evaluated on a test set by measuring AUC (I get values around 0.87).
Now I would like to combine those models to get an even better one.
I tried averaging the predicted probabilities and, yes, AUC improved slightly on the test set.
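For concreteness, here is a minimal Python sketch of the averaging step. The labels and probabilities are made-up toy values, not the actual model outputs, and `auc` and `average_probs` are hypothetical helper names chosen for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_probs(p_a, p_b):
    """Element-wise mean of two lists of predicted probabilities."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


# Toy test-set labels and stage-one probabilities (assumed values).
y_test = [1, 0, 1, 0, 1, 0]
p_gbm = [0.9, 0.6, 0.4, 0.3, 0.7, 0.2]
p_xgb = [0.6, 0.3, 0.8, 0.5, 0.45, 0.1]

p_avg = average_probs(p_gbm, p_xgb)
print(auc(y_test, p_gbm), auc(y_test, p_xgb), auc(y_test, p_avg))
```

On this toy data each model misranks one pair, but they misrank different pairs, so the average ranks all of them correctly. Real predictions will usually show a much smaller gain.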
However, I also tried stacking the models as follows:
- calculate the predicted probabilities $p_{\text{gbm}}$ and $p_{\text{xgb}}$ on the training set and use these as predictors.
- train some second-stage model (linear, tree) of the form $\text{class} \sim p_{\text{gbm}}+p_{\text{xgb}}$
Models of this kind have an AUC of 0.9 on the training set and 0.8 on the test set (lower than the individual models).
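The stacking setup described above can be sketched as follows, using a logistic regression as the linear second-stage model. This is only an illustration under stated assumptions: the training-set probabilities are made-up toy values, the fit uses plain gradient descent rather than any particular library, and `fit_stacker`, `predict_stacker` and `log_loss` are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacker(p_gbm, p_xgb, y, lr=0.5, n_iter=2000):
    """Fit class ~ p_gbm + p_xgb as a logistic regression,
    using batch gradient descent on the mean log-loss."""
    b0 = b1 = b2 = 0.0
    n = len(y)
    for _ in range(n_iter):
        g0 = g1 = g2 = 0.0
        for a, b, t in zip(p_gbm, p_xgb, y):
            err = sigmoid(b0 + b1 * a + b2 * b) - t
            g0 += err
            g1 += err * a
            g2 += err * b
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
        b2 -= lr * g2 / n
    return b0, b1, b2


def predict_stacker(coefs, p_gbm, p_xgb):
    """Second-stage probabilities from the two stage-one probabilities."""
    b0, b1, b2 = coefs
    return [sigmoid(b0 + b1 * a + b2 * b) for a, b in zip(p_gbm, p_xgb)]


def log_loss(y, p):
    """Mean negative log-likelihood of the labels under probabilities p."""
    return -sum(t * math.log(q) + (1 - t) * math.log(1 - q)
                for t, q in zip(y, p)) / len(y)


# Toy stage-one predictions on the training rows (assumed values).
y_train = [1, 0, 1, 0, 1, 0, 1, 0]
p_gbm_train = [0.8, 0.3, 0.6, 0.4, 0.9, 0.2, 0.55, 0.5]
p_xgb_train = [0.7, 0.2, 0.65, 0.5, 0.85, 0.3, 0.4, 0.45]

coefs = fit_stacker(p_gbm_train, p_xgb_train, y_train)
p_stack_train = predict_stacker(coefs, p_gbm_train, p_xgb_train)
print(coefs, log_loss(y_train, p_stack_train))
```

As in the description, the second-stage model here is fitted on stage-one predictions for the same rows the stage-one models were trained on.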
Isn't using anything more sophisticated than averaging or linear weighting just overfitting the training set? The amount of information about the data does not increase; it is merely hidden in the stage-one predictions.
I would appreciate any comment!