I'm trying model stacking in a Kaggle competition (what the competition itself is about is irrelevant here). I suspect my approach to stacking is incorrect.
I have 4 different models:
an XGBoost model on dense features (numeric features that have a natural ordering),
an AdaBoost model on sparse features (non-numeric features, label encoded and then one-hot encoded),
an XGBoost model on dense features (sentiment scores from NLTK's VADER run on text).
Each of these models outputs class probabilities for the multi-class problem. Those probabilities feed into a final neural network, which combines them and produces another set of class probabilities.
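To make the setup concrete, here is a minimal sketch of this kind of stacking in scikit-learn. The data, model choices, and hyperparameters are placeholders (GradientBoostingClassifier stands in for XGBoost); the key detail is that the meta-model is trained on out-of-fold base-model probabilities rather than in-sample predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the competition data (3-class problem).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    GradientBoostingClassifier(random_state=0),  # stand-in for an XGBoost model
    AdaBoostClassifier(random_state=0),
]

# Out-of-fold probabilities for the meta-model's training set: each base model
# predicts on folds it was not trained on, so the neural network never sees
# in-sample (leaky) predictions from the base models.
train_meta = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")
    for m in base_models
])

# Refit each base model on the full training set to build test-time features.
test_meta = np.hstack([
    m.fit(X_train, y_train).predict_proba(X_test) for m in base_models
])

# Neural-network meta-model combining the base-model probabilities.
meta = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
meta.fit(train_meta, y_train)
print("stacked accuracy:", accuracy_score(y_test, meta.predict(test_meta)))
```

If the meta-model is instead trained on predictions the base models made on their own training data, the stack tends to overfit, which is one common reason a stacked model scores worse than its best single model.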
However, every additional model I stack in makes the result worse. For example, with only the first model I get 73% accuracy, but with each model added it drops below 70%, and my Kaggle score worsens from 0.6x to above 1.0.
Is this approach incorrect?