Should I use the results of a previous model in my second model?

Question

I've been trying to predict a minority class, and so far I've built an SVM, a boosted tree, a random forest, a logistic regression, and a kNN model. After building and tuning them all and doing the feature engineering:

I now have a single combination of them all. Simulated annealing chose both the weight given to each model and the vote cutoff. (Originally each model voted 0 or 1 and the cutoff was 5; with the weights, the cutoff is no longer a simple count. This tuning was done on the test set.)
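To make the setup concrete, here is a minimal sketch of a weighted vote with a cutoff. The function name, the weights, and the cutoff are hypothetical values chosen for illustration; they are not the ones simulated annealing produced.

```python
def weighted_vote(votes, weights, cutoff):
    """Return 1 if the weighted sum of 0/1 votes reaches the cutoff, else 0."""
    if len(votes) != len(weights):
        raise ValueError("need exactly one weight per model")
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score >= cutoff else 0


# Hypothetical numbers for illustration only (not taken from the post).
# Order: svm, boosted tree, random forest, logistic regression, knn
votes = [1, 0, 1, 1, 0]
weights = [0.5, 2.0, 1.5, 1.0, 0.25]
cutoff = 2.5

prediction = weighted_vote(votes, weights, cutoff)
```

With all weights equal to 1 and a cutoff of 5, this reduces to the original rule in which every model has to vote 1.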

Anyway, I am now adding an autoencoder neural network as a final step to bring it home, because my results are not where I'd like them to be (my company's revenue is fairly closely tied to how well this model performs).

I was wondering whether it would be a good idea, or theoretically ill-advised, to include the recommendation from my previous mega combo as a new variable for the neural net. Would it help or just get in the way? Should I include the 0 or 1 votes from all the models, or just the final outcome?

score 1 · Accepted Answer · answered Aug 02 '18 at 20:08

No, it is not ill-advised: there is no theoretical reason not to do this. Your mega combo may perform well in certain cases and not so well in others, and if you have enough samples, your neural net may be able to tell one from the other, trusting the combo in the first case and doing its own thing in the other. All fine.
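Feeding one model's output into another as a feature is commonly called stacking. The sketch below shows one way to build that extra column. It uses out-of-fold predictions, which is an assumption added here rather than something stated above: each row's vote comes from a model that was fitted without that row, so the second model does not learn from votes the first model effectively memorised. All names (`out_of_fold_predictions`, `add_feature`, `fit_threshold_model`) and the toy data are hypothetical, and the threshold model merely stands in for the real ensemble.

```python
def out_of_fold_predictions(X, y, fit, n_folds=5):
    """Predict every row with a model fitted without that row's fold.

    `fit(X_train, y_train)` must return a function mapping one row to a prediction.
    """
    n = len(X)
    oof = [None] * n
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = predict(X[i])
    return oof


def add_feature(X, column):
    """Return a copy of X with one extra value appended to each row."""
    return [list(row) + [value] for row, value in zip(X, column)]


def fit_threshold_model(X_train, y_train):
    """Hypothetical stand-in for the ensemble: vote 1 if the first feature
    is above its mean in the training rows (y_train is unused here)."""
    mean = sum(row[0] for row in X_train) / len(X_train)
    return lambda row: 1 if row[0] > mean else 0


# Toy data for illustration only.
X = [[0.1, 5.0], [0.9, 3.0], [0.2, 4.0], [0.8, 1.0], [0.3, 2.0], [0.7, 6.0]]
y = [0, 1, 0, 1, 0, 1]

combo_vote = out_of_fold_predictions(X, y, fit_threshold_model, n_folds=3)
X_for_neural_net = add_feature(X, combo_vote)  # original features plus the combo's vote
```

The second model would then be trained on `X_for_neural_net` instead of `X`. If the models live in scikit-learn, `cross_val_predict` from `sklearn.model_selection` can play the role of the out-of-fold helper here.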

The potential problems revolve around two things. The first is interpretability, but it already appears that this is low on your list of priorities. The second is keeping a big house of cards running in a production environment. There seem to be many moving parts. If your monster is not robust, then a software upgrade to any one modeling tool could break things, and this might be hard to troubleshoot.

"my results are not where I'd like them to be"

You might enjoy looking at "How to know that your machine learning problem is hopeless?"

Good answer. Just to drive the point home, there is a very real business cost in developing, testing, and maintaining many models. Be sure you’re not trading a revenue gain in one place for a deficit in another. - kbrose, Aug 02 '18 at 23:19
I'm the only analytics person in this company, and interpretability is indeed a low priority; if it does the job, the boss and boss's boss and boss's boss's boss don't care, so I have fairly free rein to try things and can explain by saying "it's 6 models voting" and they're satisfied. I feel the minimum for analytics should be two people now... - CapnShanty, Aug 03 '18 at 14:16
