Recently at work I got into an interesting discussion that I thought could continue here, and I would like your input.

I'm trying to model some data whose outcome is a categorical variable (let's say X). I'm using XGBoost through H2O in R (but I'm not tied to it). I started by building a single multiclass model with X as the outcome.

However, a colleague of mine suggested building one model for each category in X. His strategy is that, for each category x in X, I should encode the response variable as a binary response, where the entries with x are positive and all others are negative. That way I can train one binary model per class (a one-vs-rest setup).
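To make the proposed encoding concrete, here is a minimal sketch in plain Python (not the R/H2O code I'm actually running). The function name `one_vs_rest_targets` and the toy labels are made up for illustration; each binary target would then be the response for one separate model.

```python
def one_vs_rest_targets(labels):
    """Build one binary target vector per category (1 = that category, 0 = any other)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Hypothetical toy response column standing in for X.
x = ["a", "b", "c", "a", "b"]

targets = one_vs_rest_targets(x)
for category, binary in targets.items():
    # Each of these vectors would be the response of one binary model.
    print(category, binary)
```

With three categories this gives three binary problems, and every row is positive in exactly one of them.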

I will evaluate the performance of both strategies empirically, but I'm interested in understanding the theoretical and conceptual side. Is there any objective, theoretically sound argument for or against training one binary model per class instead of a single multiclass model?

Diogo Santos
