
Suppose we wanted to predict not someone's income itself but simply whether that person makes more or less than a given amount (say $50k). When, if ever, would it be better to approach this as a classification problem ("0 = less than 50k, 1 = more than 50k") rather than as a regression problem? That is, would it make more sense to bin the income and then perform a classification, or to perform a regression and bin the result?

There has been a question about whether every classification problem can be approached as a regression problem (here). I'd be interested to know whether the answer is the same for this special case.

oW_
Set up validation and try both. I'd suspect the regression problem will do better. In general, throwing away information via thresholding isn't good unless it somehow removes irrelevant noise. – Ryan Bressler Nov 29 '16 at 22:43
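
A minimal sketch of the "try both" comparison suggested in the comment, in plain Python. Everything here is an assumption made for illustration: the data are synthetic (one made-up feature, income linear in it plus Gaussian noise), the helper names `make_data`, `fit_linear`, `fit_logistic` and `compare` are hypothetical, and a single train/test split stands in for proper validation.

```python
import math
import random

THRESHOLD = 50_000  # the "$50k" cut-off from the question


def make_data(n, rng):
    """Made-up data: income is linear in one feature plus Gaussian noise."""
    xs = [rng.uniform(0, 40) for _ in range(n)]
    incomes = [20_000 + 1_500 * x + rng.gauss(0, 10_000) for x in xs]
    return xs, incomes


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def fit_logistic(xs, labels, lr=0.5, epochs=500):
    """Logistic regression on a standardized feature, by batch gradient descent.

    Returns a function mapping x to True when the predicted class is 1.
    """
    n = len(xs)
    mx = sum(xs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    zs = [(x - mx) / sx for x in xs]
    w0 = w1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for z, t in zip(zs, labels):
            p = 1.0 / (1.0 + math.exp(-(w0 + w1 * z)))
            g0 += p - t
            g1 += (p - t) * z
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return lambda x: w0 + w1 * (x - mx) / sx > 0.0


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def compare(seed=0, n_train=400, n_test=400):
    """Return held-out accuracy of (classification, regression-then-threshold)."""
    rng = random.Random(seed)
    x_tr, inc_tr = make_data(n_train, rng)
    x_te, inc_te = make_data(n_test, rng)
    y_tr = [int(v > THRESHOLD) for v in inc_tr]
    y_te = [int(v > THRESHOLD) for v in inc_te]

    # Option 1: bin the income first, then classify.
    classify = fit_logistic(x_tr, y_tr)
    acc_clf = accuracy([int(classify(x)) for x in x_te], y_te)

    # Option 2: regress on the raw income, then bin the prediction.
    a, b = fit_linear(x_tr, inc_tr)
    acc_reg = accuracy([int(a + b * x > THRESHOLD) for x in x_te], y_te)
    return acc_clf, acc_reg


acc_clf, acc_reg = compare()
print(f"classify on binned income: {acc_clf:.3f}")
print(f"regress, then threshold:   {acc_reg:.3f}")
```

Because the synthetic income really is linear in the feature, this setup is friendly to the regression route; on real data the ranking could differ, for example with heavy-tailed incomes where squared error is dominated by a few very large values. Repeating the comparison over several seeds, or with cross-validation on the actual dataset, would give a more trustworthy picture than one split.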
