
Recently, whilst doing an assignment with the PIMA Diabetes data set, I ran logistic regression using, amongst other approaches:

  • the age predictor as is
  • the age segmented into ranges and one-hot encoded (with and without scaling).

There was a slight performance increase in some of the trained and validated models when using OHE.

My question is: why would OHE be better than the (scaled) age predictor? I cannot find a suitable explanation.
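For concreteness, here is a minimal numpy sketch of the binning-plus-OHE encoding described above (the cut points 30/45/60 are hypothetical, not the ones used in the assignment):

```python
import numpy as np

ages = np.array([22, 35, 47, 58, 63, 71])
bins = np.array([30, 45, 60])          # hypothetical cut points
idx = np.digitize(ages, bins)          # bin index for each age
onehot = np.eye(len(bins) + 1)[idx]    # one dummy column per bin
```

Each age then contributes exactly one non-zero dummy column instead of its raw value.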

kjetil b halvorsen
thebluephantom
  • It is probably an artifact of the data set and the variables you used. Fine-grained continuous variables will always be better than OHE dummies -- particularly for updating the weights. – John Stud Jan 25 '21 at 16:25
  • OK, so why is it that many examples and courses attempt to do this then? I agree with you. @JohnStud – thebluephantom Jan 25 '21 at 16:28
  • 1
    Not sure. It makes little theoretical sense, on those grounds alone, to "throw out" continuous data in exchange for a dummy. Modeling decisions should always be justified, as we can predict most data sets that we have by just generating enough random Xs. – John Stud Jan 25 '21 at 16:32
  • @JohnStud but you are not giving an answer – thebluephantom Jan 25 '21 at 16:36
  • 1
    That's right, I am giving you a comment. – John Stud Jan 25 '21 at 16:37

1 Answer


If the relationship between the predictor and the target is not linear in log-odds, then binning and one-hot encoding may perform better, because the model gets to learn a separate, unconstrained weight for each bin. This will be particularly noticeable if the true relationship is not monotonic. There is a tradeoff in the size/number of bins as well: too many narrow bins make it easy to overfit, but too few throw away too much of the continuous information.
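A small simulated illustration of this point, assuming a hypothetical U-shaped (non-monotonic) relationship between age and the log-odds, and using a plain gradient-descent logistic fit rather than any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(20, 80, n)
# Hypothetical U-shaped relationship: risk is high for both young and old
true_logit = 0.004 * (age - 50) ** 2 - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

def mean_log_loss(X, y, lr=0.1, steps=5000):
    """Fit logistic regression by plain gradient descent; return training log-loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    p = 1 / (1 + np.exp(-X @ w))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# (a) intercept + standardized continuous age: forced to be monotonic in age
X_lin = np.column_stack([np.ones(n), (age - age.mean()) / age.std()])
# (b) four age bins, one-hot encoded: a free per-bin intercept each
X_ohe = np.eye(4)[np.digitize(age, [35, 50, 65])]

loss_lin = mean_log_loss(X_lin, y)
loss_ohe = mean_log_loss(X_ohe, y)
```

On this simulated data the binned fit reaches a lower training log-loss, because the dummies can encode the U-shape while a single slope cannot; on data where the log-odds really are close to linear in age, the comparison flips.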

Probably a better approach is to fit a spline, which accommodates the nonlinearity without throwing away the information in the continuous predictor. There are several answers/comments here on the downsides of binning, e.g. "What is the benefit of breaking up a continuous predictor variable?"; see the related/linked questions and the tag for more.

Ben Reiniger