restricting xgboost predictions value within a range

Question

I am trying to predict users' ratings of movies. The ratings are continuous, ranging from 1 to 5. I have been using xgboost with the objective function `reg:squarederror`, i.e. regression with squared loss.

Most of the ratings are concentrated around 4, yet many of the predictions are greater than 5! What options are there for making xgboost aware of this bound? What kind of cost function could I use in this scenario? Alternatively, I could bin the values into 10 bins and do multi-class prediction, but I still wonder whether there are more statistically sound solutions!
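One common workaround, which is not proposed anywhere in this thread and is offered only as an assumption-laden sketch, is to rescale the ratings from [1, 5] to [0, 1] and fit with a logistic link. The booster's raw margin then passes through a sigmoid, so a back-transformed prediction cannot leave [1, 5]. XGBoost's built-in `reg:logistic` objective accepts labels in [0, 1]; the plain-Python helpers below (all names hypothetical) show the rescaling and the gradient/Hessian that such a cross-entropy loss feeds to the booster, so the idea can be checked without xgboost installed.

```python
import math

LOW, HIGH = 1.0, 5.0  # rating bounds stated in the question


def to_unit(y):
    """Rescale a rating from [LOW, HIGH] to [0, 1]."""
    return (y - LOW) / (HIGH - LOW)


def from_unit(p):
    """Map a value in [0, 1] back to the rating scale."""
    return LOW + p * (HIGH - LOW)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad_hess(margin, y):
    """Gradient and Hessian, w.r.t. the raw margin, of the cross-entropy
    -[t*log(p) + (1-t)*log(1-p)] with p = sigmoid(margin), t = to_unit(y).
    Valid for continuous t in [0, 1], not only 0/1 labels."""
    t = to_unit(y)
    p = sigmoid(margin)
    return p - t, p * (1.0 - p)


def predict_rating(margin):
    """Turn a raw boosted margin into a rating that lies in [LOW, HIGH]."""
    return from_unit(sigmoid(margin))


print(predict_rating(0.0))     # midpoint of the scale
print(predict_rating(1000.0))  # saturates at HIGH, never above it
```

In practice one would likely train with `objective="reg:logistic"` on `to_unit(y)` labels and apply `from_unit` to the model's predictions. `logistic_grad_hess` only illustrates what a custom objective returning the same loss would compute per example. This keeps predictions bounded, but it changes the loss being minimized, so it is worth comparing against simply clipping the squared-error predictions.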

You could try ordinal regression. Search this site, or start with https://stats.stackexchange.com/questions/281619/linear-regression-or-ordinal-logistic-regression-to-predict-wine-rating-from-0 — kjetil b halvorsen, May 12 '20 at 16:01
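Both the 10-bin fallback from the question and the ordinal suggestion start by turning continuous ratings into ordered class labels. Here is a minimal sketch, assuming equal-width bins over [1, 5] (the helper names are hypothetical). If a classifier such as XGBoost's `multi:softprob` objective (which ignores the ordering of the classes) supplies per-class probabilities, the probability-weighted average of bin midpoints is a point prediction that cannot leave the rating range.

```python
LOW, HIGH, N_BINS = 1.0, 5.0, 10  # bounds from the question; 10 bins as proposed there
WIDTH = (HIGH - LOW) / N_BINS


def rating_to_bin(y):
    """Map a rating in [LOW, HIGH] to an ordered class label 0..N_BINS-1."""
    k = int((y - LOW) / WIDTH)
    # Clamp so that y == HIGH lands in the top bin rather than an 11th one.
    return min(max(k, 0), N_BINS - 1)


def bin_midpoint(k):
    """Representative rating for class k."""
    return LOW + (k + 0.5) * WIDTH


def expected_rating(class_probs):
    """Probability-weighted average of bin midpoints. Always lies between
    bin_midpoint(0) and bin_midpoint(N_BINS - 1), hence inside [LOW, HIGH]."""
    if len(class_probs) != N_BINS or abs(sum(class_probs) - 1.0) > 1e-6:
        raise ValueError("expected N_BINS probabilities summing to 1")
    return sum(p * bin_midpoint(k) for k, p in enumerate(class_probs))


print(rating_to_bin(4.0))          # a rating of 4 falls in class 7
print(expected_rating([0.1] * 10))  # about 3.0 for uniform class probabilities
```

Binning discards within-bin detail, which is one reason the ordinal-regression route discussed below keeps every distinct value as its own level instead.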
@kjetil thanks for the comment, very relevant! But in this case the ratings are continuous values, not categorical, and ordinal regression is meant for categorical outcomes, so I still think this is a different case :) — Areza, May 12 '20 at 17:28
You can also use ordinal regression with a continuous response. It's covered in chapter 15 of Frank Harrell's *Regression Modeling Strategies*, and his function `orm` in the R package `rms` fits such models. — kjetil b halvorsen, May 12 '20 at 19:11
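To see why an ordinal model answers the range concern, consider a cumulative-link model in which every distinct observed rating is its own ordered level and P(Y >= y_j | x) = sigmoid(alpha_j + x'beta). The sketch below assumes a logistic link and uses made-up intercepts. It is not the `rms` implementation, only an illustration of the idea. The predicted mean is a probability-weighted average of observed values, so it can never fall outside [min, max] of the data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def exceedance_probs(alphas, linear_predictor):
    """P(Y >= y_j | x) for j = 1..k-1 under P(Y >= y_j | x) = sigmoid(alpha_j + x'beta).
    Intercepts must be strictly decreasing so the probabilities decrease with j."""
    if any(a <= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("alphas must be strictly decreasing")
    return [sigmoid(a + linear_predictor) for a in alphas]


def point_probs(exceed):
    """Convert P(Y >= y_j), j = 1..k-1, into P(Y = y_j), j = 0..k-1."""
    # Every rating is >= the smallest value; nothing exceeds the largest value.
    cum = [1.0] + list(exceed) + [0.0]
    return [cum[j] - cum[j + 1] for j in range(len(cum) - 1)]


def predicted_mean(values, alphas, linear_predictor):
    """Mean rating as a probability-weighted average of the distinct values."""
    if len(alphas) != len(values) - 1:
        raise ValueError("need one intercept per value except the smallest")
    probs = point_probs(exceedance_probs(alphas, linear_predictor))
    return sum(v * p for v, p in zip(values, probs))


# Hypothetical example: five distinct ratings and illustrative intercepts.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
alphas = [2.0, 1.0, 0.0, -1.0]
for lp in (-50.0, 0.0, 50.0):
    print(lp, predicted_mean(values, alphas, lp))  # always within [1, 5]
```

With truly continuous ratings, `values` would hold every distinct observed rating and there would be one intercept per value except the smallest. That is why such models are practical mainly with specialised fitting code like the `orm` function mentioned above.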

