I am trying to predict users' ratings of movies. The ratings are continuous and range from 1 to 5. I have been using xgboost with the objective function reg:squarederror, i.e. regression with squared loss.
Most of the ratings are concentrated around 4, and many of the predictions are greater than 5! What are my options for telling xgboost about this bound? What kind of cost function could I construct for this scenario? Alternatively, I could bin the ratings into 10 bins and treat the task as multi-class prediction, but I wonder whether there are statistically more principled solutions!
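One possible cost function, not taken from the original post, keeps squared loss but passes the model's raw output through a scaled sigmoid, so every prediction lies strictly between 1 and 5. The sketch below assumes numpy; the names LOW, HIGH, to_rating and bounded_squared_error are hypothetical. The Hessian uses the Gauss-Newton approximation (it drops the residual-times-second-derivative term of the link) so that it stays non-negative.

```python
import numpy as np

# Hypothetical bounds matching the 1-5 rating scale in the question.
LOW, HIGH = 1.0, 5.0
SPAN = HIGH - LOW


def to_rating(margin):
    """Map an unbounded raw margin to a rating strictly inside (LOW, HIGH)."""
    return LOW + SPAN / (1.0 + np.exp(-margin))


def bounded_squared_error_grad_hess(margin, labels):
    """Gradient and Hessian of 0.5 * (to_rating(margin) - labels)**2 w.r.t. margin.

    The Hessian is the Gauss-Newton approximation (derivative of the link squared),
    which ignores the (pred - label) * link'' term so it never goes negative.
    """
    sig = 1.0 / (1.0 + np.exp(-margin))
    pred = LOW + SPAN * sig
    dpred = SPAN * sig * (1.0 - sig)        # derivative of the scaled-sigmoid link
    grad = (pred - labels) * dpred
    hess = np.maximum(dpred ** 2, 1e-6)     # small floor avoids zero Hessians
    return grad, hess


def bounded_squared_error(preds, dtrain):
    """Custom objective in the (preds, dtrain) -> (grad, hess) form used by xgboost's obj= argument."""
    return bounded_squared_error_grad_hess(preds, dtrain.get_label())
```

Assuming the usual custom-objective interface, this could be passed as `xgb.train(params, dtrain, obj=bounded_squared_error)`. The booster then learns raw margins, so predictions would be mapped back with `to_rating(booster.predict(dtest, output_margin=True))`. A simpler built-in variant along the same lines is to rescale the labels to [0, 1] with (y - 1) / 4, train with reg:logistic, and multiply the predictions by 4 and add 1 afterwards.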
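For the binning alternative, a minimal sketch (assuming numpy and 10 equal-width bins over [1, 5], both illustrative choices) maps ratings to class indices and turns predicted class probabilities back into a single rating by taking the expected value over the bin centres. That expectation cannot leave the range of the centres (1.2 to 4.8), and it is less coarse than simply picking the most probable bin. The helper names are hypothetical.

```python
import numpy as np

# Illustrative choices: 10 equal-width bins over the 1-5 rating scale.
N_BINS = 10
LOW, HIGH = 1.0, 5.0
EDGES = np.linspace(LOW, HIGH, N_BINS + 1)   # 11 edges -> 10 bins of width 0.4
CENTRES = (EDGES[:-1] + EDGES[1:]) / 2.0     # 1.2, 1.6, ..., 4.8


def rating_to_class(ratings):
    """Map ratings in [LOW, HIGH] to class indices 0..N_BINS-1.

    Only the interior edges are passed to np.digitize, so a rating of
    exactly HIGH lands in the last bin rather than in an extra one.
    """
    return np.digitize(np.asarray(ratings, dtype=float), EDGES[1:-1])


def expected_rating(class_probs):
    """Turn an (n_samples, N_BINS) probability matrix into one rating per sample."""
    probs = np.asarray(class_probs, dtype=float)
    return probs @ CENTRES
```

With xgboost this would pair with the multi:softprob objective and num_class set to 10, which outputs one probability per class. Note that the multi-class loss itself still treats the bins as unordered, so it does not know that neighbouring bins are close; that is a limitation of this approach compared with a bounded regression loss.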