5

I am a newbie to XGBoost and I would like to use it for regression, in particular car price prediction. I started following a tutorial on XGBoost that uses XGBClassifier with objective='binary:logistic' for classification, and even though I am predicting prices, XGBClassifier also accepts objective='reg:linear'.

1) Should XGBClassifier and XGBRegressor always be used for classification and regression respectively?

2) Why does objective ='reg:linear' option even exist for XGBClassifier? Shouldn't it be only available in XGBRegressor?

3) Is "explained variance" the best metric for regression model evaluation? or perhaps RMSE?

Alan Abishev

1 Answer

5

1) Should XGBClassifier and XGBRegressor always be used for classification and regression respectively?

Basically yes, but some would argue that logistic regression is in fact a regression problem, not classification, since it predicts probabilities. Predicting probabilities is sometimes called "soft classification", but this is mostly a matter of naming convention.

2) Why does objective ='reg:linear' option even exist for XGBClassifier? Shouldn't it be only available in XGBRegressor?

Logistic regression uses the logistic loss function, but nothing prohibits you from minimizing squared loss instead, i.e. the squared difference between the predicted probabilities and the 0/1 targets. As far as I understand from the documentation, this is what XGBoost does when you pass the 'reg:linear' objective here. See also the What is the difference between linear regression and logistic regression? thread.
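To make the distinction concrete, here is a minimal pure-Python sketch (no XGBoost required, hypothetical toy numbers) comparing the two objectives on the same 0/1 targets: squared loss, which a 'reg:linear' objective would minimize, versus logistic (log) loss, which 'binary:logistic' minimizes. Both are valid loss functions over predicted probabilities; they just penalize errors differently.

```python
import math

# Toy binary targets and predicted probabilities (hypothetical values).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

def squared_loss(y, p):
    # Mean squared difference between predicted probabilities and the
    # 0/1 targets -- what a 'reg:linear'-style objective minimizes here.
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def log_loss(y, p):
    # Mean logistic loss -- what 'binary:logistic' minimizes.
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)

print(squared_loss(y_true, y_prob))  # 0.158125
print(log_loss(y_true, y_prob))      # ~0.4723
```

Both losses shrink as the predicted probabilities move toward the true labels, so either can drive a boosting model toward sensible probability estimates.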

3) Is "explained variance" the best metric for regression model evaluation? or perhaps RMSE?

There is no such thing as "the best metric"; if there were, we would use it for every problem and wouldn't need multiple metrics. The right metric is problem-specific.
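As one illustration of why the two metrics can disagree, explained variance ignores a constant bias in the predictions, while RMSE penalizes it. A small pure-Python sketch with hypothetical numbers:

```python
# Toy regression targets and two sets of predictions (hypothetical values).
y_true = [10.0, 20.0, 30.0, 40.0]
unbiased = [11.0, 19.0, 31.0, 39.0]   # small, zero-mean errors
biased = [15.0, 25.0, 35.0, 45.0]     # same shape, but shifted by +5

def rmse(y, p):
    # Root mean squared error: penalizes any deviation, including bias.
    return (sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)) ** 0.5

def explained_variance(y, p):
    # 1 - Var(residuals) / Var(y): a constant offset leaves the residual
    # variance unchanged, so it is invisible to this metric.
    n = len(y)
    resid = [yi - pi for yi, pi in zip(y, p)]
    mr = sum(resid) / n
    my = sum(y) / n
    var_r = sum((r - mr) ** 2 for r in resid) / n
    var_y = sum((yi - my) ** 2 for yi in y) / n
    return 1 - var_r / var_y

print(rmse(y_true, unbiased), rmse(y_true, biased))  # 1.0 vs 5.0
print(explained_variance(y_true, unbiased))           # 0.992
print(explained_variance(y_true, biased))             # 1.0 (bias invisible)
```

The biased predictions score a perfect explained variance of 1.0 while their RMSE is five times worse, which is why the choice between such metrics depends on whether a systematic offset matters for your problem (for car prices, it usually does).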

Tim
  • Thanks Tim, so I can use XGBClassifier for regression problems but it's better to use a dedicated XGBRegressor, right? Is there a big performance difference? – Alan Abishev Feb 01 '18 at 08:54
  • @Baraban no, you can't. You can use squared loss for classification, but you cannot use a classifier for regression. – Tim Feb 01 '18 at 08:56