Questions tagged [catboost]

CatBoost is an open-source library for gradient boosting on decision trees, with out-of-the-box support for categorical features, available for Python and R.

CatBoost is a gradient boosting library that is well known for its categorical feature support. It uses a novel boosting scheme, ordered boosting, which helps reduce overfitting.
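For illustration, a minimal sketch of that out-of-the-box categorical handling (toy data; assumes only that the catboost Python package is installed):

```python
from catboost import CatBoostClassifier, Pool

# Toy data: column 0 is a raw string category, column 1 is numeric.
X = [["red", 1.0], ["blue", 2.0], ["red", 3.0], ["green", 4.0]]
y = [0, 1, 0, 1]

# cat_features lists the indices of the categorical columns; no manual
# one-hot or label encoding is required.
train_pool = Pool(X, y, cat_features=[0])

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(train_pool)
print(model.predict([["blue", 2.5]]))
```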


25 questions
4 votes, 1 answer

Please correct my assumption on how regression trees work

I'm trying to understand how regression trees work. I've been experimenting with CatBoost and XGBoost in Python, and I'm getting results I don't expect. Can someone please clarify (apologies in advance if this is a coding error)? I've…
David Waterworth
4 votes, 1 answer

IncToDec Catboost Explained

I am struggling to understand how the overfitting detector in CatBoost works: https://tech.yandex.com/catboost/doc/dg/concepts/overfitting-detector-docpage/#overfitting-detector I am finding CatBoost to work well relative to other options, but I…
B_Miner
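For readers landing here, a usage sketch of enabling the detector (parameter values are illustrative, not recommendations):

```python
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = X[:, 0] + 0.1 * rng.randn(200)

# od_type="IncToDec" enables the detector discussed in the question;
# od_pval is its significance threshold (the docs suggest values in
# [1e-10, 1e-2], with larger values stopping training earlier).
model = CatBoostRegressor(
    iterations=1000,
    od_type="IncToDec",
    od_pval=1e-2,
    verbose=False,
)
model.fit(X[:150], y[:150], eval_set=(X[150:], y[150:]))
print("trees kept:", model.tree_count_)
```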
3 votes, 1 answer

Any reasons to prefer neural networks over boosting methods in tabular data?

Based on Kaggle winners' data, it seems that ensemble boosting methods like XGBoost, LightGBM, and CatBoost are the top choices when dealing with structured or tabular data for maximizing prediction accuracy. However, in industry, as far as I know,…
3 votes, 1 answer

Negative Feature Importance Value in CatBoost LossFunctionChange

I am using CatBoost for a ranking task, with QueryRMSE as my loss function. I notice that for some features the feature importance values are negative, and I don't know how to interpret them. The documentation says that the i-th feature importance…
Kemeng Zhang
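For context, a sketch of how these values are produced; LossFunctionChange importances are computed against a dataset, and a negative score is generally read as the feature making the loss worse on that data:

```python
import numpy as np
from catboost import CatBoost, Pool

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = X[:, 0] + 0.1 * rng.rand(100)
groups = np.repeat(np.arange(20), 5)  # 20 queries with 5 documents each

pool = Pool(X, y, group_id=groups)
model = CatBoost({"loss_function": "QueryRMSE", "iterations": 50, "verbose": False})
model.fit(pool)

# Requires data because the importance is a change in the loss function;
# negative entries mean the loss is estimated to improve without the feature.
print(model.get_feature_importance(data=pool, type="LossFunctionChange"))
```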
3 votes, 1 answer

Feature Interaction Strength in Catboost

I was wondering if anyone knew how the feature interaction strength is calculated in the CatBoost package. The documentation…
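A pointer for reproducing the numbers (a sketch; the scores come from the Interaction importance type, though how they are computed internally is exactly the question):

```python
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = X[:, 0] * X[:, 1] + 0.05 * rng.rand(200)  # deliberate pairwise interaction

model = CatBoostRegressor(iterations=100, verbose=False)
model.fit(X, y)

# Returns rows of (first feature index, second feature index, score).
print(model.get_feature_importance(type="Interaction"))
```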
3 votes, 1 answer

(Low cardinality) categorical features handling in gradient boosting libraries

In some popular gradient boosting libraries (lgb, catboost), it seems they can all handle categorical inputs by just specifying the column names of the categorical features and passing them into a fit call or model instance by setting it to…
Sam
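The two interfaces side by side (a sketch with toy data; both libraries accept the raw column, just declared differently):

```python
import pandas as pd
import lightgbm as lgb
from catboost import CatBoostClassifier

df = pd.DataFrame({"city": ["a", "b", "a", "c"] * 5, "x": range(20)})
y = [0, 1] * 10

# LightGBM: mark the column as pandas 'category' dtype (auto-detected),
# or name it explicitly via categorical_feature in fit().
df_lgb = df.assign(city=df["city"].astype("category"))
lgb.LGBMClassifier(n_estimators=10, min_child_samples=1).fit(df_lgb, y)

# CatBoost: pass raw strings and list the categorical columns by name.
CatBoostClassifier(iterations=10, verbose=False).fit(df, y, cat_features=["city"])
```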
2 votes, 1 answer

How do Ordered Target Statistics work for CatBoost?

This question closely follows this paper. I'm trying to fully understand how Ordered Target Statistics (TS) for CatBoost work. E.g., the CatBoost algorithm uses this method to group categorical features through estimation of numerical values…
mugdi
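A simplified sketch of the idea (my own illustration of the paper's formula, not CatBoost's actual implementation): each example is encoded using only the targets of examples that precede it in a random permutation, plus a prior.

```python
import numpy as np

def ordered_target_statistics(cats, y, prior=0.5, a=1.0, seed=0):
    """Encode a categorical column with ordered target statistics.

    Each example sees only the target values of *earlier* examples in a
    random permutation, which prevents an example's own target from
    leaking into its encoding.
    """
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(y))
    sums, counts = {}, {}
    encoded = np.empty(len(y))
    for idx in perm:
        c = cats[idx]
        # smoothed mean over previously seen examples of this category
        encoded[idx] = (sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a)
        sums[c] = sums.get(c, 0.0) + y[idx]
        counts[c] = counts.get(c, 0) + 1
    return encoded

cats = ["a", "a", "b", "a", "b"]
y = np.array([1, 0, 1, 1, 0])
print(ordered_target_statistics(cats, y))
```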
2 votes, 0 answers

Looking for information regarding tree-based gradient boosting algorithms comparative performance on data sets with different underlying properties

I am having a difficult time finding theoretical or empirical comparative research on tree-based gradient boosting algorithms applied to data sets with different underlying properties. Is there any reason to believe that one of them is better or…
Polarni1
2 votes, 1 answer

Why does LightGBM Classifier give some folks a probability of 1 of belonging to a class with log-loss?

I'm trying to use the LightGBM package in Python for a multi-class classification problem, and I'm baffled by its results. For a minority of the population, LightGBM predicts a probability of 1 (absolute certainty) that the individual belongs to a…
Guillaume F.
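One numerical point worth ruling out (my own illustration): in double precision, the sigmoid/softmax of a large raw score rounds to exactly 1.0, so a printed probability of 1 can simply mean a large but finite margin.

```python
import numpy as np

# float64 has no representable value between 1 - ~1.1e-16 and 1.0, so a
# large enough raw score saturates the sigmoid to exactly 1.0.
for raw in [10.0, 20.0, 30.0, 40.0]:
    p = 1.0 / (1.0 + np.exp(-raw))
    print(raw, p, p == 1.0)   # raw=40 prints True
```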
2 votes, 1 answer

CatBoost does not overfit - how is that possible?

I'm fitting and evaluating a CatBoostRegressor and an XGBRegressor on the same regression problem. I tried matching their hyperparameters as closely as possible, yet I'm seeing something strange: the CatBoost test error is monotonically decreasing! Why…
ihadanny
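One way to look at the claim directly (a sketch): pull the per-iteration curves for both sets instead of a single final score.

```python
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = X[:, 0] + 0.3 * rng.randn(300)

model = CatBoostRegressor(iterations=200, verbose=False)
model.fit(X[:200], y[:200], eval_set=(X[200:], y[200:]))

# Per-iteration metrics for the training and evaluation sets
# (the exact key names may vary slightly across catboost versions).
curves = model.get_evals_result()
print(curves["learn"]["RMSE"][-1], curves["validation"]["RMSE"][-1])
```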
2 votes, 0 answers

Gradient boosting (GB) splitting methods (categorical features)

Regarding categorical features: ordinary trees treat them in two main ways. CART considers only binary splits; it computes the mean response value (y_mean_i for each category i), sorts the categories by this value, and considers only…
afek
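The trick described in the excerpt, sketched (my illustration of the classic CART device for categorical splits):

```python
import numpy as np

def candidate_binary_splits(categories, y):
    """Order categories by mean response, then consider only the
    len(categories)-1 'prefix vs rest' partitions of that ordering
    (Breiman's trick for regression and binary targets)."""
    cats = np.asarray(categories)
    means = {c: y[cats == c].mean() for c in np.unique(cats)}
    ordered = sorted(means, key=means.get)
    # each split sends a prefix of the ordering left and the rest right
    return [(set(ordered[: i + 1]), set(ordered[i + 1 :]))
            for i in range(len(ordered) - 1)]

cats = ["a", "b", "c", "a", "b", "c"]
y = np.array([1.0, 3.0, 2.0, 1.5, 3.5, 2.5])
for left, right in candidate_binary_splits(cats, y):
    print(left, "|", right)
```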
2 votes, 1 answer

L2 Regularization in CatBoost

I am studying the CatBoost paper https://arxiv.org/pdf/1706.09516.pdf (particularly Function BuildTree on page 16) and noticed that it does not mention regularization. In particular, split selection is based on minimizing the loss of a new candidate…
Poland Spring
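For what it's worth, the implementation does expose an L2 penalty on leaf values as a parameter, even if the paper's pseudocode omits it (usage sketch):

```python
from catboost import CatBoostRegressor

# l2_leaf_reg is CatBoost's L2 penalty on leaf values (default 3.0);
# larger values shrink leaf estimates toward zero, playing a role
# analogous to lambda in XGBoost's leaf-weight formula.
model = CatBoostRegressor(l2_leaf_reg=10.0, iterations=100, verbose=False)
```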
1 vote, 0 answers

Analytical expression of a CatBoost regression model in R

When adjusting a multiparametric regression model, an analytical expression that characterizes the fitted model (e.g., in linear multiparametric regression, the equation is $\hat y = \hat\beta_0+\sum^{N}_{i=1} \hat\beta_{i}\cdot x_i$) can be…
David E.S.
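Not a closed-form equation, but the fitted function can be exported as code for inspection (a sketch using the Python API; CatBoost writes the trees as an explicit apply function rather than a single formula):

```python
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(100, 3), rng.rand(100)

model = CatBoostRegressor(iterations=10, verbose=False)
model.fit(X, y)

# Writes a standalone prediction function (a sum of leaf values over
# trees) to model.py, rather than a closed-form expression.
model.save_model("model.py", format="python")
```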
1 vote, 1 answer

If evaluation set is the same as training set, why would the evaluation error be different from training error?

I understand the use of an evaluation set for parameter tuning and detecting over-fitting in general. The examples in the evaluation set should be unseen and different from the training set. However, in the following toy CatBoost regression problem, in which I…
yowtzu.lim
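A repro sketch of that setup; comparing the raw per-iteration curves for the learn and eval sets is more informative than a single summary number:

```python
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X.sum(axis=1)

model = CatBoostRegressor(iterations=50, verbose=False)
model.fit(X, y, eval_set=(X, y))   # evaluation set identical to training set

# Compare the two curves directly rather than trusting one summary value.
res = model.get_evals_result()
print(res["learn"]["RMSE"][:3])
print(res["validation"]["RMSE"][:3])
```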
1 vote, 0 answers

Role of weighted quantile sketch in XGBoost, LightGBM and CatBoost

Have I understood it correctly if I say that the "weighted quantile sketch" allows XGBoost to use histogram search on feature values for split finding? Also, do LightGBM or CatBoost use something similar to the weighted quantile sketch? If not, how do they…
Polarni1
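For concreteness, the exact (non-streaming) version of weighted quantile cut points can be sketched as below (my illustration of the concept; the actual weighted quantile sketch is an approximate, mergeable summary of this computation, with hessians as the weights in XGBoost):

```python
import numpy as np

def weighted_quantile_bins(values, weights, n_bins):
    """Pick cut points so that each bin holds roughly equal total weight."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w) / w.sum()          # cumulative weight fraction
    targets = np.arange(1, n_bins) / n_bins
    idx = np.searchsorted(cum, targets)
    return v[np.minimum(idx, len(v) - 1)]

vals = np.random.RandomState(0).rand(1000)
hess = np.ones(1000)                      # uniform weights -> plain quantiles
print(weighted_quantile_bins(vals, hess, 4))
```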