
I have tried xgboost on the MNIST dataset with default settings and with early stopping.

Why do I get worse results in terms of accuracy with early stopping (93.4% vs. 92.8%)?

Here are the code samples:

With early stopping:

import multiprocessing

import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train():
    X, y = mnist_dataset.load_train_data()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=2017)

    print('X_train.shape', X_train.shape)
    print('y_train.shape', y_train.shape)
    print('X_test.shape', X_test.shape)
    print('y_test.shape', y_test.shape)

    # Note: these DMatrix objects are never used below; the sklearn
    # wrapper (XGBClassifier) works on the arrays directly.
    xgb_train = xgb.DMatrix(X_train, label=y_train)
    xgb_test = xgb.DMatrix(X_test, label=y_test)

    clf = xgb.XGBClassifier(objective='multi:softmax',
                            silent=True,
                            nthread=multiprocessing.cpu_count())
    # Stop when "merror" on the held-out set has not improved for 10 rounds.
    clf.fit(X_train, y_train,
            early_stopping_rounds=10, eval_metric="merror", eval_set=[(X_test, y_test)],
            verbose=True)

    y_pred = clf.predict(X_test)

    print('acc:', 100.0 * accuracy_score(y_test, y_pred))

With default settings:

def train():
    # (imports as in the first snippet)
    X, y = mnist_dataset.load_train_data()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=2017)

    print('X_train.shape', X_train.shape)
    print('y_train.shape', y_train.shape)
    print('X_test.shape', X_test.shape)
    print('y_test.shape', y_test.shape)

    xgb_train = xgb.DMatrix(X_train, label=y_train)
    xgb_test = xgb.DMatrix(X_test, label=y_test)

    # set up parameters for xgboost
    param = {}
    param['objective'] = 'multi:softmax'
    param['silent'] = 1
    param['nthread'] = multiprocessing.cpu_count()
    param['num_class'] = 10

    # num_boost_round defaults to 10 when not passed explicitly
    clf = xgb.train(param, xgb_train)

    y_pred = clf.predict(xgb_test)

    print('acc:', 100.0 * accuracy_score(y_test, y_pred))
How many rows/images are there in that dataset? Have you tried increasing the early-stopping rounds? Especially with small datasets, early stopping usually leads to worse results. Check out [this](http://scikit-learn.org/dev/auto_examples/ensemble/plot_gradient_boosting_early_stopping.html) link, which compares using early stopping vs. not using it with GBTs. Early stopping at least makes sure you're not running too many rounds and hopefully doesn't overfit. You are probably underfitting. – user3494047 Feb 01 '18 at 16:26
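
A minimal sketch of the commenter's suggestion, reusing the variables from the question's code (n_estimators=500 and early_stopping_rounds=50 are illustrative values, not tuned ones): give boosting a larger round budget and a more patient stopping rule, so early stopping cannot cut training short while the model is still underfitting.

# Hypothetical variation on the question's setup: more rounds, more patience.
clf = xgb.XGBClassifier(objective='multi:softmax',
                        n_estimators=500,  # default is 100; allow more boosting rounds
                        nthread=multiprocessing.cpu_count())
clf.fit(X_train, y_train,
        early_stopping_rounds=50,  # was 10; wait longer before declaring no improvement
        eval_metric="merror",
        eval_set=[(X_test, y_test)],
        verbose=True)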

1 Answer


Your code is measuring out-of-sample cross-entropy loss to determine when to stop. Cross-entropy and accuracy don't measure the same thing, so it's not surprising that you get different results.
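
A hedged illustration of that point, reusing the question's variables (a sketch, not a tuned setup): the boosting round at which early stopping halts depends on the metric being monitored, so monitoring a cross-entropy-style loss ('mlogloss') and monitoring classification error ('merror', i.e. 1 - accuracy) can stop at different rounds and yield different final accuracy.

for metric in ("mlogloss", "merror"):  # cross-entropy loss vs. classification error
    clf = xgb.XGBClassifier(objective='multi:softmax',
                            nthread=multiprocessing.cpu_count())
    clf.fit(X_train, y_train,
            early_stopping_rounds=10,
            eval_metric=metric,
            eval_set=[(X_test, y_test)],
            verbose=False)
    # best_iteration is set when early stopping actually triggers
    print(metric, 'stopped at round', clf.best_iteration,
          'test acc:', 100.0 * accuracy_score(y_test, clf.predict(X_test)))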
