A basic multinomial logistic regression model doesn't do great (~20% test classification error) on my problem, so my first thought was to apply AdaBoost to bring that down. Initially I got negative weights on the subsequent classifiers, but I figured out this was because the original AdaBoost algorithm is meant for binary classification. This paper: https://web.stanford.edu/~hastie/Papers/samme.pdf (SAMME) extends AdaBoost to a multi-class response using a multi-class exponential loss function. The new algorithm ends up being very similar to the original: all it does is add a constant log(K-1) to the weight of the current classifier (K = number of classes).
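For concreteness, here is a minimal sketch of the SAMME loop as I understand it from the paper (not my exact code; it assumes scikit-learn's `LogisticRegression` as the base learner and class labels encoded as integers 0..K-1):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def samme_fit(X, y, M=50):
    """Minimal SAMME boosting loop with multinomial logistic regression
    as the base learner. Illustration only, not production code."""
    n = len(y)
    K = len(np.unique(y))
    w = np.full(n, 1.0 / n)                 # observation weights
    learners, alphas = [], []
    for m in range(M):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y, sample_weight=w)
        miss = (clf.predict(X) != y)
        err = np.clip(np.dot(w, miss) / w.sum(), 1e-10, None)
        if err >= 1.0 - 1.0 / K:            # no better than random guessing: stop
            break
        # SAMME weight: the extra log(K-1) vs. binary AdaBoost
        alpha = np.log((1.0 - err) / err) + np.log(K - 1.0)
        w *= np.exp(alpha * miss)           # up-weight misclassified points
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas, K

def samme_predict(learners, alphas, K, X):
    """Weighted vote over the boosted classifiers."""
    votes = np.zeros((X.shape[0], K))
    for clf, alpha in zip(learners, alphas):
        votes[np.arange(X.shape[0]), clf.predict(X)] += alpha
    return votes.argmax(axis=1)
```

The log(K-1) term is what keeps alpha positive, since it only requires the weighted error to stay below (K-1)/K rather than 1/2.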
Anyway, after implementing this the weights all end up positive, but the test and training classification errors actually increase as the number of classifiers, M, grows, which seems bizarre. My guess is that multinomial logistic regression doesn't count as a "weak learner" (if that's true, why not?). Is this something that happens with learners that aren't weak, or could it just be a bug in my code? Or something else?
I've since moved on to trying something else to lower the classification error, so this is more of an open-ended curiosity question.