For an NLP classification task I need to train two different classifiers, and I've chosen the scikit-learn implementations of RandomForest and KNeighbors.
My dataset is strongly imbalanced; see below for the counts of documents per subject, which are the targets of the classification task.
I have created stratified train and test samples, and with RandomForest I can set the "class_weight" parameter to "balanced_subsample" to ensure that the majority classes are penalised and the minority classes are boosted.
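Roughly what I have (a minimal sketch; `X` and `y` stand in for my vectorised documents and subject labels, and the split parameters are placeholders):

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Stratified split keeps the class proportions the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# class_weight="balanced_subsample" reweights classes inversely to their
# frequency within each bootstrap sample, so minority classes count more.
rf = RandomForestClassifier(class_weight="balanced_subsample", random_state=42)
rf.fit(X_train, y_train)
```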
With RandomForest, after tuning hyperparameters, I'm able to achieve an F1-score of 0.59, an accuracy of 0.58, and a ROC AUC of 0.82.
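This is how I compute the scores (assuming weighted averaging for F1 and one-vs-rest ROC AUC, since this is multiclass):

```python
from sklearn.metrics import f1_score, accuracy_score, roc_auc_score

y_pred = rf.predict(X_test)
y_proba = rf.predict_proba(X_test)

# Weighted F1 averages per-class F1 scores weighted by class support.
print(f1_score(y_test, y_pred, average="weighted"))
print(accuracy_score(y_test, y_pred))
# One-vs-rest ROC AUC works for multiclass given the probability matrix.
print(roc_auc_score(y_test, y_proba, multi_class="ovr"))
```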
The KNeighbors classifier does about equally well, but I suspect it should perform worse and is only reaching this accuracy by predicting the majority classes correctly while missing the minority ones.
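To check that suspicion I'm looking at per-class metrics rather than the aggregates alone; a sketch of the check:

```python
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)

# Per-class precision/recall/F1 shows whether the minority classes are
# simply being swallowed by the majority ones.
print(classification_report(y_test, y_pred_knn))
print(confusion_matrix(y_test, y_pred_knn))
```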
So my questions are: does the KNN classifier need weights for imbalanced classification? And if so, how should I add them?
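For reference, the only built-in weighting I can see in scikit-learn's KNeighborsClassifier is the `weights` parameter, which weights neighbours rather than classes; a sketch of what I could try (the custom function below is a hypothetical example):

```python
from sklearn.neighbors import KNeighborsClassifier

# weights="distance" weights each neighbour's vote by the inverse of its
# distance, so closer neighbours dominate; this weights points, not classes.
knn_dist = KNeighborsClassifier(n_neighbors=5, weights="distance")

# A callable can implement custom weighting; this hypothetical example just
# reproduces inverse-distance weighting with a small epsilon for stability.
def inverse_distance(distances):
    return 1.0 / (distances + 1e-9)

knn_custom = KNeighborsClassifier(n_neighbors=5, weights=inverse_distance)
```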
EDIT: I've seen in the abstract here: enter link description here and in the answer here: enter link description here that the KNN classifier normally doesn't have any issues with imbalanced data, but I want to confirm that this understanding is correct.