
I am working on a classifier which stratifies a population of samples into different classes.

The class distribution (ground truth) is imbalanced, and the prevalence of each class is:

$$\begin{array}{cc}\text{Label} & \text{Prevalence}\\ C_1 & 0.14\\ C_2 & 0.17\\ C_3 & 0.26\\ C_4 & 0.43\end{array}$$
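For context, the per-class weights mentioned below can be derived from these prevalences. A minimal sketch, assuming inverse-prevalence weighting (the `prevalence` dict simply mirrors the table above; the normalisation choice is an assumption, not the exact scheme used):

```python
# Hypothetical prevalences copied from the table above.
prevalence = {"C1": 0.14, "C2": 0.17, "C3": 0.26, "C4": 0.43}

# One common choice: weight each class inversely to its prevalence,
# then normalise so the weights average to 1 across classes.
raw = {c: 1.0 / p for c, p in prevalence.items()}
total = sum(raw.values())
class_weight = {c: len(raw) * w / total for c, w in raw.items()}

print(class_weight)  # the minority class C1 gets the largest weight
```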

The classifier is based on Random Forest.

At the moment my pipeline is the following:

  1. Feature selection on the dataset - in this case I am testing:
  2. On the feature-selected dataset, an exhaustive search of the Random Forest parameters (number of trees and minimum number of samples required for a split) using GridSearchCV, in particular:
    • a 3-fold CV classification for each parameter set, where each class is weighted according to its prevalence
    • each 3-fold CV classification is evaluated with a macro-averaged F1-score (so that every class carries the same importance, independently of prevalence)
    • the distribution of scores is inspected with boxplots to choose the optimal parameters
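The grid-search step above can be sketched as follows. This is a minimal sketch assuming scikit-learn; the synthetic data, the parameter grid values, and `class_weight="balanced"` are placeholders standing in for the actual feature-selected dataset and settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the feature-selected dataset,
# with class proportions matching the prevalences in the question.
X, y = make_classification(
    n_samples=600, n_classes=4, n_informative=8,
    weights=[0.14, 0.17, 0.26, 0.43], random_state=0,
)

# Class weighting is handled inside the estimator; macro-F1 scoring
# gives every class equal importance regardless of prevalence.
grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={
        "n_estimators": [50, 100],      # number of trees
        "min_samples_split": [2, 10],   # min samples required for a split
    },
    scoring="f1_macro",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

`cv_results_` on the fitted grid exposes the per-fold scores that the boxplot step would summarise.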

However, this pipeline only improves the overall accuracy and the metrics of the larger classes; the minority class gains specificity, but not enough sensitivity.
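One way to make this per-class behaviour visible is to compute the recall (sensitivity) of each class separately. A sketch assuming scikit-learn, with hypothetical `y_true`/`y_pred` arrays for illustration:

```python
from sklearn.metrics import recall_score

# Hypothetical ground truth and predictions for a 4-class problem,
# where the minority class (label 0) is often mistaken for the
# majority class (label 3).
y_true = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
y_pred = [0, 3, 1, 3, 2, 2, 3, 3, 3, 3, 3, 3]

# average=None returns the recall of each class separately, instead
# of a single macro/micro average.
per_class_recall = recall_score(y_true, y_pred, average=None)
print(per_class_recall)  # low entries flag classes lacking sensitivity
```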

Is there a way to approach the problem in order to increase the sensitivity of the minority class?

gc5
  • Just a guess: from my understanding, macro-averaged-F1 scores might not be the best choice for your goal. You could try to replace its internal precision/recall with e.g. sensitivity/specificity or AUC of the ROC curve if possible (but can't guarantee that this will help). – geekoverdose Jul 08 '16 at 10:01
  • 1
    This is the perfect answer for your question: http://stats.stackexchange.com/a/158030/78313 – Metariat Jul 08 '16 at 10:30

0 Answers