I am working on a classifier which stratifies a population of samples into different classes.
The class distribution (ground truth) is imbalanced, and the prevalence of each class is:
$$\begin{array}{lc}\text{Label}&\text{Prevalence}\\\hline C_1&0.14\\C_2&0.17\\C_3&0.26\\C_4&0.43\end{array}$$
The classifier is based on Random Forest.
At the moment my pipeline is the following:
- Feature selection on the dataset (I am currently testing different approaches).
- On the feature-selected dataset, an exhaustive search over Random Forest hyperparameters (number of trees and minimum number of samples required to split a node) using GridSearchCV. In particular:
- a 3-fold CV classification for each parameter set, with class weights set according to prevalence;
- each 3-fold CV run is evaluated with the macro-averaged F1-score, so that every class contributes equally, regardless of prevalence;
- the distribution of scores across folds is inspected with boxplots to choose the optimal parameters.
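For reference, a minimal sketch of this search step (the data, grid values, and weighting scheme are placeholders; I use `class_weight="balanced"`, which weights classes inversely to their prevalence):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 4-class imbalanced data standing in for the real
# feature-selected dataset (prevalences roughly as in the table above).
X, y = make_classification(n_samples=600, n_classes=4, n_informative=8,
                           weights=[0.14, 0.17, 0.26, 0.43], random_state=0)

# Placeholder grid over the two hyperparameters mentioned above.
param_grid = {
    "n_estimators": [50, 100],
    "min_samples_split": [2, 10],
}

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="f1_macro",  # macro-averaged F1: equal importance per class
    cv=3,                # 3-fold CV per parameter set
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```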
However, with this pipeline I see improvements in overall accuracy and in the metrics for the larger classes, while the minority class gains specificity but not enough sensitivity.
Is there a way to approach the problem in order to increase the sensitivity of the minority class?