I am working on a dataset with 20 independent variables and 41188 instances. The task is a binary classification where the target variable has 36548 number of no's and 4640 of yes's. I have used logistic regression model with 10 folds of cross validation. Since the target variable is unbalanced, I decided to resample data. I made the model 3 times: first without resampling data, then resampling data once with under-sampling technique and once with SMOTE technique. Following are the reports gained:
- With Under-sampling technique
Usually with unbalanced data, accuracy alone is not sufficient to evaluate the performance of the classifier and thus precision, recall and ROC values should be taken into accounts as well. The first model made without any resampling techniques, delivers weighted average higher accuracy, precision and recall values than the the others while its ROC value is slightly less than the model with SMOTE resampling technique. Moreover, in the first model the recall value of class yes (0.423) is much lower than the recall value of class no (0.973).
My question is which model is more trustable? and why the accuracy, precision and recall values were decreased after resampling the data?