I have a loan defaulters dataset and it is highly imbalanced as shown below:
    0     1
33108   673
I have tried SMOTE to balance the dataset, as shown below:
library(DMwR)   # SMOTE() comes from this package
smoted_data <- SMOTE(state ~ ., deliq, perc.over = 200, perc.under = 800)
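As a quick sanity check (a minimal sketch, assuming the DMwR implementation of SMOTE() and the deliq / state names used above), I compare the class balance before and after resampling:

# class balance before and after SMOTE
table(deliq$state)         # original, heavily imbalanced
table(smoted_data$state)   # resampled, should be much closer to balanced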
After applying SMOTE, I trained a logistic regression with glm(), as given below:
model1 <- glm(state ~ . - LanID - Month - LastMonthBnc - DELINQ.NON.DELINQ,
              data = smoted_data, family = "binomial", maxit = 500)
On the training data it was able to capture class "1" to a reasonable degree, although the error is still high: only about 23% of the cases predicted as 1 are actually 1 (533 out of 1774 + 533). Confusion matrix (actual classes in rows, predicted in columns):
        0     1
  0 31334  1774
  1   140   533
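For reference, a confusion matrix like the one above can be produced along these lines (a sketch, not my exact code: the 0.5 cutoff and the use of deliq as the evaluation frame are assumptions):

# predicted probabilities on the original training frame
train_prob <- predict(model1, newdata = deliq, type = "response")
# hard 0/1 labels at an assumed 0.5 cutoff
train_pred <- ifelse(train_prob > 0.5, 1, 0)
# confusion matrix: actual classes in rows, predicted classes in columns
table(deliq$state, train_pred)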
However, when I tested it on the test data, it was extremely poor: only about 4% of the cases predicted as 1 are actually 1 (7 out of 149 + 7):
   pred_class_30
        0     1
  0 10154   149
  1   210     7
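Because overall accuracy hides the minority-class behaviour, I read precision and recall for class 1 straight off the confusion matrix; a minimal sketch, assuming the test labels are in test$state (an assumed name) and that actuals are in rows and predictions in columns:

cm <- table(test$state, pred_class_30)    # test$state is an assumed column name
# precision on class 1: of the cases predicted as 1, how many are truly 1
precision_1 <- cm["1", "1"] / sum(cm[, "1"])
# recall on class 1: of the true 1s, how many the model catches
recall_1 <- cm["1", "1"] / sum(cm["1", ])
c(precision = precision_1, recall = recall_1)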
This indicates that my model is overfitted and I need it to generalize better.
My question is: 23% on the training data is still not good, so is there any other method that can help me improve accuracy on the minority class of such an imbalanced dataset?
I have checked the existing similar posts but could not find anything about how to improve accuracy on the minority class, especially when the model is overfitted.