I have this assignment question:
You are given a dataset for cancer detection with two classes (binary classification): 0 stands for “cancer not detected” and 1 for “cancer detected”. The dataset has a train/test split. The training set has 10,000 instances/records, where half of the instances belong to class 0 and the remaining half belong to class 1. The testing set has 1,000 instances, where 990 instances belong to class 0 and 10 instances belong to class 1. You create two models, model A and model B. Model A gives you a training accuracy of 80% and a testing accuracy of 75%. Model B gives you a training accuracy of 50% but a testing accuracy of 99%. Between these two, which model will you prefer and why? Discuss potential problems in both models and methods to rectify them.
I believe that model B is better, since test accuracy matters more and, as far as I understand, it should generally be higher than training accuracy (to reduce the chances of overfitting). But the gap between model B's training accuracy (50%) and test accuracy (99%) is large, which baffles me. Which model is better, and why?
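
To make the numbers concrete, here is a small sanity check I put together. It is only a sketch: the class counts come from the assignment, but the "always predict class 0" classifier is a hypothetical baseline of my own, and nothing in the assignment says either model actually behaves this way. The function names are likewise just illustrative.

```python
# Class counts taken from the assignment statement.
train_counts = {0: 5000, 1: 5000}   # balanced training set
test_counts = {0: 990, 1: 10}       # heavily imbalanced test set


def constant_accuracy(counts, predicted_class):
    """Accuracy of a hypothetical model that always outputs predicted_class."""
    return counts[predicted_class] / sum(counts.values())


def constant_recall_class_1(counts, predicted_class):
    """Share of true class-1 instances that the constant model labels as 1."""
    true_positives = counts[1] if predicted_class == 1 else 0
    return true_positives / counts[1]


# Assumption: a baseline that ignores its input and always answers 0.
train_acc = constant_accuracy(train_counts, 0)
test_acc = constant_accuracy(test_counts, 0)
test_recall = constant_recall_class_1(test_counts, 0)

print(f"train accuracy:       {train_acc:.2f}")
print(f"test accuracy:        {test_acc:.2f}")
print(f"test recall (class 1): {test_recall:.2f}")
```

Under that assumption, the baseline's accuracies line up with the 50% and 99% quoted for model B, while it detects none of the 10 cancer cases in the test set. That is part of why I am unsure whether accuracy alone is the right way to compare the two models here.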