
I have this assignment question:

You are given a dataset for cancer detection with two classes (binary classification): 0 stands for “cancer not detected” and 1 for “cancer detected”. This dataset has a train/test split. The training set has 10,000 instances/records, where half of the instances belong to class 0 and the remaining half belong to class 1. The testing set has 1,000 instances, where 990 instances belong to class 0 and 10 instances belong to class 1. You create two models, model A and model B. Model A gives you a training accuracy of 80% and a testing accuracy of 75%. Model B gives you a training accuracy of 50% but a testing accuracy of 99%. Between these two, which model would you prefer and why? Discuss potential problems in both models and methods to rectify them.

I believe model B is better, since test accuracy matters more, and generally it should be higher than training accuracy (to reduce the chances of overfitting). But the large gap between model B's test and training accuracy baffles me. Please tell me which one is better.
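A quick check of the arithmetic in the question, under a purely hypothetical assumption: suppose model B predicts class 0 ("cancer not detected") for every input. The question does not say this; it is just one way model B's numbers could arise. The helpers `constant_predictor`, `accuracy`, `recall`, and `balanced_accuracy` are illustrative names written here, not from any library. With the class counts from the question, such a model scores 50% on the balanced training set and 99% on the 990/10 test set, yet it detects none of the 10 cancer cases.

```python
# Hypothetical sketch: a "model" that always predicts class 0.
# Assumption: the question does not state that model B does this.

def constant_predictor(n, label=0):
    """Return n predictions, all equal to `label`."""
    return [label] * n

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of instances of class `positive` that were predicted as that class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return hits / actual if actual else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls for classes 0 and 1."""
    return (recall(y_true, y_pred, 0) + recall(y_true, y_pred, 1)) / 2

# Class counts taken from the question
y_train = [0] * 5000 + [1] * 5000
y_test = [0] * 990 + [1] * 10

train_pred = constant_predictor(len(y_train))
test_pred = constant_predictor(len(y_test))

train_acc = accuracy(y_train, train_pred)
test_acc = accuracy(y_test, test_pred)
test_recall = recall(y_test, test_pred, positive=1)
test_bal_acc = balanced_accuracy(y_test, test_pred)

print(f"train accuracy:             {train_acc:.2f}")     # 0.50
print(f"test accuracy:              {test_acc:.2f}")      # 0.99
print(f"test recall (class 1):      {test_recall:.2f}")   # 0.00
print(f"test balanced accuracy:     {test_bal_acc:.2f}")  # 0.50
```

The balanced accuracy (average of per-class recall) comes out at 0.50 on the test set, the same as the training accuracy, which suggests that the 99% figure mostly reflects the 990/10 class imbalance rather than any ability to detect class 1.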

Laksan Nathan
user177763
