We have two simple rule-based algorithms that label each subject in a dataset as "Yes" or "No" for a disease. There is no ML involved in this task.

For example, if Algo 1 says subject 1 has the disease ("Yes") and Algo 2 also says subject 1 has the disease ("Yes"), we consider that the subject has the disease.

But what if Algo 1 says the subject does not have the disease ("No") and Algo 2 says the subject has the disease ("Yes")?

Here, it is a tie. We don't have a third rule-based algorithm or a human to decide whether the subject should be treated as "Yes" or "No".

Can the experts here suggest ways to decide whether to label such a subject as "Yes" or "No"? Is there any mathematical, statistical, or scientific method available for making this decision?

The Great

1 Answer

First off, modify your algorithms so they output probabilistic classifications rather than hard 0-1 classifications. A subject with a 99% chance of having a disease is in a very different situation from one with only a 51% chance. And even at a 20% chance, you probably want to do something, like run additional tests.

I recommend this earlier thread.

You can then assess your two probabilistic classifiers using proper scoring rules. You may find that one is already so bad it should not be considered further.
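As a rough illustration of that assessment, here is a minimal sketch that scores two probabilistic classifiers with the Brier score and the log loss, two commonly used proper scoring rules. The data, the variable names, and the assumption that you have a holdout set with known true outcomes are all hypothetical; your rule-based algorithms would first need to be adapted to emit probabilities.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical holdout data: 1 = disease, 0 = no disease.
y_true = [1, 0, 1, 1, 0, 0]
p_algo1 = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4]  # assumed probabilities from algorithm 1
p_algo2 = [0.6, 0.5, 0.4, 0.9, 0.3, 0.7]  # assumed probabilities from algorithm 2

for name, p in [("algo1", p_algo1), ("algo2", p_algo2)]:
    print(name, "Brier:", round(brier_score(y_true, p), 4),
          "log loss:", round(log_loss(y_true, p), 4))
```

On both rules a lower score is better, so a classifier that scores much worse than the other on held-out data is a candidate for dropping.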

If both classifiers work reasonably well, you can combine their probabilistic classifications, e.g., by simply taking the average of their two predicted probabilities. This will again give you a probabilistic classifier, and you can again assess its classifications using proper scoring rules, and/or use (cost-driven!) cutoff thresholds to make decisions on further action.
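A minimal sketch of the averaging and cost-driven cutoff described above. The probabilities and the two misclassification costs are made-up placeholders, and the threshold formula assumes the simplest setting, where a false positive and a false negative each carry a fixed cost and correct decisions cost nothing.

```python
def average_probabilities(p1, p2):
    """Equally weighted combination of two predicted probabilities per subject."""
    return [(a + b) / 2 for a, b in zip(p1, p2)]

def cost_threshold(cost_fp, cost_fn):
    """Probability above which acting has lower expected cost than not acting.

    Acting costs (1 - p) * cost_fp in expectation, not acting costs p * cost_fn,
    so acting is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def decide(p_combined, threshold):
    """Turn combined probabilities into "Yes"/"No" labels using the cutoff."""
    return ["Yes" if p > threshold else "No" for p in p_combined]

# Hypothetical predicted probabilities for three subjects.
p_algo1 = [0.9, 0.2, 0.3]
p_algo2 = [0.7, 0.1, 0.4]

p_avg = average_probabilities(p_algo1, p_algo2)
# Assumed costs: missing a case is four times as costly as a false alarm.
threshold = cost_threshold(cost_fp=1.0, cost_fn=4.0)
labels = decide(p_avg, threshold)
print(p_avg, threshold, labels)
```

Note that with asymmetric costs the cutoff moves away from 0.5, which is why a subject with a fairly low combined probability can still warrant follow-up.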

Combining models and classifications usually improves performance. You may be tempted to find "optimal" weights. This may or may not work (Claeskens et al., 2016), so if you do so, make sure you compare the "optimally combined" classifier against a simple equally-weighted one.
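One way to run that comparison, sketched under assumptions: fit a single combination weight by grid search on a training split, then compare it with the equally weighted average on a separate holdout split using the Brier score. The grid search, the split, and all numbers below are illustrative choices, not a method taken from the cited paper.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def combine(p1, p2, w):
    """Weighted average: weight w on classifier 1 and (1 - w) on classifier 2."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

def fit_weight(y, p1, p2, steps=20):
    """Pick the grid weight in [0, 1] with the lowest Brier score on the given data."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda w: brier_score(y, combine(p1, p2, w)))

# Hypothetical training split (used only to fit the weight).
y_train = [1, 0, 1, 0, 1, 0, 1, 0]
p1_train = [0.8, 0.3, 0.6, 0.2, 0.9, 0.4, 0.7, 0.1]
p2_train = [0.6, 0.4, 0.7, 0.5, 0.5, 0.2, 0.8, 0.3]

# Hypothetical holdout split (used only for the comparison).
y_hold = [1, 0, 0, 1]
p1_hold = [0.7, 0.2, 0.5, 0.6]
p2_hold = [0.9, 0.4, 0.3, 0.5]

w_opt = fit_weight(y_train, p1_train, p2_train)
score_opt = brier_score(y_hold, combine(p1_hold, p2_hold, w_opt))
score_equal = brier_score(y_hold, combine(p1_hold, p2_hold, 0.5))
print("fitted weight:", w_opt)
print("holdout Brier, fitted weight:", round(score_opt, 4))
print("holdout Brier, equal weights:", round(score_equal, 4))
```

The fitted weight can never look worse than equal weighting on the data it was fitted to, so only the holdout comparison says whether it is worth keeping.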

Stephan Kolassa
  • Unfortunately, my rules are SQL-based, and there is no ML involved at this point. We would like to create labels based on these rules. Once we have the labels, we will apply ML models for our further applications. Can you guide me on how to do this when we have only rule-based algorithms and no ML model? – The Great Feb 05 '21 at 11:38
  • btw, thanks and upvoted – The Great Feb 05 '21 at 11:39
  • Hm. In that case, you will need to follow algorithm 1 or 2 in the case of ties. The question is whose mistakes are costlier in this particular situation. You can assess this on a holdout set and then decide who to follow in the case of ties. How precisely you will do this will depend on your particular application and the cost of misclassification. – Stephan Kolassa Feb 05 '21 at 12:15
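
The holdout-based tie-breaking suggested in the last comment could look roughly like the sketch below. It assumes you can obtain a small holdout set with trusted true labels (for example, from manual review), and the labels and the two misclassification costs shown are hypothetical placeholders.

```python
def misclassification_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost of false positives and false negatives; correct labels cost nothing."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if pred == "Yes" and truth == "No":
            cost += cost_fp  # false positive
        elif pred == "No" and truth == "Yes":
            cost += cost_fn  # false negative
    return cost

def pick_tiebreaker(y_true, algo1, algo2, cost_fp, cost_fn):
    """Compare both algorithms only on the subjects where they disagree."""
    ties = [(t, a, b) for t, a, b in zip(y_true, algo1, algo2) if a != b]
    truth = [t for t, _, _ in ties]
    cost1 = misclassification_cost(truth, [a for _, a, _ in ties], cost_fp, cost_fn)
    cost2 = misclassification_cost(truth, [b for _, _, b in ties], cost_fp, cost_fn)
    return ("algo1" if cost1 <= cost2 else "algo2"), cost1, cost2

# Hypothetical holdout set with trusted labels and both algorithms' outputs.
y_true = ["Yes", "No", "Yes", "No", "Yes", "No"]
algo1 = ["Yes", "Yes", "No", "No", "Yes", "Yes"]
algo2 = ["Yes", "No", "Yes", "No", "No", "No"]

# Assumed costs: a missed case is five times as costly as a false alarm.
winner, cost1, cost2 = pick_tiebreaker(y_true, algo1, algo2, cost_fp=1.0, cost_fn=5.0)
print("follow", winner, "on ties; costs:", cost1, cost2)
```

Which algorithm wins depends on the costs you plug in, so it is worth rerunning the comparison with a few plausible cost ratios before fixing the rule.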