I need to calculate the Standardized Infection Ratio (SIR), i.e. the ratio between the actual (observed) number of infections and the number of cases predicted by a risk model.
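In symbols (using $O$ and $E$ only as shorthand for the observed and the model-based expected counts):

$$\text{SIR} = \frac{O}{E} = \frac{\text{observed infections}}{\text{predicted (expected) infections}}$$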
According to the CDC (https://www.cdc.gov/nhsn/pdfs/ps-analysis-resources/nhsn-sir-guide.pdf), the predicted number of cases is computed by summing the per-observation probabilities generated by the model. In contrast, many other studies and machine learning applications count as predicted cases those observations with a predicted probability >= 0.5.
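To make the contrast concrete, here is a minimal Python sketch (toy data, not CDC code; `y_true` and `p_hat` are hypothetical arrays standing in for the observed outcomes and the model's per-observation risks):

```python
import numpy as np

rng = np.random.default_rng(0)
p_hat = rng.uniform(0.01, 0.30, size=1000)   # hypothetical predicted risks (rare outcome)
y_true = rng.binomial(1, p_hat)              # hypothetical observed infections

observed = y_true.sum()

# CDC/NHSN-style expected count: sum of per-observation probabilities
expected_sum = p_hat.sum()
print("observed:", observed)
print("SIR (sum of probabilities):", observed / expected_sum)

# Thresholding approach: count observations with predicted probability >= 0.5
expected_threshold = (p_hat >= 0.5).sum()
print("cases with p_hat >= 0.5:", expected_threshold)
# With rare outcomes this count is often 0, making the ratio undefined or unstable
```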
What are the theoretical and intuitive reasons for preferring one method over the other? Furthermore, the CDC method seems sensible when estimating a SIR for a subgroup of the data or on new, unseen data, but summing the predicted probabilities over the whole training set will always return exactly the observed number of cases (at least for a logistic regression fit by maximum likelihood), and therefore always a SIR of exactly 1.
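The training-set behaviour I mean can be reproduced with a toy sketch like the one below (assuming a maximum-likelihood logistic regression via statsmodels; the data and variable names are purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
true_p = 1 / (1 + np.exp(-(-2.5 + 0.8 * x)))   # hypothetical true risk model
y = rng.binomial(1, true_p)

X = sm.add_constant(x)                         # intercept column + predictor
fit = sm.Logit(y, X).fit(disp=0)               # unpenalized ML logistic regression
p_hat = fit.predict(X)

print(y.sum(), p_hat.sum())                    # equal up to numerical precision
print(y.sum() / p_hat.sum())                   # training-set SIR = 1.0
# On a subgroup or on new data the two sums generally differ,
# which is where the SIR becomes informative.
```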
Thanks