
I need to calculate the Standardized Infection Ratio (SIR), that is, the ratio of the observed number of infection cases to the number of cases predicted by a model.

According to the CDC (https://www.cdc.gov/nhsn/pdfs/ps-analysis-resources/nhsn-sir-guide.pdf), the predicted number of cases has to be computed by summing the per-observation probabilities generated by the model. In many other studies and machine learning applications, by contrast, the predicted cases are those with a predicted probability >= 0.5.

What are the theoretical and intuitive reasons for using one method or the other? Furthermore, the CDC method works when estimating a SIR for a subgroup of the data or on new, unseen data, but summing the predicted probabilities over the whole training set will always give exactly the number of real cases, and therefore always a SIR of exactly 1.
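To make the two ways of counting concrete, here is a minimal sketch on simulated data. It assumes the model is a plain (unregularized) logistic regression with an intercept, fitted by maximum likelihood; that is the case in which the fitted probabilities sum to the observed count on the training set. The data, the coefficients (-2 and 1) and the helper `fit_logistic` are hypothetical and only serve the illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, n_iter=25):
    """Maximum-likelihood fit of P(y=1) = sigmoid(b0 + b1*x) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated training set (assumed data-generating process, not real data)
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if random.random() < sigmoid(-2.0 + xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, y)
p_hat = [sigmoid(b0 + b1 * xi) for xi in x]

observed = sum(y)
expected_sum = sum(p_hat)                         # CDC-style: sum of probabilities
expected_cut = sum(1 for p in p_hat if p >= 0.5)  # count of "predicted positives"

sir_sum = observed / expected_sum
sir_cut = observed / expected_cut if expected_cut else float("inf")

print("observed cases:", observed)
print("expected (sum of probabilities):", round(expected_sum, 6))
print("expected (probability >= 0.5):", expected_cut)
print("SIR, summed probabilities:", round(sir_sum, 6))
print("SIR, 0.5 cutoff:", round(sir_cut, 3))
```

On the training set the summed probabilities reproduce the observed count, because the intercept's score equation forces the residuals to sum to zero, so that SIR is 1 by construction. The cutoff-based count carries no such guarantee and depends on how many observations happen to cross 0.5.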

Thanks

Bakaburg
  • See [here](https://stats.stackexchange.com/questions/225843/why-p0-5-cutoff-is-not-optimal-for-logistic-regression) for why the presumed 0.5 cutoff is irrelevant. – AdamO Feb 09 '18 at 14:48
  • The standardized incidence (or infection) ratio is a tool to detect epidemics. It is not about individual outcomes, but about the population level: for instance, if there are 90 MRSA cases at hospital A when you expected 20, that's a 4.5-fold SIR: evidence the hospital is doing badly. You can use a prediction or ML model to estimate the predicted counts. These methods don't separately provide answers to the same question. – AdamO Feb 09 '18 at 14:58
  • Ok, I was wondering how I should calculate a SIR on my training set if the predicted probabilities will always sum to the total number of cases, so that the SIR will always be 1. – Bakaburg Feb 09 '18 at 15:38
  • Does it though? The predicted number of cases would be TP + FP. The observed number of cases would be TP + FN. If you're working with probabilities instead of predictions, then you're right, there's never a lack of calibration in within-sample validation. You have to estimate the calibration with an independent validation sample. – AdamO Feb 09 '18 at 15:59
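The counts mentioned in the last comment can be sketched on an independent validation sample as follows. This is an illustration under stated assumptions only: the validation data are simulated, and the model is assumed to under-predict, with a true risk 1.5 times the predicted probability (capped at 1). All variable names are hypothetical.

```python
import random

random.seed(1)

n = 5000
# Hypothetical held-out sample: p_hat is what the model predicts, while the
# true risk is assumed to be 1.5 times higher (capped at 1).
p_hat = [random.uniform(0.05, 0.8) for _ in range(n)]
y = [1 if random.random() < min(1.0, 1.5 * p) else 0 for p in p_hat]

cutoff = 0.5
pred = [1 if p >= cutoff else 0 for p in p_hat]

# Confusion-matrix counts for the thresholded predictions
tp = sum(1 for yi, pi in zip(y, pred) if yi == 1 and pi == 1)
fp = sum(1 for yi, pi in zip(y, pred) if yi == 0 and pi == 1)
fn = sum(1 for yi, pi in zip(y, pred) if yi == 1 and pi == 0)

observed = sum(y)                 # equals TP + FN
predicted_by_cutoff = sum(pred)   # equals TP + FP
expected_by_sum = sum(p_hat)      # sum of predicted probabilities

sir_validation = observed / expected_by_sum

print("observed (TP + FN):", observed)
print("predicted by cutoff (TP + FP):", predicted_by_cutoff)
print("expected by summed probabilities:", round(expected_by_sum, 1))
print("SIR on the validation sample:", round(sir_validation, 3))
```

Unlike the within-sample case, nothing forces the summed probabilities to match the observed count here, so the SIR on held-out data can depart from 1 and reflect miscalibration or a genuine excess of cases.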
