I'm developing a forecasting model for an infectious disease for a hospital and want to understand whether I should use disease counts or rates (based on population or total clinic visits). What are the implications of using one approach over the other (besides practical considerations, such as having an accurate population estimate)?

Any suggestions on literature would be greatly appreciated!

Thanks, Kate

Kate

1 Answer

I've been pondering this question myself a lot recently, especially in the context of hospital epidemiology. My thoughts below:

The Case for Counts: The main strength of counts is that they're a more meaningful number on a practical scale. Tell a nurse to expect an increase of 0.20 infections per patient-day and they'll give you an odd look. Tell them they're going to see 17 more cases this month? That's a meaningful number for them to work from.

This ties into a general undercurrent in epidemiology at the moment: a growing interest in absolute numbers, because our current reliance on relative effect measures does some funny things, including exaggerating small effects at times. For example, if your rate doubles, is that a crisis? It probably depends on whether the counts are 1 and 2, 100 and 200, or 100,000 and 200,000.

The Case for Rates: That being said, there's a reason people like rates. Counts only really make sense if your denominator is fixed, or relatively so. For example, if your clinic sees 1,000 patients each month, then a change in cases from Month X to Month X+1 is a genuine change in cases. But if you saw 1,200 the second month? Is your increase in cases a genuine increase, or did you just have more opportunity to see cases? Or, in the case of hospital-acquired infections, if you had more patients staying for more time in a given month, you had more opportunities to get them sick, and should probably adjust for that.
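To make the 1,000 versus 1,200 patient example concrete, here is a minimal sketch in Python. The visit totals come from the example above; the case counts (50 and 60) are invented purely for illustration.

```python
# Hypothetical monthly data. Visit totals follow the example in the text;
# the case counts are made-up numbers for illustration only.
months = [
    {"month": "X", "visits": 1000, "cases": 50},
    {"month": "X+1", "visits": 1200, "cases": 60},
]


def rate_per_1000(cases, visits):
    """Cases per 1,000 clinic visits."""
    return 1000 * cases / visits


# Change in raw counts between the two months.
count_change = months[1]["cases"] - months[0]["cases"]

# Rates for each month and the change in rate.
rates = [rate_per_1000(m["cases"], m["visits"]) for m in months]
rate_change = rates[1] - rates[0]

print(f"Change in counts: {count_change:+d} cases")
print(f"Rates per 1,000 visits: {rates}")
print(f"Change in rate: {rate_change:+.1f} per 1,000 visits")
```

With these made-up numbers the count rises by 10 cases while the rate does not move at all, which is exactly the "more opportunity to see cases" situation.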

So the answer, I think, sadly, is that it depends. If you're confident that your denominator is stable, and each time point represents the same "opportunity" to see cases, you can probably get away with counts. But if it moves around, and one time point has more patients, staying for more time, etc., then you should probably use rates, unless you don't think you can get a good denominator.

And on a practical note, clinicians are trained, in my experience, to expect rates. If you want to publish or distribute your count-based findings, expect some push-back.

Fomite
  • +1 for the nice description. But, when the denominators of the rates are about the same, then modeling the counts and modeling the rates are effectively equivalent, right? When they are not the same, as you mentioned, it seems hard to justify modeling the counts unless, for example, you use an offset for the population size in a regression model (in which case you're effectively back to modeling the rates). Doesn't this seem to indicate that modeling the rates is always better? – Macro May 02 '12 at 01:58
  • @Macro I'd assert that for the situation you've given - wherein the denominator/exposure is stable and modeling counts and modeling rates are effectively equivalent, then the modeling of *counts* is preferable due to the superior ease of interpretation. – Fomite May 02 '12 at 02:00
  • Yes, but the statistical modeling results (re: the inference, effect estimates, etc.) would be exactly the same, it would just be a matter of scaling up the results by whatever the denominator was. So, I guess I should modify that to say "Doesn't this seem to indicate that modeling the rates is never worse?". – Macro May 02 '12 at 02:03
  • @Macro In a theoretical sense, yes. In a practical sense - having just done this - collecting the necessary data to *construct* denominators (even ones known to be stable) at the same level of granularity as counts is often a great deal of work. Beyond that, even if the modeling component is essentially equivalent, you'll likely have to make an argument for why you're *presenting* counts rather than rates. Hence the focus on interpretability. – Fomite May 02 '12 at 02:08
  • I assumed the original poster was describing a situation where you had a choice between 'rate' and 'count', indicating you knew the population size - in that case it seems you can't go wrong with 'rate'. In the context where you don't know the population size, clearly I agree you have to model the counts. Hopefully, in that case, you can rationally argue that the population size is constant, so you're just modeling a scalar multiple of the 'rate' and all of statistical inference is unaffected by the missing information. – Macro May 02 '12 at 04:01
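
The offset point raised in the comments can be sketched with a small Poisson regression. This is only an illustration, not anyone's actual model: the monthly case counts and exposures are invented, and `fit_poisson` is a hypothetical hand-rolled Newton-Raphson fit written here so the example needs no external libraries. In practice you would probably use a standard GLM routine with an offset or exposure argument.

```python
import math


def fit_poisson(x, y, exposure=None, max_iter=50, tol=1e-12):
    """Fit log E[y] = log(exposure) + b0 + b1 * x by Newton-Raphson.

    Hypothetical helper for illustration. If exposure is None, the
    offset is zero and the model is a plain count model.
    """
    n = len(y)
    offset = [0.0] * n if exposure is None else [math.log(e) for e in exposure]
    # Start from an intercept-only guess with zero slope.
    b0 = math.log(sum(y) / n) - sum(offset) / n
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(o + b0 + b1 * xi) for o, xi in zip(offset, x)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix (2 x 2), inverted by hand below.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Invented data: six months of case counts.
month = [0, 1, 2, 3, 4, 5]
cases = [12, 15, 14, 18, 21, 20]
stable = [1000] * 6                              # constant denominator
growing = [1000, 1100, 1200, 1300, 1400, 1500]   # denominator that drifts

b0_counts, slope_counts = fit_poisson(month, cases)
b0_stable, slope_stable = fit_poisson(month, cases, exposure=stable)
b0_growing, slope_growing = fit_poisson(month, cases, exposure=growing)

print(f"counts only:       slope = {slope_counts:.4f}")
print(f"offset (stable):   slope = {slope_stable:.4f}")
print(f"offset (growing):  slope = {slope_growing:.4f}")
```

With a constant denominator the trend estimate is the same whether or not the offset is included, and only the intercept shifts by the log of the denominator. Once the denominator drifts, the count model and the rate model give different trends, which is where the choice starts to matter.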