How to find statistical significance between two years of data

Question

I was asked to determine whether the number of HAI infections in 2011 is statistically significantly different from the number of HAI infections in 2012. I performed a paired t-test using the actual numbers, but am now questioning whether I should be using the standardized incidence rate instead. This is for a 6-hospital network, so each hospital reports a number, and I add these up for a total yearly number.

Do you have just *one* number? Do you have, maybe, an infection count per month? I wonder if this might be modeled better as a time-series, or with a mixed-effects model or the GEE. — gung - Reinstate Monica, Jan 21 '13 at 19:49
I am provided with an actual infection number and a rate per 10,000 patient days for each hospital at the end of 2011 and the end of 2012. I also have these numbers for the entire network, i.e. all 6 hospitals combined. I ultimately want to know if the infection rate for 2012 is significantly different from 2011. — Katie Yohnke, Jan 21 '13 at 21:16
So you have just one number per hospital per year, then? If the number of patient days varies across the hospitals in the network, I would use the rates instead of the raw infection numbers. — gung - Reinstate Monica, Jan 21 '13 at 21:43
Yes, you are correct. So a total of 6 numbers per year. Would a paired test be the correct one to use then since realistically I am using the same hospitals just different years or should I just use a two sample t-test? — Katie Yohnke, Jan 21 '13 at 21:57
I don't know how much power you'll have with so few data. Another issue is that you can't assess, e.g., whether the normality assumption is reasonable. You do want to respect the correlated nature of your data, though. So you may want to look into a non-parametric test for correlated data, e.g. the [Friedman test](http://en.wikipedia.org/wiki/Friedman_test). — gung - Reinstate Monica, Jan 21 '13 at 22:31
I'm unfamiliar with the Friedman test. Is it similar to the Mann-Whitney or Wilcoxon? — Katie Yohnke, Jan 22 '13 at 15:05
Yeah, the [Wilcoxon signed-rank test](http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is a non-parametric test for 2 dependent samples. The Friedman test is for multiple dependent samples. You have 6 hospitals, so I was thinking of that as multiple groups, but since you have just one number each, it might be better to use the hospitals only to establish the correspondence b/t before & after, & run the Wilcoxon instead. I wouldn't use the Mann-Whitney U test, though, as that's for independent samples. — gung - Reinstate Monica, Jan 23 '13 at 01:09
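
As a rough illustration of the paired Wilcoxon suggestion in the comments, here is a minimal R sketch. The rates are made up purely for illustration (they are not the asker's data), and the variable names are hypothetical. It also computes the smallest two-sided p-value the exact signed-rank test can produce with 6 pairs, which bears on the power concern raised above.

```r
# Hypothetical infection rates per 10,000 patient days for the same
# 6 hospitals in each year (made-up values, for illustration only).
rate_2011 <- c(12.1, 8.4, 15.3, 9.9, 11.2, 7.6)
rate_2012 <- c(10.3, 8.9, 12.7, 8.2, 10.0, 7.3)

# Paired (signed-rank) test: the hospital supplies the pairing
# between the 2011 and 2012 values.
res <- wilcox.test(rate_2011, rate_2012, paired = TRUE)
print(res)

# With n pairs and no ties, the exact two-sided p-value cannot go
# below 2 / 2^n, reached only when all differences share one sign.
n <- length(rate_2011)
min_p <- 2 / 2^n
print(min_p)
```

With only 6 pairs, the test can reach the 0.05 level only if every hospital moves in the same direction, so a non-significant result here says little about whether the rates actually changed.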

score 1 · Answer 1 · answered Nov 01 '18 at 12:42

You really have count data, so something like Poisson (or negative binomial) regression is appropriate. Also, you say you have actual counts and rates per 10,000 patient days. From those you can calculate the actual number of patient days (count divided by rate, times 10,000), which measures exposure, and then use Poisson rate regression. This is discussed multiple times here at Cross Validated, for instance here and here, with a more theoretical discussion here.

In R, the code would look like:

    mod <- glm(Ninfections ~ offset(log(exposure)) + Ihospital + <other predictors>,
               family = poisson(link = "log"), data = <your data frame>, ...)
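
To make the template concrete, here is a small self-contained sketch under stated assumptions. All counts and rates are invented for illustration, the column names (`hospital`, `year`, `rate`) are hypothetical, and using hospital as a fixed effect plus a `year` indicator is one possible choice of predictors, not necessarily the one intended above. It recovers exposure from count and rate, fits the Poisson rate model, and reads off the 2012 vs. 2011 rate ratio.

```r
# Hypothetical data: 6 hospitals x 2 years. All numbers are made up.
dat <- data.frame(
  hospital    = factor(rep(paste0("H", 1:6), times = 2)),
  year        = factor(rep(c(2011, 2012), each = 6)),
  Ninfections = c(24, 10, 41, 15, 30, 8, 20, 11, 35, 12, 27, 7),
  rate        = c(12.0, 8.0, 16.4, 10.0, 12.0, 8.0,
                  10.0, 8.8, 14.0, 8.0, 10.8, 7.0)  # per 10,000 patient days
)

# Recover exposure (patient days) from the count and the reported rate:
# rate = count / patient_days * 10000  =>  patient_days = count / rate * 10000
dat$exposure <- dat$Ninfections / dat$rate * 10000

# Poisson rate regression: log(exposure) enters as an offset, so the
# coefficients describe log rates rather than log counts.
mod <- glm(Ninfections ~ year + hospital + offset(log(exposure)),
           family = poisson(link = "log"), data = dat)

# Estimate, standard error, z value and p-value for the year effect.
year_row <- summary(mod)$coefficients["year2012", ]
print(year_row)

# Rate ratio, 2012 relative to 2011, adjusted for hospital.
rate_ratio <- exp(coef(mod)[["year2012"]])
print(rate_ratio)
```

If the residual deviance is much larger than its degrees of freedom, that suggests overdispersion, in which case the negative binomial alternative mentioned above (for example `MASS::glm.nb`) may be worth trying.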
