Significance test for non-normal population?

Question

I'm a journalist, and I am trying to work out whether hospitals that are in political districts with low majorities (i.e. where the political representative is fighting hard for his or her seat) are more likely to get extra funding.

In other words, do hospitals that have received extra funding tend to be in districts with relatively low majorities? I'm not sure which significance test is most appropriate here.

Here are my numbers:

There are 600 districts overall. The average political majority is 18.5%, with a s.d. of 12.1%.
There are 1588 hospitals overall. Considering each hospital as one member of the population, and looking at the district it is in, the average majority is 18.6% with a s.d. of 12.6%.
There are 203 hospitals that have received extra funding. Considering each hospital as one element in the population, and looking at the district it is in, the average majority is 16.6% with a s.d. of 11.5%.

So I can see that the funded hospitals are in districts with lower majorities on average, but I'm not sure whether this counts as significant.

(I'm pretty sure there is something going on! I also have the stats for each individual hospital, if that helps. It's a long time since I did statistics at college and I have forgotten whether I should be looking at means or at something else.)

What complicates this for me is that the distribution of majorities probably isn't Normal, because I'm looking at the modulo majority, and am not concerned with which party has the majority.

Any thoughts on how to assess the significance of this finding?

It sounds like you should start with playing around with approaches to displaying the data before launching into a significance test. I can imagine that the effect of being close to a marginal electorate might be nearly as important as being within the marginal electorate, given that patients can cross electoral boundaries. See if you can use a spatial display of the data. Sometimes a good display of the data is better than a significance test. — Michael Lew, Jun 17 '12 at 04:12
I would be reluctant to go with the modulo majority rather than an index which contains the sign of the majority (relative to the team which won government). Try not to throw away information that is within the data before you are sure that it is not important. — Michael Lew, Jun 17 '12 at 04:13

score 2 · Accepted Answer · edited Apr 13 '17 at 12:44

There are a lot of sophisticated methods that could be applied to this problem, taking into account the probability distribution of majorities, clustering by district, controlling for other district-level variables, etc. Generally I would use a trimmed mean to compare this sort of thing, and to get a good view of the statistical properties of an estimate of the trimmed mean you would use a bootstrap. All this, I imagine, is beyond your resources and timeframe unless you have a statistical consultant on tap.
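For readers who do have the per-hospital figures, the trimmed-mean-plus-bootstrap idea can be sketched in a few lines. This is only an illustration under stated assumptions: the 10 percent trim, the percentile interval, the function names `trimmed_mean` and `bootstrap_diff_ci`, and the two small data lists are all made up for the example, and the sketch ignores clustering by district.

```python
import random

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)              # number dropped from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def bootstrap_diff_ci(group_a, group_b, trim=0.1, n_boot=2000, seed=0):
    """Percentile bootstrap 95% interval for trimmed_mean(a) - trimmed_mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(trimmed_mean(resample_a, trim) - trimmed_mean(resample_b, trim))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper

# Hypothetical majorities (percent) for illustration only, not the real data.
funded = [3.1, 5.4, 8.0, 9.2, 12.5, 14.1, 16.0, 21.3, 25.8, 40.2]
unfunded = [4.0, 7.7, 10.1, 13.9, 15.2, 18.8, 19.5, 22.4, 27.0, 30.6, 35.1, 48.9]

low, high = bootstrap_diff_ci(funded, unfunded)
print(f"95% bootstrap interval for the difference: ({low:.1f}, {high:.1f})")
```

If the resulting interval excludes zero, that is evidence of a difference in typical majority between the two groups; with the real data you would pass in the full lists of majorities for funded and unfunded hospitals.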

This is also a good example of applying statistical inference techniques that were designed for samples to a whole population. See some discussion on previous questions - here and here. My view is strongly that it is useful and policy-relevant to treat such a situation as though the census of hospitals and their grants were a random sample from a hypothetical super-population, and to draw conclusions on whether there is statistically significant evidence that the data-generating process has produced something different from what would be expected under a null hypothesis (in this case, the null hypothesis would be 'no relationship between majority size and funding behavior').

Putting all that aside however, a basic pragmatic approach to statistics would note several things:

While the distribution of majorities is certainly not normal, there are limits on how badly skewed it could be. After all, majorities can't possibly get bigger than 100 percent.
Your sub-population has 203 members, which is normally enough for the central limit theorem to kick in for a good range of distributions. This means that although the original population of majorities is not normally distributed, your estimate of the mean majority in your sub-population is going to be close to normally distributed.
The standard deviations you quote for the populations can be converted into estimates of the standard deviation of your estimated mean majority in each population by dividing each by the square root of the relevant sample size.
Those standard deviations of the estimated mean majority can be combined into an estimate of the standard deviation of your estimate of the difference between the two populations - let's call this the standard error of your estimated difference - as follows:

$se_{diff} = \sqrt{\frac{sd_{pop1}^2}{n_{pop1}}+\frac{sd_{pop2}^2}{n_{pop2}}}$

Your estimate of the difference in means between the two populations will be approximately normally distributed with the above standard error as its standard deviation, so you can multiply the standard error by 1.96 to give the radius of an approximate 95 percent confidence interval for the difference in mean majority between the two populations.

My calculations suggest this gives you an interval of (0.3, 3.7) percentage points for the difference in mean majority between the districts of all hospitals and the districts of hospitals that received grants. This does not include zero, so this crude first effort certainly suggests that there is something going on here. However, the interval nearly includes zero, and in any case the majority in districts where hospitals received funds is not much smaller, so I'd be careful before drawing too many conclusions from this.
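The arithmetic behind that interval can be reproduced directly from the summary figures quoted in the question. The sketch below is just the formula above typed out; the variable names are mine, and it follows the answer in treating the two groups as if they were independent samples.

```python
import math

# Figures quoted in the question (majorities in percentage points).
mean_all, sd_all, n_all = 18.6, 12.6, 1588           # all hospitals
mean_funded, sd_funded, n_funded = 16.6, 11.5, 203   # hospitals with extra funding

diff = mean_all - mean_funded

# Standard error of the difference: square root of the summed variances of the two means.
se_diff = math.sqrt(sd_all**2 / n_all + sd_funded**2 / n_funded)

# Radius of an approximate 95% confidence interval.
radius = 1.96 * se_diff

print(f"difference = {diff:.1f}, se = {se_diff:.3f}")
# difference = 2.0, se = 0.867
print(f"approx. 95% interval: ({diff - radius:.1f}, {diff + radius:.1f})")
# approx. 95% interval: (0.3, 3.7)
```

Note that the square root is taken once, over the whole sum, so 0.867 is already the standard error; the radius of the interval is 1.96 times that, about 1.7.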

To get a better answer you would need to bring in some of the more sophisticated techniques mentioned earlier.

This is a great answer, thank you. I'm just confused by the last bullet point: why the mention of zero? The difference between the means is 2 percent (18.6% in the wider sample, 16.6% in the sample with funding). — statsapprentice, Jun 17 '12 at 12:23
I've tried calculating the standard error, and I get quite a different answer: root ((11.5*11.5)/203)+((12.6*12.6)/1588) is 0.867, suggesting the 95% confidence interval should be 18.6% plus or minus 0.86. Am I doing it wrong? — statsapprentice, Jun 17 '12 at 12:36
Also: I've taken note of your comments about more sophisticated techniques, thank you. — statsapprentice, Jun 17 '12 at 12:40
@statsapprentice - your 0.867 is already the standard error (it is the square root of 0.751, the sum of the two terms), so you only need to multiply it by 1.96 to turn it into the radius of a confidence interval. And that is for the difference between the two, so you will get 2% +/- 1.7% or so. The reason zero is particularly interesting is that a difference of zero would mean there is no difference between grant-receiving and no-grant hospitals. As 2.0 +/- 1.7 is entirely above zero you can in theory dismiss that - but only just. — Peter Ellis, Jun 17 '12 at 19:34
I think @dsign's suggestion of looking at the correlation of majority size with extra funding is a good one too - it uses more of the information in majority size and its possible relationship with funding, rather than just reducing it to a mean. I.e. as the majority gets closer to zero, are you even more likely to get funding? — Peter Ellis, Jun 17 '12 at 19:36

score 1 · Answer 2 · answered Jun 17 '12 at 07:11

If you have the number of hospitals per district that received extra funding, and the majority size of those districts, you can check whether the two are correlated using the Pearson correlation, and then check the significance of that correlation using a permutation test (which handles non-normality). It sounds more complex than it really is, so I have prepared a quick example of how to do it in Excel.
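The Excel example is not reproduced here, but the same idea can be sketched in Python as an alternative. Everything below is illustrative: the function names `pearson_r` and `permutation_p_value`, the number of shuffles, and the ten made-up districts are assumptions, not the poster's workbook or the real data.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def permutation_p_value(xs, ys, n_perm=5000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)          # +1 counts the observed arrangement itself

# Hypothetical districts for illustration only: majority (percent) and funded-hospital count.
majority = [2.5, 4.0, 6.1, 9.8, 12.0, 15.5, 18.2, 22.7, 30.4, 41.0]
funded_hospitals = [3, 2, 2, 1, 2, 1, 0, 1, 0, 0]

r = pearson_r(majority, funded_hospitals)
p = permutation_p_value(majority, funded_hospitals)
print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```

The shuffling step is what removes the normality assumption: the p-value comes from how often a random pairing of districts and funding counts produces a correlation as strong as the one observed.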

Good luck!

score 0 · Answer 3 · answered Jun 17 '12 at 01:34

When dealing with two populations that are not normal you can look for differences in distributions using the Wilcoxon rank sum test.
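A minimal sketch of the rank sum test, written from scratch so the mechanics are visible. It uses the large-sample normal approximation with midranks for ties and no continuity correction; the function name `rank_sum_test` and the sample data are made up for illustration, and an established statistics package would be the safer choice for real work.

```python
import math
from statistics import NormalDist

def rank_sum_test(sample_a, sample_b):
    """Wilcoxon rank-sum test via the normal approximation (midranks for ties).

    Returns (w, z, p): the rank sum of sample_a, its z-score, and a two-sided p-value.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    combined = list(sample_a) + list(sample_b)
    pooled = sorted((value, idx) for idx, value in enumerate(combined))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1                            # extend over a run of tied values
        midrank = (i + j) / 2 + 1             # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = midrank
        t = j - i + 1
        tie_term += t**3 - t                  # tie correction for the variance
        i = j + 1
    w = sum(ranks[:n_a])
    mean_w = n_a * (n + 1) / 2
    var_w = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p

# Hypothetical majorities (percent) for illustration only, not the real data.
funded = [3.1, 5.4, 8.0, 9.2, 12.5, 14.1, 16.0, 21.3, 25.8, 40.2]
unfunded = [4.0, 7.7, 10.1, 13.9, 15.2, 18.8, 19.5, 22.4, 27.0, 30.6, 35.1, 48.9]

w, z, p = rank_sum_test(funded, unfunded)
print(f"W = {w}, z = {z:.2f}, two-sided p = {p:.3f}")
```

Because the test works on ranks rather than raw values, it does not depend on the majorities being normally distributed; a negative z here would mean the funded group tends to have the smaller majorities.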

Michael R. Chernick
