How to analyze if only N and means are available?

Question

I have data from the following experiment:

Sample taken from each of 5 normals and 5 patients

All normal samples are pooled and mixed together. Similarly all patient samples are also mixed together.

So finally there are only 2 samples

Chemical analyzed in these 2 pooled samples

Measurement values are: normals: 0.55 ; patients: 0.75 units

Is this difference statistically significant? Which test should I apply here since these values represent means of normals and patients but there is no standard deviation available.

Edit: the measured value is a positive continuous numeric variable which may be in the range of 0-10

see the answer [here](http://stats.stackexchange.com/questions/1807/how-to-perform-students-t-test-having-only-sample-size-sample-average-and-popu/1836#1836) for what's likely to be the only fruitful approach. — Glen_b, May 15 '15 at 04:23
That approach is useful knowing, but looking at the values it won't help here. I suspect that the approach used here destroyed any chance of learning anything from this experiment. Either you or your collaborators need to think about how you will analyse the results before conducting an experiment. — Erik, May 15 '15 at 15:15

score 4 · Accepted Answer · edited Apr 13 '17 at 12:44

You don't say anything about what's being measured -- it might be relevant in this case. For example, if those numbers are percentages between 0 and 100, the standard deviation is bounded.

If there's any information on variability (that's likely to yield some bound, at least) it's likely to be useful.

In the absence of any such information, this answer outlines an approach that may be useful.

Which is to say, if you're interested in whether the population means differ, then if the CI for the mean difference between observations from each process excludes $\mu_{D0}$, the conclusion would be that the population mean difference is not $\mu_{D0}$.

Unfortunately, I think such intervals will always include the case $\mu_{D0}=0$; they can exclude values for $\mu_D$ that are sufficiently far from $\pm |X_D|$, but those will be further from $0$ than $|X_D|$ is.

Given that the variable is bounded, you can bound the standard deviation.

For example, if a variable can only take values between 0 and 10, its sd can't exceed $5$. However, because of Bessel's correction, the sample sd could be slightly larger ($5\sqrt{\frac{n}{n-1}}$). This occurs when half the values are at each end of the range. On the other hand, $n$ here would be 5, so you can't actually get half of the values at each end; the maximum is therefore slightly smaller, at 5.477.
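A quick numerical check of these two figures, as a sketch in Python. The variable names are mine, and it assumes the extreme case puts every value at one end of the range or the other:

```python
import math
from statistics import stdev

lo, hi, n = 0.0, 10.0, 5

# Bound on the population sd: half the mass at each end of the range.
pop_sd_bound = (hi - lo) / 2

# Bessel-corrected version of that bound; only attainable when n is even.
bessel_bound = pop_sd_bound * math.sqrt(n / (n - 1))

# For n = 5, try every split of k values at hi and n - k values at lo
# and keep the largest sample sd.
best_sd = max(stdev([hi] * k + [lo] * (n - k)) for k in range(n + 1))

print(round(bessel_bound, 3), round(best_sd, 3))
```

For $n=5$ the best split is 2 values at one end and 3 at the other, which gives $\sqrt{30}\approx 5.477$, against $5\sqrt{5/4}\approx 5.590$ for the unattainable half-and-half split.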

If you add in the known sample mean, you can bound the sample sd further. For example, for patients the mean is 0.75; you can work out the largest standard deviation compatible with that mean for a variable limited to be between 0 and 10. That would be if 4 observations were at 0 and 1 observation was at 3.75, giving an sd of 1.677.
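One way to sketch this in Python: push as many values as possible to the ends of the range and let at most one value sit in between, so that the mean is preserved. The function name `max_spread_sample` is hypothetical, made up for this illustration:

```python
from statistics import mean, stdev

def max_spread_sample(target_mean, n, lo, hi):
    """Arrangement of n values in [lo, hi] with the given mean and maximal spread.

    Values go to the endpoints, with at most one value strictly inside the range.
    """
    excess = (target_mean - lo) * n           # total amount to place above lo
    k = min(int(excess // (hi - lo)), n)      # how many values can sit at hi
    values = [hi] * k
    if k < n:
        values.append(lo + excess - k * (hi - lo))  # the single leftover value
        values += [lo] * (n - k - 1)
    return values

patients = max_spread_sample(0.75, 5, 0.0, 10.0)
sd_patients = stdev(patients)
print(patients, round(sd_patients, 3))
```

For the patient group this reproduces the arrangement above: one value at 3.75, four at 0, and a sample sd of about 1.677.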

However, working this way with the two samples doesn't necessarily then give the lowest bound on the t-statistic (though it should come close). To work that out you need to specify which t-statistic you'd be using, and then find the sample arrangements that satisfy the known means and minimize $|t|$. If that smallest $|t|$ was still significant, whatever sample arrangement you actually had must also be significant.

If you incorporate the known information that most values are between 0 and 2 (as long as that can be made precise for each group) you can bound t even further (make the smallest possible $|t|$ larger).
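As a rough sketch of that calculation for the equal-variance two-sample statistic: with both means fixed, its denominator only grows with each group's sum of squares, so for this particular statistic the smallest $|t|$ occurs when each group is spread as widely as its bounds allow. The helper names below are hypothetical, and treating 2 as a hard upper limit is an assumption made purely for illustration (the question only says that most values are between 0 and 2).

```python
import math

def max_sum_sq(mean, n, lo, hi):
    """Largest sum of squared deviations for n values in [lo, hi] with this mean."""
    excess = (mean - lo) * n
    k = min(int(excess // (hi - lo)), n)      # values placed at the upper bound
    values = [hi] * k
    if k < n:
        values.append(lo + excess - k * (hi - lo))  # one value inside the range
        values += [lo] * (n - k - 1)
    return sum((x - mean) ** 2 for x in values)

def min_abs_t(mean_a, mean_b, n, lo, hi):
    """Smallest |t| of the pooled-variance t-test for two groups of size n."""
    ss = max_sum_sq(mean_a, n, lo, hi) + max_sum_sq(mean_b, n, lo, hi)
    pooled_var = ss / (2 * n - 2)
    se = math.sqrt(pooled_var * 2 / n)
    return abs(mean_a - mean_b) / se

t_wide = min_abs_t(0.55, 0.75, 5, 0.0, 10.0)    # range 0 to 10
t_narrow = min_abs_t(0.55, 0.75, 5, 0.0, 2.0)   # assumed hard limit of 2
print(round(t_wide, 3), round(t_narrow, 3))
```

Under these assumptions the worst-case $|t|$ is about 0.22 for the 0 to 10 range and about 0.33 with the tighter limit, so the tighter limit does raise the bound, but only slightly.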

However, looking at roughly how small the sds would have to be to achieve significance, we can't get a bound anywhere near small enough to be any use.

For example if your significance level was 0.05, and you did an equal variance two sample t-test, you'd need a $|t|$ of at least 2.306. To get that, you'd need a denominator on the t-statistic of 0.08673, suggesting a pooled variance of 0.0188.
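The arithmetic behind those figures, as a short sketch. The critical value 2.306 is taken from the text rather than computed, and the variable names are mine:

```python
import math

t_crit = 2.306          # two-sided 5% critical value for 8 df, as quoted in the text
diff = 0.75 - 0.55      # observed difference between the pooled measurements
n = 5                   # subjects per group

# |t| = diff / se, so the standard error of the difference can be at most:
se_needed = diff / t_crit

# For equal group sizes, se^2 = pooled_var * (1/n + 1/n) = pooled_var * 2 / n.
pooled_var_needed = se_needed ** 2 / (2 / n)
pooled_sd_needed = math.sqrt(pooled_var_needed)

print(round(se_needed, 5), round(pooled_var_needed, 4), round(pooled_sd_needed, 3))
```

A pooled variance of 0.0188 corresponds to a pooled sd of about 0.137, far below the worst-case sd of 1.677 worked out above for the patient group.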

You just can't get down that far. So you can't hope to achieve a rejection with the information you presently have, making a more exact analysis pointless unless more information can be brought to bear on this problem.

The measured value is a positive continuous numeric variable which may be in the range of 0-10 . I have added this in the question. — rnso, May 15 '15 at 08:23
rnso 5 patients with an average of 75 on a variable restricted to lie between 0 and 10? Not possible. Even the *sum* being 75 isn't possible -- even if they all had 10, that couldn't make 75. (However, that's exactly the sort of information that would help, if it were believable) — Glen_b, May 15 '15 at 08:52
Sorry, it should have been 0.55 and 0.75 (although range is 0-10, most values are 0-2). — rnso, May 15 '15 at 13:30
A similar approach would be to use the scale information as well as the mean and N to estimate a range for the variability. It will always be possible to reject the null (since variability can be 0), but it can at least give a hint as to plausibility. The maximum SD is 1.39 if 4 normals are 0 and 1 are 2.75, and 4 patients are 0 and 1 are 3.75 (again, preserving the means). This gives a maximum SE of .44, so your t statistic ranges from .44 (if variance is maximized) to infinity (if variance is 0). — le_andrew, May 15 '15 at 14:05
rnso then you should be able to find a worst-case to get a bound on the test statistic, and hence the p-value. I'll come back later with some discussion on that. — Glen_b, May 15 '15 at 15:27
