
The question is - formally speaking, what should the minimum sample size be in order to trust statistical metrics like p-value and power?

Let's say we have populations A and B, each having values between 0 and 1, and we want to know if there is any significant difference between them.

My typical approach is to use the p-value and power to decide. However, this seems to work only with "sufficiently" sized samples, and I am failing to formally define what constitutes a sufficient size.

For example, let's say the mean value for A is 0.5, with 1000 samples and a standard deviation of 0.02, and the mean value for B is 1.0 with 2 samples. Plug this into your favorite p-value calculator or formula and it will claim the results are significant, even though we know there's a 25% chance of getting B's result under the null hypothesis.

With Za = 2.3263 for the 1st percentile:

power z-score = 2.3263 - (1 - 0.5) / (0.02 / 2^(1/2)) ≈ -33

and

p-value z-score = (1 - 0.5) / ((0.5 * 0.5 / 1000)^(1/2)) ≈ 31
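For what it's worth, the two z-scores can be reproduced numerically; a minimal sketch in Python, using only the numbers from the example above:

```python
from math import sqrt

z_alpha = 2.3263  # one-sided critical value for the 1st percentile

# Power z-score for B: n = 2 samples, using A's standard deviation of 0.02
power_z = z_alpha - (1 - 0.5) / (0.02 / sqrt(2))

# p-value z-score for A: n = 1000 samples, Bernoulli-style variance 0.5 * 0.5
p_value_z = (1 - 0.5) / sqrt(0.5 * 0.5 / 1000)

print(power_z)    # ≈ -33.0
print(p_value_z)  # ≈ 31.6
```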

We intuitively and mathematically know that 2 samples shouldn't be enough to conclude anything about the result, so how many samples should be the minimum?

If you propose some fixed minimum (e.g., 10) - why? For example, if the probability of observing a 1 in the population were 0.001% and both samples in population B were 1, I might think, intuitively and mathematically, that there is some significance there.
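The intuition behind both examples can be made concrete with a direct probability calculation (a sketch; the 50% and 0.001% success probabilities are the ones used above): under a Bernoulli(p) null, the chance that both of B's two samples are 1 is p^2, which is large when p = 0.5 and tiny when p = 0.00001.

```python
# Probability of two 1's in two trials under a Bernoulli(p) null hypothesis
def prob_two_ones(p):
    return p ** 2

print(prob_two_ones(0.5))      # 0.25   -> two 1's are unsurprising under the null
print(prob_two_ones(0.00001))  # ~1e-10 -> two 1's are strong evidence against it
```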

John K
  • 1 is the minimum in some instances, including the textbook Normally distributed random variable. See https://stats.stackexchange.com/a/1836/919. You can **trust** the p-values of these tests because they are mathematically correct. The issue may be that you have a more complicated concept of "trust" that might extend to needing to check your probability model. But if you would like any definite answer, you will need to clarify--and, if possible, quantify--your concept of "trust" in any statistical result. – whuber Apr 14 '20 at 03:06
  • In the above example, we know that if you flip a coin twice, you have a 25% chance to get two heads. That means the null hypothesis has at least a 25% chance of being correct. If p-value and power are supposed to be measures of significance, then why don't they represent the low significance in this case? – John K Apr 14 '20 at 12:47
  • "The null hypothesis has at least a 25% chance of being correct" is not a relevant statement. All one can derive from the theory are statements about the probabilities of *data* conditional on the hypothesis; null hypothesis testing concludes nothing whatever about the chance of any hypothesis. That confusion might lie at the heart of your question. – whuber Apr 14 '20 at 12:49

0 Answers