I am trying to understand how I can use resampling techniques to complement my pre-planned analyses. This is not homework. I have a 5-sided die. 30 subjects call a number (1-5) and then roll the die. If it matches it's a hit; if not it's a miss. Each subject does this 25 times.
If n is the number of trials (= 25) and p is the probability of a hit (= 0.2), then the population mean number correct is mu = n*p = 5. The population standard deviation is sigma = sqrt(n*p*(1-p)) = 2.
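In R those null values are just:

n <- 25                          # trials per subject
p <- 1/5                         # probability of a hit on each trial
mu <- n * p                      # 5
sigma <- sqrt(n * p * (1 - p))   # 2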
The experimental hypothesis (H1) is that subjects in this study will score above chance (above mu). The null hypothesis (H0) is that subjects score at chance (at mu), with each subject's number of hits following a binomial distribution.
[Please don't get too worried about why I am doing this. If it helps you to understand the problem, you can think of it as an ESP test (so I am testing the ability of subjects to score above mu). Also, if it helps, imagine that the task is a virtual-reality die-throwing task, where the virtual 5-sided die performs according to chance. There can be no bias from an imperfect die because the die is virtual.]
Okay. So before I conducted the "experiment" I had planned to compare the 30 subjects' scores with a one-sample t-test against the null value mu = 5. Then I discovered that the one-sample z-test is a more powerful test given what we know about the null hypothesis. Okay.
Here is a simulation of my data in R:
# simulate 30 subjects, each a count of hits out of 25 trials with hit probability 0.2
binom.samp1 <- as.data.frame(matrix(rbinom(30*1, size=25, prob=0.2), ncol=1))
Now R has a binom.test function, which gives an exact p-value for the number of successes out of the number of trials. For my collected data (not the simulated data above):
>binom.test(174, 750, 1/5, alternative="g")
number of successes = 174, number of trials = 750, p-value = 0.01722
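If I wanted the equivalent pooled call on the simulated data, I think it would be something like this (assuming the single column gets R's default name V1):

# pool all 30 subjects x 25 trials and test the total hit count against p = 1/5
total.hits <- sum(binom.samp1$V1)
binom.test(total.hits, 30 * 25, p = 1/5, alternative = "greater")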
Now the one-sample t-test that I had originally planned to use (mainly because I'd never heard of the alternatives; I should've paid more attention in higher statistics):
>t.test(binom.samp1-5, alternative="g")
t = 1.7647, df = 29, p-value = 0.04407
and for completeness' sake, the one-sample z-test (from the BSDA package):
>z.test(binom.samp1, mu=5, sigma.x=2, alternative="g")
z = 2.1909, p-value = 0.01423
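For what it's worth, I can also get the z statistic by hand from the simulated data (again assuming the default column name V1):

xbar <- mean(binom.samp1$V1)       # observed mean hits per subject
z <- (xbar - 5) / (2 / sqrt(30))   # (xbar - mu) / (sigma / sqrt(number of subjects))
pnorm(z, lower.tail = FALSE)       # one-sided p-value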
So. My first question is: am I right in concluding that binom.test is the correct test given the data and hypothesis? In other words, does the t-test approximate the z-test, which in turn approximates the exact binom.test (on the underlying Bernoulli trials)?
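My (possibly naive) way of checking that numerically is to compare the exact binomial tail with its normal approximation for the pooled count:

# exact tail probability: P(X >= 174) when X ~ Binomial(750, 1/5)
pbinom(173, size = 750, prob = 1/5, lower.tail = FALSE)
# normal approximation to the same tail (no continuity correction)
pnorm(174, mean = 750 * 1/5, sd = sqrt(750 * 1/5 * 4/5), lower.tail = FALSE)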
Now my second question relates to the resampling methods. I have several books by Philip Good and I've read plenty on permutation tests and bootstrapping. I was just going to use the one-sample permutation test given in the DAAG package:
>onet.permutation(binom.samp1-5)
0.114
And the perm.test function in the exactRankTests package gives this:
>perm.test(binom.samp1, mu=5, alternative="g", exact=TRUE)
T = 42, p-value = 0.05113
I have the feeling that what I want to do is conduct a one-sample permutation version of binom.test. The only way I can see it working is to take a subset of the 30 subjects, calculate the binom.test, and then repeat that for a large number of resamples. Does this sound like a reasonable idea?
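One way I can imagine doing it (just a sketch, and strictly a Monte Carlo simulation of the null rather than a permutation test) is to simulate many 30-subject experiments under the binomial null and see how often the pooled hit count reaches my observed 174:

set.seed(1)                        # for reproducibility
obs.total <- 174                   # observed total hits over 30 x 25 trials
# simulate 10000 experiments of 30 subjects, each Binomial(25, 1/5), and pool each one
sim.totals <- replicate(10000, sum(rbinom(30, size = 25, prob = 1/5)))
mean(sim.totals >= obs.total)      # Monte Carlo one-sided p-value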
Finally, I did repeat this experiment with the same equipment (the 5-sided die) but a larger sample size (50 people), and I got exactly what I expected. My understanding is that the two studies are like a Galton box that hasn't filled up yet: the n = 30 experiment has a bit of a skew, but had it been run for longer it would have filled in to the binomial. Is this all gibberish?
>binom.test(231, 1250, 1/5, alternative="g")
number of successes = 231, number of trials = 1250, p-value = 0.917
>t.test(binom.samp2-5)
t = -1.2249, df = 49, p-value = 0.2265
>z.test(binom.samp2, mu=5, sigma.x=2)
z = -1.3435, p-value = 0.1791
>onet.permutation(binom.samp2-5)
0.237
>perm.test(binom.samp2, mu=5, alternative="g", exact=TRUE)
T = 35, p-value = 0.8991