It seems to me that Ellis could be referring to as many as three distinct ideas here. First he says something about creating "simulated data generated by a model under the null hypothesis of no relation." I would call this a form of parametric bootstrapping. Then he says that this would be "probably based on resampling the times between each event (eg between each yawn) to create a new set of time stamps for hypothetical null model events." But to be clear, doing this is not "creating simulated data"; if I understand correctly, we would instead be resampling from our actually observed data. This latter procedure is either a permutation test or nonparametric bootstrapping, depending on how the resampling takes place.
I guess I should say a few more words about parametric bootstrapping, permutation tests, and nonparametric bootstrapping.
Usually parametric bootstrapping is done by simulating from the model actually estimated from the data, not from a hypothetical model that is just like the estimated model except that the null hypothesis is assumed true, as Ellis seems to suggest at first. By "simulating data" I mean something like the following: my model states that my data come from two groups, each normally distributed with means $\mu_1$ and $\mu_2$, respectively, and common standard deviation $\sigma$, so I generate many datasets that satisfy this and use the distribution of test statistics computed from these simulated datasets as my sampling distribution. Note that I am creating these data using something like `rnorm()` in R, not directly reusing my observed data. Now, one could certainly run this procedure under the null hypothesis of, say, no difference in group means--we would just assume $\mu_1=\mu_2$ in all the simulated datasets, contrary to what we actually observed--and in this way obtain a bootstrapped p-value (rather than a bootstrapped confidence interval, which is what the traditional version affords you). I would still call this a way of obtaining a p-value via parametric bootstrapping.
A permutation test, on the other hand, involves shuffling your observed data over and over in a way that is consistent with the null hypothesis. For example, if the null hypothesis implies that group assignment makes no difference to the group means, you can randomly shuffle the group labels among all your observations many times and record the mean difference you get from each shuffling. Then you check where your actual observed statistic lies within the distribution of test statistics computed from these shuffled datasets. Note that there is a finite (but usually large) number of ways you can shuffle your actually observed data.
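Continuing the same made-up two-group example, a permutation test might look like the sketch below; again, `x1` and `x2` are hypothetical data, and the label-shuffling is the only thing that matters here.

```r
# Minimal sketch: permutation test for a difference in group means.
set.seed(1)
x1 <- rnorm(30, mean = 5.0, sd = 2)   # same hypothetical data as the sketch above
x2 <- rnorm(30, mean = 6.5, sd = 2)

y     <- c(x1, x2)
group <- rep(c("A", "B"), times = c(length(x1), length(x2)))
obs_diff <- mean(y[group == "B"]) - mean(y[group == "A"])

# Shuffle the group labels many times and recompute the statistic each time
B <- 10000
perm_diffs <- replicate(B, {
  shuffled <- sample(group)            # random reassignment of labels to observations
  mean(y[shuffled == "B"]) - mean(y[shuffled == "A"])
})

# Where does the observed statistic fall within the permutation distribution?
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))
p_value
```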
Finally, nonparametric bootstrapping is very similar to the permutation test, but we resample the observed data with replacement to try to get closer to the infinite "population" of values that our data might have been drawn from. There are many, many more ways to resample from your data with replacement than there are to shuffle your data (although the number is technically finite in practice as well). Again, as with parametric bootstrapping, this is usually done not under the null hypothesis but under the model implied by the observed data, yielding confidence intervals around the observed test statistic rather than p-values. But one could certainly imagine doing this under the null hypothesis as Ellis suggests and obtaining p-values that way. As an example of nonparametric bootstrapping in the traditional fashion (i.e., not under the null hypothesis), take the same difference-in-group-means example from the parametric bootstrapping paragraph: we would resample the observations within each group with replacement many times, without mixing observations between groups (unlike in the permutation test), and build up the sampling distribution of group mean differences that results.
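Here is a minimal sketch of that traditional (non-null) nonparametric bootstrap, again with the same hypothetical `x1` and `x2`; the percentile interval at the end is just one common way to summarize the bootstrap distribution.

```r
# Minimal sketch: nonparametric bootstrap confidence interval for the
# difference in group means, resampling within each group (not mixing groups).
set.seed(1)
x1 <- rnorm(30, mean = 5.0, sd = 2)   # same hypothetical data as the sketches above
x2 <- rnorm(30, mean = 6.5, sd = 2)
obs_diff <- mean(x2) - mean(x1)

B <- 10000
boot_diffs <- replicate(B, {
  res1 <- sample(x1, replace = TRUE)   # resample group 1 with replacement
  res2 <- sample(x2, replace = TRUE)   # resample group 2 with replacement
  mean(res2) - mean(res1)
})

# Percentile 95% confidence interval around the observed difference
obs_diff
quantile(boot_diffs, probs = c(0.025, 0.975))
```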