Consider an experiment with $m$ subjects and $n$ words. Every subject rates every word, producing an $(m \times n)$ data matrix $\hat{X}$. I am interested in forming a confidence interval around $f(\hat{X})$ where $f$ is some arbitrary scalar function. I'd like this confidence interval to take into account both the sampling error that comes from the random choice of subjects and the sampling error that comes from the random choice of the words.

What I'm inclined to do is to bootstrap both subjects and words: on each bootstrap iteration, resample the words with replacement and do the same for the subjects (using the same set of resampled words for all resampled subjects). This generates a resampled data matrix $X^\ast_i$ and a bootstrap estimate $f(X^\ast_i)$. The vector of bootstrap estimates (e.g., 10,000 of them) is then used to form a bootstrap confidence interval (e.g., percentile bootstrap) as if these estimates were standard, single-factor bootstrap estimates.
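The resampling scheme described above can be sketched as follows. This is only a minimal illustration in Python/NumPy rather than R; the function names `two_way_bootstrap` and `percentile_ci`, the toy data, and the choice of `np.mean` as a stand-in for $f$ are all assumptions made for the example.

```python
import numpy as np


def two_way_bootstrap(X, f, n_boot=10_000, seed=0):
    """Resample subjects (rows) and words (columns) with replacement.

    Each iteration draws one set of row indices and one set of column
    indices, so every resampled subject sees the same resampled words.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, m, size=m)  # resampled subjects
        cols = rng.integers(0, n, size=n)  # resampled words
        estimates[b] = f(X[np.ix_(rows, cols)])
    return estimates


def percentile_ci(estimates, level=0.95):
    """Plain percentile interval from a vector of bootstrap estimates."""
    alpha = 1.0 - level
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Hypothetical toy data: 30 subjects x 40 words with additive
# subject effects, word effects, and noise.
toy_rng = np.random.default_rng(1)
X = (
    toy_rng.normal(size=(30, 1))
    + toy_rng.normal(size=(1, 40))
    + toy_rng.normal(size=(30, 40))
)

boot = two_way_bootstrap(X, np.mean, n_boot=2000)
ci_low, ci_high = percentile_ci(boot)
print(ci_low, ci_high)
```

Indexing with `np.ix_(rows, cols)` keeps the resampled matrix dense, which matches the "resample the rows, then resample the columns" description of the design.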

  1. Is this legit or am I violating some implicit assumption of the bootstrap?
  2. Is there a more principled way of dealing with this problem?
  3. Is there any R package that implements such a procedure? I can easily write the resampling code, but calculating advanced bootstrap intervals (e.g., BCa) isn't trivial. The boot package seems to assume a single random effect.
Trisoloriansunscreen
  • Do you have any fixed effects? – Dave Oct 04 '19 at 15:58
  • $f(\cdot)$ measures the performance difference between two models predicting the ratings. You can think about the two models as a fixed effect. – Trisoloriansunscreen Oct 04 '19 at 16:21
  • Also, what you’re proposing seems like it would be a permutation. Maybe I’m reading it wrong, but it sure looks like you could wind up with Ronald evaluating “fissure” when only other subjects evaluated that word. – Dave Oct 04 '19 at 16:46
  • In the experiment, each subject evaluated all of the words, and the bootstrap resampling respects that (for resampled words and subjects). You can think about $X^\star$ as produced by resampling the rows and then resampling the columns (or vice versa) of $X$. Both matrices are dense (i.e., no missing values). – Trisoloriansunscreen Oct 04 '19 at 16:56
  • It seems you're assuming that ratings given to each word are independent. But is that really the case? It seems possible that subjects might rate words differently depending on the order they're presented in, the overall collection of other words, the history of recently presented words, etc. – user20160 Oct 04 '19 at 18:12
  • @user20160 That's true, the bootstrap does neglect the order (although I'm willing to live with that). What would be your alternative? – Trisoloriansunscreen Oct 04 '19 at 19:38
  • Have you looked into the different options available through the bootMer() function? https://www.rdocumentation.org/packages/lme4/versions/1.1-21/topics/bootMer – Matt Barstead Oct 06 '19 at 12:29
  • The procedure looks legit to me. – amoeba Oct 09 '19 at 11:05

1 Answer

I think a parametric bootstrap is the recommended approach in this case: the mixed-effects model gives you estimates of the variances of the word and subject effects, so you can generate new random deviates from their estimated distributions (without actually resampling the estimated values). It is not difficult to write the code yourself, but if you used the lme4 package to estimate the model, I think you should be able to do it via the function bootMer. If I understood your problem correctly, you could write a wrapper function that computes the predicted values $X^*$ from the bootstrapped model and calculates $f(X^*)$, and then pass that wrapper to bootMer. Once you have a bootstrapped distribution for $f(X^*)$, you can use any method to calculate a confidence interval (e.g., percentile). If you are interested in BCa intervals, I have some code that calculates them here (it is part of a small package, available here, that contains my frequently used custom functions).
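To make the idea of "generating new random deviates" concrete, here is a rough sketch of a parametric bootstrap for a fully crossed subjects-by-words design. It is not what bootMer does internally: it is written in Python/NumPy, it assumes a simple additive Gaussian model $X_{ij} = \mu + a_i + b_j + e_{ij}$, and it replaces the lme4 fit with ANOVA-style moment estimates of the variance components. The function names (`fit_crossed_effects`, `parametric_bootstrap`) and the toy data are hypothetical.

```python
import numpy as np


def fit_crossed_effects(X):
    """Moment (ANOVA) estimates for X[i, j] = mu + a_i + b_j + e_ij.

    Assumption: one observation per subject-word cell, no missing data.
    """
    m, n = X.shape
    grand = X.mean()
    row_means = X.mean(axis=1)  # subject means
    col_means = X.mean(axis=0)  # word means
    resid = X - row_means[:, None] - col_means[None, :] + grand
    ms_subj = n * np.sum((row_means - grand) ** 2) / (m - 1)
    ms_word = m * np.sum((col_means - grand) ** 2) / (n - 1)
    ms_err = np.sum(resid ** 2) / ((m - 1) * (n - 1))
    return {
        "mu": grand,
        # Negative moment estimates are truncated at zero.
        "var_subj": max((ms_subj - ms_err) / n, 0.0),
        "var_word": max((ms_word - ms_err) / m, 0.0),
        "var_err": ms_err,
    }


def parametric_bootstrap(X, f, n_boot=2000, seed=0):
    """Simulate new data matrices from the fitted model and apply f."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    p = fit_crossed_effects(X)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        subj = rng.normal(0.0, np.sqrt(p["var_subj"]), size=(m, 1))
        word = rng.normal(0.0, np.sqrt(p["var_word"]), size=(1, n))
        err = rng.normal(0.0, np.sqrt(p["var_err"]), size=(m, n))
        estimates[b] = f(p["mu"] + subj + word + err)
    return estimates


# Hypothetical toy data: 30 subjects x 40 words, unit variances.
toy_rng = np.random.default_rng(1)
X = (
    toy_rng.normal(size=(30, 1))
    + toy_rng.normal(size=(1, 40))
    + toy_rng.normal(size=(30, 40))
)

params = fit_crossed_effects(X)
boot = parametric_bootstrap(X, np.mean)
ci_low, ci_high = np.quantile(boot, [0.025, 0.975])
print(params, ci_low, ci_high)
```

Because fresh subject and word effects are drawn on every iteration, the resulting interval reflects both sources of sampling error, but only to the extent that the additive Gaussian model is a reasonable description of the ratings.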

matteo