I got some inspiration from a question that was posted this morning.

Let's design an experiment with three factors, each with two levels (e.g. male/female). This gives eight groups in total, one for each combination of the $x$ variables. Each group contains a single subject, so there are eight observations in total. On each subject, we measure some value $y$.

$$ y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \epsilon_i$$

For whatever reason, I am highly interested in the bias and variance of the predicted response when the predictors are $(0,1,0)$. My first thought would be to bootstrap.$^{\dagger}$

But the predictors are not random variables!

Would it be acceptable to bootstrap the 4-dimensional (each factor, plus the response variable) data set in order to calculate, say, the bias and variance?

I'll give some pseudo-code of what I have in mind.

biases = list/vector/whatever
preds = list/vector/whatever
for i in 1:1000                       # Take 1000 bootstrap samples
    idx = sample(1:8, 8, replace=T)   # get the indices of the rows being resampled
    df_boot = df[idx,]                # take those indices
    model = do_regression(df_boot)    # fit the regression
    pred = predict(model, x=(0,1,0))  # predict when x=(0,1,0)
    biases[i] = pred - y[x=(0,1,0)]   # calculate bias
    preds[i] = pred                   # store prediction for variance calculation later
print(mean(biases))
print(var(preds))

Example Data Frame:

X1 | X2 | X3 | Y
================
 0 |  0 |  0 | 4
 0 |  0 |  1 | 2
 0 |  1 |  0 | 3
 0 |  1 |  1 | 5
 1 |  0 |  0 | 3
 1 |  0 |  1 | 3
 1 |  1 |  0 | 7
 1 |  1 |  1 | 8
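
For concreteness, the pseudo-code above can be sketched in Python against the example data frame. This is only an illustration under some assumptions: it uses NumPy, fits ordinary least squares with `np.linalg.lstsq`, and measures "bias" against the single observed $y$ at $x=(0,1,0)$, as the pseudo-code does. The names `fit_predict`, `x0`, and `y0` are made up for the example.

```python
import numpy as np

# Example data frame from the question: columns X1, X2, X3, Y
data = np.array([
    [0, 0, 0, 4],
    [0, 0, 1, 2],
    [0, 1, 0, 3],
    [0, 1, 1, 5],
    [1, 0, 0, 3],
    [1, 0, 1, 3],
    [1, 1, 0, 7],
    [1, 1, 1, 8],
], dtype=float)

x0 = np.array([1.0, 0.0, 1.0, 0.0])  # intercept term plus predictors (0, 1, 0)
y0 = 3.0                             # observed y in the row where x = (0, 1, 0)

def fit_predict(rows, x_new):
    """Fit y ~ 1 + X1 + X2 + X3 by least squares and predict at x_new."""
    X = np.column_stack([np.ones(len(rows)), rows[:, :3]])
    # lstsq returns the minimum-norm solution when a resample is rank
    # deficient (e.g. some factor level never drawn), so it never fails here.
    beta, *_ = np.linalg.lstsq(X, rows[:, 3], rcond=None)
    return x_new @ beta

rng = np.random.default_rng(0)       # fixed seed, assumed for reproducibility
n_boot = 1000
preds = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, len(data), size=len(data))  # resample whole rows
    preds[b] = fit_predict(data[idx], x0)

full_pred = fit_predict(data, x0)    # prediction from the original eight rows
print("full-data prediction:", full_pred)
print("mean(pred - observed y):", np.mean(preds - y0))
print("var(preds):", np.var(preds, ddof=1))
```

With only eight rows, many resamples leave out some factor combination entirely, so individual refits can be rank deficient; that is one practical symptom of resampling predictors that were fixed by design.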

$^{\dagger}$Small sample size, yada yada yada. Maybe I have eight-zillion subjects instead of just eight.

Dave
  • One bootstraps the *residuals* while leaving the predictors untouched. – whuber Aug 24 '20 at 16:59
  • @whuber And then I would calculate the bias of the residuals and the variance of the residuals to get my estimates for that particular $y$? – Dave Aug 24 '20 at 17:08
  • I don't follow you, because this doesn't sound like bootstrapping, because there's no sense in which you are repeating the original experiment. It's more akin to assessing the empirical distribution of the residuals. – whuber Aug 24 '20 at 17:40
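
The residual bootstrap mentioned in the first comment can be sketched roughly as follows. It is a minimal illustration, not necessarily what the commenter had in mind in detail: it assumes NumPy, an ordinary least squares fit, and raw (unscaled) residuals, and the variable names are invented for the example. The design matrix stays fixed; only the residuals are resampled and added back to the fitted values.

```python
import numpy as np

# Fixed 2x2x2 design with an intercept column; row order matches the example table
X = np.array([[1, a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)],
             dtype=float)
y = np.array([4, 2, 3, 5, 3, 3, 7, 8], dtype=float)
x0 = np.array([1.0, 0.0, 1.0, 0.0])  # intercept term plus predictors (0, 1, 0)

# Fit once on the original data
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
resid = y - fitted
pred_hat = x0 @ beta_hat

rng = np.random.default_rng(0)       # fixed seed, assumed for reproducibility
n_boot = 1000
boot_preds = np.empty(n_boot)
for b in range(n_boot):
    # Keep X untouched; build a new response from resampled residuals
    y_star = fitted + rng.choice(resid, size=len(resid), replace=True)
    beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    boot_preds[b] = x0 @ beta_star

print("original prediction:", pred_hat)
print("bootstrap bias estimate:", boot_preds.mean() - pred_hat)
print("bootstrap variance estimate:", boot_preds.var(ddof=1))
```

Because every refit uses the same full-rank design, no resample loses a factor level, which is the main practical difference from resampling whole rows.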
