I got some inspiration from a question that was posted this morning.
Let's design an experiment with three factors, each with two levels (e.g., male/female). Crossing the factors gives $2^3 = 8$ groups, which are encoded by our $x$ variables. Each group contains a single subject, so there are eight observations in total. On each subject, we measure some value $y$.
$$ y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \epsilon_i$$
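To make the setup concrete, here is a small sketch of the design matrix for this three-factor main-effects model. It assumes the factor levels are coded 0/1, as in the example table further down. `X` and `x0` are illustrative names and do not come from any library.

```python
import itertools

import numpy as np

# All 2**3 = 8 combinations of three two-level factors coded 0/1,
# in the same row order as the example table (X1 varies slowest).
levels = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)

# Hypothetical design matrix for y = b0 + b1*x1 + b2*x2 + b3*x3:
# an intercept column followed by X1, X2, X3.
X = np.column_stack([np.ones(len(levels)), levels])

# Row of the design used for the prediction of interest, (X1, X2, X3) = (0, 1, 0).
x0 = np.array([1.0, 0.0, 1.0, 0.0])

print(X.shape)                   # (8, 4): eight observations, four coefficients
print(np.linalg.matrix_rank(X))  # 4: full column rank, so OLS is identifiable
```

With one subject per cell, the fit uses 8 observations to estimate 4 coefficients, which leaves 4 residual degrees of freedom.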
For whatever reason, I am very interested in the bias and variance of the predicted response when the predictors are $(0,1,0)$. My first thought is to bootstrap.$^{\dagger}$
But the predictors are not random variables!
Would it be acceptable to bootstrap the four-column data set (the three factors plus the response), resampling whole rows, in order to estimate, say, the bias and variance?
Here is some R-style code for what I have in mind.
```r
# Assumes `df` is the example data frame below, with columns X1, X2, X3, Y.
B <- 1000                                # number of bootstrap samples
biases <- numeric(B)
preds <- numeric(B)
x0 <- data.frame(X1 = 0, X2 = 1, X3 = 0) # the point of interest
y0 <- df$Y[df$X1 == 0 & df$X2 == 1 & df$X3 == 0]  # observed Y at (0,1,0)
for (i in 1:B) {
  idx <- sample(1:8, 8, replace = TRUE)  # indices of the rows being resampled
  df_boot <- df[idx, ]                   # take those rows
  model <- lm(Y ~ X1 + X2 + X3, data = df_boot)  # fit the regression
  pred <- predict(model, newdata = x0)   # predict at x = (0,1,0)
  biases[i] <- pred - y0                 # bias relative to the observed Y
  preds[i] <- pred                       # store prediction for the variance later
}
print(mean(biases))
print(var(preds))
```
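For reference, the usual nonparametric bootstrap estimates compare the $B$ replicate predictions $\hat y_0^{*1},\dots,\hat y_0^{*B}$ with the full-data prediction $\hat y_0 = x_0^\top\hat\beta$. Here $x_0 = (1, 0, 1, 0)^\top$ includes the intercept, and $\hat\beta$ is fitted to all eight rows. The `biases` in the loop above are measured against the single observed $y$ at $(0,1,0)$ instead, and that observation carries its own noise $\epsilon_i$.

$$\bar y_0^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat y_0^{*b}, \qquad \widehat{\operatorname{bias}}(\hat y_0) = \bar y_0^{*} - \hat y_0, \qquad \widehat{\operatorname{var}}(\hat y_0) = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat y_0^{*b} - \bar y_0^{*}\right)^2$$

The variance estimate uses the same $B-1$ denominator as R's `var`.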
Example data frame:

| X1 | X2 | X3 | Y |
|:--:|:--:|:--:|:-:|
| 0 | 0 | 0 | 4 |
| 0 | 0 | 1 | 2 |
| 0 | 1 | 0 | 3 |
| 0 | 1 | 1 | 5 |
| 1 | 0 | 0 | 3 |
| 1 | 0 | 1 | 3 |
| 1 | 1 | 0 | 7 |
| 1 | 1 | 1 | 8 |
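The concern here is that X1 to X3 are fixed by the design rather than sampled. One standard alternative to resampling whole rows is the residual (fixed-design) bootstrap. It keeps the design matrix as is, resamples the residuals of the full-data fit, adds them back to the fitted values, and refits. Below is a minimal NumPy sketch on the example data. It assumes ordinary least squares with the three main effects, and `residual_bootstrap` is a hypothetical helper name. Bias and variance follow the usual bootstrap definitions, so the replicates are compared with the full-data prediction.

```python
import numpy as np

# Example data from the table above: columns X1, X2, X3 and response Y.
data = np.array([
    [0, 0, 0, 4],
    [0, 0, 1, 2],
    [0, 1, 0, 3],
    [0, 1, 1, 5],
    [1, 0, 0, 3],
    [1, 0, 1, 3],
    [1, 1, 0, 7],
    [1, 1, 1, 8],
], dtype=float)

X = np.column_stack([np.ones(len(data)), data[:, :3]])  # intercept + X1, X2, X3
y = data[:, 3]
x0 = np.array([1.0, 0.0, 1.0, 0.0])  # intercept plus (X1, X2, X3) = (0, 1, 0)


def residual_bootstrap(X, y, x0, n_boot=1000, seed=0):
    """Hypothetical helper: residual (fixed-design) bootstrap of the OLS
    prediction at x0. Returns (full-data prediction, bias, variance)."""
    rng = np.random.default_rng(seed)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted
    pred_hat = x0 @ beta_hat

    preds = np.empty(n_boot)
    for b in range(n_boot):
        # The design stays fixed; only the response is rebuilt from
        # resampled residuals of the full-data fit.
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        preds[b] = x0 @ beta_star

    bias = preds.mean() - pred_hat   # bootstrap bias estimate
    variance = preds.var(ddof=1)     # bootstrap variance, B - 1 denominator
    return pred_hat, bias, variance


pred_hat, bias, variance = residual_bootstrap(X, y, x0)
print(f"prediction at (0,1,0): {pred_hat:.3f}")
print(f"bootstrap bias: {bias:.3f}, bootstrap variance: {variance:.3f}")
```

Raw OLS residuals are on average smaller than the true errors, so plain residual resampling tends to understate variability. A common variant rescales the residuals by $\sqrt{n/(n-p)}$ before resampling. Here $n = 8$ and $p = 4$, so that factor is $\sqrt{2}$.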
$^{\dagger}$Small sample size, yada yada yada. Maybe I have eight-zillion subjects instead of just eight.
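The small sample has a concrete consequence for the row-resampling loop. With eight rows and four coefficients, some resamples contain too few distinct design points, or exactly four points that lie in one plane of the $2^3$ cube. In either case the regression cannot estimate all of $\beta_0,\dots,\beta_3$. The quick NumPy simulation below estimates how often that happens. It assumes the full 0/1-coded design, and `rank_deficient_fraction` is a hypothetical helper name.

```python
import itertools

import numpy as np


def rank_deficient_fraction(n_boot=10000, seed=0):
    """Hypothetical helper: fraction of row resamples (8 draws from 8 rows)
    whose design matrix cannot identify all four coefficients."""
    rng = np.random.default_rng(seed)
    levels = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
    X = np.column_stack([np.ones(len(levels)), levels])  # intercept + X1..X3
    deficient = 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # resample row indices
        if np.linalg.matrix_rank(X[idx]) < X.shape[1]:
            deficient += 1
    return deficient / n_boot


frac = rank_deficient_fraction()
print(f"fraction of rank-deficient resamples: {frac:.3f}")
```

A direct count gives an exact probability of about 4.9%: a resample fails when it has at most three distinct rows, or four distinct rows forming one of the 12 coplanar quadruples of cube vertices. The simulated fraction should therefore land near 0.05. The prediction at $(0,1,0)$ from those resamples depends on how the fitting software handles coefficients it cannot estimate.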