I just started using multiple imputation in R with the mice package. I want to conduct an independent-samples t-test on the imputed data.
Here's a minimal working example:
library(mice)
# Create sample data frame
set.seed(42)
# group_var as an explicit factor: since R 4.0, data.frame() keeps strings as character
data <- data.frame(subject_id = 1:100,
                   group_var = factor(rep(c("test", "control"), times = 50)),
                   dep_var = rnorm(100, mean = 5, sd = 1),
                   aux_var = rnorm(100, mean = 20, sd = 4))
# Create a copy of the data with missing values in dep_var
na_data <- data
na_data$dep_var[sample.int(100, 23)] <- NA
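# Apply multiple imputation
# matrix() fills column-wise by default; byrow = TRUE makes row 3 (dep_var, the
# variable being imputed) use column 4 (aux_var) as its predictor. Note that
# group_var is not a predictor here, which tends to shrink the imputed group difference toward zero
imp_data <- mice(na_data, seed = 42,
                 predictorMatrix = matrix(c(0, 0, 0, 0,
                                            0, 0, 0, 0,
                                            0, 0, 0, 1,
                                            0, 0, 0, 0),
                                          ncol = 4, byrow = TRUE,
                                          dimnames = list(names(na_data), names(na_data))))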
# Fit models to imputed dataset
fit <- with(imp_data, lm(dep_var ~ group_var))
# Pool models and print summary
pooled_fit <- pool(fit)
summary(pooled_fit)
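# Compare to lm and t-test with the full dataset
summary(lm(dep_var ~ group_var, data = data))
# var.equal = TRUE gives Student's t-test, which matches the lm coefficient test;
# the default (var.equal = FALSE) is Welch's t-test
t.test(dep_var ~ group_var, data = data, var.equal = TRUE)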
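One thing worth checking in a call like the mice() one above: matrix() fills its values column by column unless byrow = TRUE, so a predictorMatrix typed out row by row (rows are the variables being imputed, columns are their predictors) ends up transposed. A small base-R check, assuming the four column names from the example:

```r
# Column names of the example data, in the order mice sees them
vars <- c("subject_id", "group_var", "dep_var", "aux_var")
vals <- c(0, 0, 0, 0,
          0, 0, 0, 0,
          0, 0, 0, 1,
          0, 0, 0, 0)

# Default: values fill column 1 first, then column 2, and so on
by_col <- matrix(vals, ncol = 4, dimnames = list(vars, vars))
# byrow = TRUE: values fill row by row, matching the typed layout
by_row <- matrix(vals, ncol = 4, byrow = TRUE, dimnames = list(vars, vars))

by_col["aux_var", "dep_var"]  # 1: aux_var would be imputed from dep_var
by_row["dep_var", "aux_var"]  # 1: dep_var is imputed from aux_var

# Filling the matrix by name sidesteps the ordering question entirely
pred <- matrix(0, nrow = 4, ncol = 4, dimnames = list(vars, vars))
pred["dep_var", "aux_var"] <- 1
identical(pred, by_row)  # TRUE
```

Recent mice versions also export make.predictorMatrix(), which returns a correctly named matrix for a data frame that can then be edited by name in the same way.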
I'm not quite sure whether the call to lm actually achieves what I'm trying to do (i.e. conduct an independent-samples t-test).
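With a two-level factor, the coefficient in lm(dep_var ~ group_var) is the difference between the two group means, and its t-test is Student's equal-variance two-sample t-test. t.test() defaults to Welch's unequal-variance version, so var.equal = TRUE is needed for the two to agree. A base-R sketch on complete simulated data mirroring the example (only the columns needed are kept):

```r
set.seed(42)
data <- data.frame(
  group_var = factor(rep(c("test", "control"), times = 50)),
  dep_var   = rnorm(100, mean = 5, sd = 1)
)

# lm: "control" is the reference level, so group_vartest = mean(test) - mean(control)
lm_coef <- summary(lm(dep_var ~ group_var, data = data))$coefficients
lm_t <- lm_coef["group_vartest", "t value"]
lm_p <- lm_coef["group_vartest", "Pr(>|t|)"]

# Student's t-test (pooled variance); t.test() uses mean(control) - mean(test)
tt <- t.test(dep_var ~ group_var, data = data, var.equal = TRUE)

# Same |t|, same df (98) and same p-value; only the sign of t differs
c(lm_t = lm_t, t_test_t = unname(tt$statistic))
c(lm_p = lm_p, t_test_p = tt$p.value)
```

On imputed data, the group_vartest row of summary(pool(fit)) is the multiple-imputation counterpart of this test: pool() combines the per-imputation estimates and standard errors with Rubin's rules and uses Barnard-Rubin degrees of freedom, so its df will generally not be exactly 98.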
Also, it would be nice to have a measure of the "pooled effect size" (in this case Cohen's d). For a single model, Cohen's d can be calculated with effsize::cohen.d.
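One common approach is to treat Cohen's d like any other estimate: compute it in each completed dataset and combine with Rubin's rules (the pooled estimate is the average, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance). The sketch below assumes the usual pooled-SD definition of d and a common large-sample approximation for its sampling variance. cohens_d() and pool_d() are hypothetical helpers, not functions from mice or effsize, and their sign convention may differ from effsize::cohen.d.

```r
# Hypothetical helper: Cohen's d with a pooled-SD denominator for a two-level
# grouping variable g, computed as (level 2 - level 1), plus its approximate variance
cohens_d <- function(y, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  y1 <- y[g == levels(g)[1]]
  y2 <- y[g == levels(g)[2]]
  n1 <- length(y1)
  n2 <- length(y2)
  s_pooled <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
  d <- (mean(y2) - mean(y1)) / s_pooled
  # Large-sample approximation to the sampling variance of d
  v <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
  c(d = d, var = v)
}

# Hypothetical helper: pool d over a list of completed datasets with Rubin's rules
pool_d <- function(completed, y = "dep_var", g = "group_var") {
  per_imp <- sapply(completed, function(df) cohens_d(df[[y]], df[[g]]))
  m     <- ncol(per_imp)
  ubar  <- mean(per_imp["var", ])        # within-imputation variance
  b     <- var(per_imp["d", ])           # between-imputation variance
  total <- ubar + (1 + 1 / m) * b        # total variance
  c(d = mean(per_imp["d", ]), se = sqrt(total))
}

# Usage with mice, if installed (data mirrors the question's example)
if (requireNamespace("mice", quietly = TRUE)) {
  set.seed(42)
  na_data <- data.frame(
    group_var = factor(rep(c("test", "control"), times = 50)),
    dep_var   = rnorm(100, mean = 5, sd = 1),
    aux_var   = rnorm(100, mean = 20, sd = 4)
  )
  na_data$dep_var[sample.int(100, 23)] <- NA
  imp <- mice::mice(na_data, seed = 42, printFlag = FALSE)
  print(pool_d(mice::complete(imp, action = "all")))
}
```

The pooled d and its standard error can then give an approximate confidence interval, e.g. d plus or minus 1.96 times se.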
Any help on this would be great! Thank you.