Motivated by my answer to this question, I played around with analyzing mulitply imputed data from the Amelia
package in R. As I have explained in my answer, the multiply imputed datasets can be analyzed using the combined Zelig
and mitools
packages or using a combination of Zelig
and mice
.
Now, to me it seemed rather inconvenient to fit the linear model using zelig()
as mice
, too, provides the with.mids()
-function to fit linear models to multiply imputed datasets. However, I found that the results differ depending on the function used for fitting. For the analysis using with.mids
, I first had to circumvent a bug in the current mice
-package by defining the following function, as has been explained in another question:
as.mids2 <- function(data2, .imp=1, .id=2){
ini <- mice(data2[data2[, .imp] == 0, -c(.imp, .id)], maxit=0)
names <- names(ini$imp)
if (!is.null(.id)){
rownames(ini$data) <- data2[data2[, .imp] == 0, .id]
}
for (i in 1:length(names)){
for(m in 1:(max(as.numeric(data2[, .imp])) - 1)){
if(!is.null(ini$imp[[i]])){
indic <- data2[, .imp] == m & is.na(data2[data2[, .imp]==0, names[i]])
ini$imp[[names[i]]][m] <- data2[indic, names[i]]
}
}
}
return(ini)
}
Once I had done this, I used Zelig
:
library("Amelia")
data(freetrade)
amelia.out <- amelia(freetrade, m = 15, ts = "year", cs = "country")
library("Zelig")
zelig.fit <- zelig(tariff ~ pop + gdp.pc + year + polity, data = amelia.out$imputations, model = "ls", cite = FALSE)
zelig.results <- lapply(zelig.fit, function(x) x$result)
library("mice")
zelig4mice <- as.mira(zelig.results)
zelig.mice.res <- summary(pool(zelig4mice, method = "rubin1987"))
Then I tried the same thing using only mice
:
imp.data <- do.call("rbind", amelia.out$imputations)
imp.data <- rbind(freetrade, imp.data)
imp.data$.imp <- as.numeric(rep(c(0:15), each = nrow(freetrade)))
mice.data <- as.mids2(imp.data, .imp = ncol(imp.data), .id = NULL)
mice.fit <- with(mice.data, lm(tariff ~ polity + pop + gdp.pc + year))
mice.res <- summary(pool(mice.res2, method = "rubin1987"))
These are the results:
> zelig.mice.res
est se t df Pr(>|t|) lo 95 hi 95 nmis fmi lambda
(Intercept) 3.18e+03 7.22e+02 4.41 45.9 6.20e-05 1.73e+03 4.63e+03 NA 0.571 0.552
pop 3.13e-08 5.59e-09 5.59 392.1 4.21e-08 2.03e-08 4.23e-08 NA 0.193 0.189
gdp.pc -2.11e-03 5.53e-04 -3.81 329.4 1.64e-04 -3.20e-03 -1.02e-03 NA 0.211 0.206
year -1.58e+00 3.63e-01 -4.37 45.9 7.11e-05 -2.31e+00 -8.54e-01 NA 0.570 0.552
polity 5.52e-01 3.16e-01 1.75 90.8 8.41e-02 -7.58e-02 1.18e+00 NA 0.406 0.393
> mice.res
est se t df Pr(>|t|) lo 95 hi 95 nmis fmi lambda
(Intercept) 3.42e+03 8.87e+02 3.86 8.01 4.80e-03 1.38e+03 5.47e+03 NA 0.7599 0.7066
pop 3.20e-08 5.25e-09 6.10 504.30 2.10e-09 2.17e-08 4.24e-08 0 0.0927 0.0891
gdp.pc -2.09e-03 5.31e-04 -3.93 189.23 1.19e-04 -3.13e-03 -1.04e-03 0 0.1543 0.1454
year -1.70e+00 4.46e-01 -3.83 8.02 5.02e-03 -2.73e+00 -6.78e-01 0 0.7594 0.7061
polity 5.74e-01 3.60e-01 1.59 13.93 1.34e-01 -2.00e-01 1.35e+00 2 0.5907 0.5358
From these data it is apparent, that the linear models fit by the two methods differ and so do the determined degrees of freedom.
Why do these results differ? What is the correct analysis procedure?