Analyzing multiply imupted data from Amelia in R: Why do results from zelig and mice differ?

Question

Motivated by my answer to this question, I played around with analyzing mulitply imputed data from the Amelia package in R. As I have explained in my answer, the multiply imputed datasets can be analyzed using the combined Zelig and mitools packages or using a combination of Zelig and mice.

Now, to me it seemed rather inconvenient to fit the linear model using zelig() as mice, too, provides the with.mids()-function to fit linear models to multiply imputed datasets. However, I found that the results differ depending on the function used for fitting. For the analysis using with.mids, I first had to circumvent a bug in the current mice-package by defining the following function, as has been explained in another question:

as.mids2 <- function(data2, .imp=1, .id=2){
  ini <- mice(data2[data2[, .imp] == 0, -c(.imp, .id)], maxit=0)
  names  <- names(ini$imp)
  if (!is.null(.id)){
    rownames(ini$data) <- data2[data2[, .imp] == 0, .id]
  }
  for (i in 1:length(names)){
    for(m in 1:(max(as.numeric(data2[, .imp])) - 1)){
      if(!is.null(ini$imp[[i]])){
        indic <- data2[, .imp] == m & is.na(data2[data2[, .imp]==0, names[i]])
        ini$imp[[names[i]]][m] <- data2[indic, names[i]]
      }
    }
  }
  return(ini)
}

Once I had done this, I used Zelig:

library("Amelia")
data(freetrade)
amelia.out <- amelia(freetrade, m = 15, ts = "year", cs = "country")

library("Zelig")
zelig.fit <- zelig(tariff ~ pop + gdp.pc + year + polity, data = amelia.out$imputations, model = "ls", cite = FALSE)
zelig.results <- lapply(zelig.fit, function(x) x$result)

library("mice")
zelig4mice <- as.mira(zelig.results)
zelig.mice.res <- summary(pool(zelig4mice, method = "rubin1987"))

Then I tried the same thing using only mice:

imp.data <- do.call("rbind", amelia.out$imputations)
imp.data <- rbind(freetrade, imp.data)
imp.data$.imp <- as.numeric(rep(c(0:15), each = nrow(freetrade)))
mice.data <- as.mids2(imp.data, .imp = ncol(imp.data), .id = NULL)

mice.fit <- with(mice.data, lm(tariff ~ polity + pop + gdp.pc + year))
mice.res <- summary(pool(mice.res2, method = "rubin1987"))

These are the results:

> zelig.mice.res
                  est       se     t    df Pr(>|t|)     lo 95     hi 95 nmis   fmi lambda
(Intercept)  3.18e+03 7.22e+02  4.41  45.9 6.20e-05  1.73e+03  4.63e+03   NA 0.571  0.552
pop          3.13e-08 5.59e-09  5.59 392.1 4.21e-08  2.03e-08  4.23e-08   NA 0.193  0.189
gdp.pc      -2.11e-03 5.53e-04 -3.81 329.4 1.64e-04 -3.20e-03 -1.02e-03   NA 0.211  0.206
year        -1.58e+00 3.63e-01 -4.37  45.9 7.11e-05 -2.31e+00 -8.54e-01   NA 0.570  0.552
polity       5.52e-01 3.16e-01  1.75  90.8 8.41e-02 -7.58e-02  1.18e+00   NA 0.406  0.393

> mice.res
                  est       se     t     df Pr(>|t|)     lo 95     hi 95 nmis    fmi lambda
(Intercept)  3.42e+03 8.87e+02  3.86   8.01 4.80e-03  1.38e+03  5.47e+03   NA 0.7599 0.7066
pop          3.20e-08 5.25e-09  6.10 504.30 2.10e-09  2.17e-08  4.24e-08    0 0.0927 0.0891
gdp.pc      -2.09e-03 5.31e-04 -3.93 189.23 1.19e-04 -3.13e-03 -1.04e-03    0 0.1543 0.1454
year        -1.70e+00 4.46e-01 -3.83   8.02 5.02e-03 -2.73e+00 -6.78e-01    0 0.7594 0.7061
polity       5.74e-01 3.60e-01  1.59  13.93 1.34e-01 -2.00e-01  1.35e+00    2 0.5907 0.5358

From these data it is apparent, that the linear models fit by the two methods differ and so do the determined degrees of freedom.

Why do these results differ? What is the correct analysis procedure?

It seems that as.mids2() does not produce a proper mids-object. It returns 5 imputations instead of 15. — Stef van Buuren, Oct 23 '13 at 13:14

score 3 · Accepted Answer · answered Oct 23 '13 at 17:48

Stef is right. An initial mice object is created in as.mids2() with m = 5 as a default. Even though the mids object is completely filled in, the with() function only uses the first five. This is because the number of imputations - corresponding to the largest value in .imp - is not changed in the returned object. The following code yields the correct mids object

as.mids2 <- function(data2, .imp=1, .id=2){
  ini <- mice(data2[data2[, .imp] == 0, -c(.imp, .id)], m = max(as.numeric(data2[, .imp])), maxit=0)
  names  <- names(ini$imp)
  if (!is.null(.id)){
    rownames(ini$data) <- data2[data2[, .imp] == 0, .id]
  }
  for (i in 1:length(names)){
    for(m in 1:(max(as.numeric(data2[, .imp])))){
      if(!is.null(ini$imp[[i]])){
        indic <- data2[, .imp] == m & is.na(data2[data2[, .imp]==0, names[i]])
        ini$imp[[names[i]]][m] <- data2[indic, names[i]]
      }
    } 
  }
  return(ini)
}

and yields the same results as Zelig given the active random seed.

> zelig.mice.res
                      est           se         t        df     Pr(>|t|)         lo 95
(Intercept)  3.118344e+03 7.031673e+02  4.434711  55.20608 4.437533e-05  1.709283e+03
pop          3.074658e-08 5.993667e-09  5.129844 211.31179 6.550041e-07  1.893154e-08
gdp.pc      -2.185839e-03 5.968324e-04 -3.662400 174.91116 3.308584e-04 -3.363759e-03
year        -1.551702e+00 3.535625e-01 -4.388762  55.26609 5.183456e-05 -2.260180e+00
polity       4.649357e-01 3.102389e-01  1.498638 125.11682 1.364865e-01 -1.490600e-01
                    hi 95 nmis       fmi    lambda
(Intercept)  4.527404e+03   NA 0.5206397 0.5035824
pop          4.256162e-08   NA 0.2643263 0.2573962
gdp.pc      -1.007919e-03   NA 0.2909757 0.2829145
year        -8.432231e-01   NA 0.5203580 0.5033089
polity       1.078931e+00   NA 0.3448966 0.3345077

> mice.res
                      est           se         t        df     Pr(>|t|)         lo 95
(Intercept)  3.118344e+03 7.031673e+02  4.434711  55.20608 4.437533e-05  1.709283e+03
pop          3.074658e-08 5.993667e-09  5.129844 211.31179 6.550041e-07  1.893154e-08
gdp.pc      -2.185839e-03 5.968324e-04 -3.662400 174.91116 3.308584e-04 -3.363759e-03
year        -1.551702e+00 3.535625e-01 -4.388762  55.26609 5.183456e-05 -2.260180e+00
polity       4.649357e-01 3.102389e-01  1.498638 125.11682 1.364865e-01 -1.490600e-01
                    hi 95 nmis       fmi    lambda
(Intercept)  4.527404e+03   NA 0.5206397 0.5035824
pop          4.256162e-08    0 0.2643263 0.2573962
gdp.pc      -1.007919e-03    0 0.2909757 0.2829145
year        -8.432231e-01    0 0.5203580 0.5033089
polity       1.078931e+00    2 0.3448966 0.3345077

Thanks a ton, Gerko! I was trying to figure this out yesterday, but was unable to fix it. Now, this should probably be migrated to StackOverflow. — crsh, Oct 24 '13 at 08:52

Analyzing multiply imupted data from Amelia in R: Why do results from zelig and mice differ?

1 Answers1

Linked