Questions tagged [multiple-imputation]

Multiple imputation refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data. While single imputation can produce consistent estimates of the parameters of interest, its standard errors are hard to get right: analyzing one completed dataset treats the imputed values as if they had been observed, which understates the uncertainty due to the missing data. Rubin (1978) suggested drawing several independent realizations from the imputation mechanism and provided rules for combining the resulting estimates into point estimates and standard errors that are valid under "proper imputation" assumptions.
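The combining step mentioned above (commonly called Rubin's rules) can be illustrated for one scalar parameter. With $m$ completed datasets giving estimates $Q_1,\dots,Q_m$ and squared standard errors $U_1,\dots,U_m$, the pooled estimate is $\bar Q = \frac{1}{m}\sum_i Q_i$, the within-imputation variance is $\bar U = \frac{1}{m}\sum_i U_i$, the between-imputation variance is $B = \frac{1}{m-1}\sum_i (Q_i - \bar Q)^2$, and the total variance is $T = \bar U + (1 + 1/m)B$. The Python sketch below is a minimal illustration under those definitions, not the implementation used by any particular package; the function name `pool_rubin` is hypothetical, and it uses the classic large-sample degrees of freedom $\nu = (m-1)\left(1 + \bar U / ((1+1/m)B)\right)^2$. Some software applies a small-sample correction to $\nu$ instead, and pooling test statistics or p-values requires different procedures.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results for one scalar parameter (Rubin's rules).

    Hypothetical helper for illustration only.
    estimates: point estimates Q_1..Q_m, one per imputed dataset
    variances: the matching squared standard errors U_1..U_m
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 estimates and one variance per estimate")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    between = (1 + 1 / m) * b        # between-imputation part of the total variance
    t = u_bar + between              # total variance T
    if between == 0:
        df = math.inf                # imputations agree exactly: normal reference
    else:
        # classic large-sample degrees of freedom
        df = (m - 1) * (1 + u_bar / between) ** 2
    return q_bar, math.sqrt(t), df


q, se, df = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(f"estimate={q:.3f}  se={se:.3f}  df={df:.1f}")
```

In this made-up example $\bar Q = 1$, $B = 0.025$ and $T = 0.04 + 1.2 \times 0.025 = 0.07$, so the pooled standard error is $\sqrt{0.07} \approx 0.265$, larger than the completed-data standard error of $0.2$; the extra spread is exactly the between-imputation uncertainty that a single imputation ignores.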

Barnard, J. and X.-L. Meng (1999). Applications of multiple imputation in medical studies: from AIDS to NHANES. Statistical Methods in Medical Research 8 (1), 17-36. http://dx.doi.org/10.1177/096228029900800103

Rubin, D. B. (1978). Multiple Imputations in Sample Surveys -- A Phenomenological Bayesian Approach to Nonresponse. Proceedings of the Survey Research Methods Section of the American Statistical Association, 20-34.

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association 91 (434), 473-489. http://dx.doi.org/10.1080/01621459.1996.1047690. The paper appeared in the special issue of JASA devoted to multiple imputation.

Rubin, D. B. (2004). Multiple Imputation for Nonresponse in Surveys (Wiley Classics Library). Wiley-Interscience.

White, I. R., P. Royston, and A. M. Wood (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine 30 (4), 377-399. http://dx.doi.org/10.1002/sim.4067

Related tags: missing-data.

464 questions
30 votes, 3 answers

Imputation before or after splitting into train and test?

I have a data set with N ~ 5000 and about 1/2 missing on at least one important variable. The main analytic method will be Cox proportional hazards. I plan to use multiple imputation. I will also be splitting into a train and test set. Should I…
Peter Flom
22 votes, 4 answers

Multiple imputation and model selection

Multiple imputation is fairly straightforward when you have an a priori linear model that you want to estimate. However, things seem to be a bit trickier when you actually want to do some model selection (e.g. find the "best" set of predictor…
D L Dahly
22 votes, 1 answer

Multiple Imputation by Chained Equations (MICE) Explained

I have seen Multiple Imputation by Chained Equations (MICE) used as a missing data handling method. Is anyone able to provide a simple explanation of how MICE works?
20 votes, 2 answers

Multiple imputation for outcome variables

I've got a dataset on agricultural trials. My response variable is a response ratio: log(treatment/control). I'm interested in what mediates the difference, so I'm running RE meta-regressions (unweighted, because it seems pretty clear that effect…
16 votes, 1 answer

How do the number of imputations & the maximum iterations affect accuracy in multiple imputation?

The help page for MICE defines the function as: mice(data, m = 5, method = vector("character", length = ncol(data)), predictorMatrix = (1 - diag(1, ncol(data))), visitSequence = (1:ncol(data))[apply(is.na(data), 2, any)], form =…
119631
15 votes, 2 answers

How to get pooled p-values on tests done in multiple imputed datasets?

Using Amelia in R, I obtained multiple imputed datasets. After that, I performed a repeated measures test in SPSS. Now, I want to pool test results. I know that I can use Rubin's rules (implemented through any multiple imputation package in R) to…
wisc88
15 votes, 1 answer

Pooling calibration plots after multiple imputation

I would like advice on pooling the calibration plots/statistics after multiple imputation. In the setting of developing statistical models in order to predict a future event (e.g. using data from hospital records to predict post hospital discharge…
15 votes, 2 answers

lmer with multiply imputed data

How can I get pooled random effects for lmer after multiple imputation? I am using mice to multiply impute a dataframe and lme4 to fit a mixed model with random intercept and random slope. Pooling lmer goes fine, except that it doesn't pool the random…
13 votes, 5 answers

Multiple imputation for missing values

I would like to use imputation for replacing missing values in my data set under certain constraints. For example, I'd like the imputed variable x1 to be greater or equal to the sum of my two other variables, say x2 and x3. I also want x3 to be…
rose
13 votes, 2 answers

Using neighbor information in imputing data or finding off-data (in R)

I have a dataset with the assumption that nearest neighbors are the best predictors. Just a perfect example of a two-way gradient, visualized: suppose we have a case where a few values are missing; we can easily predict them based on neighbors and the trend.…
rdorlearn
12 votes, 2 answers

How can I pool bootstrapped p-values across multiply imputed data sets?

I am concerned with the problem that I would like to bootstrap the p-value for an estimate of $\theta$ from multiply imputed (MI) data, but that it is unclear to me how to combine the p-values across MI sets. For MI data sets, the standard approach…
tomka
12 votes, 1 answer

"the leading minor of order 1 is not positive definite" error using 2l.norm in mice

I am having a problem using the 2l.norm method of multilevel imputation in mice. Unfortunately I cannot post a reproducible example because of the size of my data - when I reduce the size, the problem vanishes. For a particular variable, mice…
Robert Long
12 votes, 5 answers

How to perform imputation of values in a very large number of data points?

I have a very large dataset, and about 5% of the values, at random positions, are missing. These variables are correlated with each other. The following example R dataset is just a toy example with dummy correlated data. set.seed(123) # matrix of X variable xmat <-…
John
11 votes, 2 answers

In a longitudinal study, should I impute the outcome Y, measured at time 2, for individuals who were lost to follow-up?

I have repeated measures at 2 time points in a sample of people. There are 18k people at time 1, and 13k at time 2 (5000 lost to follow-up). I want to regress an outcome Y measured at time 2 (the outcome cannot be measured at time 1) on…
D L Dahly
10 votes, 2 answers

Applying Rubin's rule for combining multiply imputed datasets

I am hoping to pool the results of a pretty basic set of analyses performed on multiply imputed data (e.g. multiple regression, ANOVA). Multiple imputation and the analyses have been completed in SPSS, but SPSS does not provide pooled results for a…
user81715