Questions tagged [multiple-imputation]

Multiple imputation refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data. While single imputation can produce consistent estimates of the parameters of interest, the standard errors are difficult to estimate correctly. Rubin (1978) suggested drawing several independent realizations from the imputation mechanism, and provided rules for combining the per-dataset estimates into point estimates and standard errors that are valid under "proper imputation" assumptions.
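As a rough illustration of the combining step, here is a minimal Python sketch of the pooling rules commonly called Rubin's rules for a single scalar parameter. The function name `pool_rubin` and its return layout are hypothetical, chosen only for this example; it assumes you already have one point estimate and one squared standard error from each of the m imputed datasets.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation results for one scalar parameter.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    (Hypothetical helper for illustration, not a library API.)
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate

    # Classical large-sample degrees of freedom for the t reference
    # distribution; infinite when the estimates do not vary at all.
    if b == 0:
        df = float("inf")
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return {"estimate": q_bar, "variance": t, "se": t ** 0.5, "df": df}


pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The pooled standard error is the square root of the total variance, so it grows with the disagreement between imputations, which is the uncertainty a single imputation cannot capture.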

Barnard, J. and X.-L. Meng (1999). Applications of multiple imputation in medical studies: from AIDS to NHANES. Statistical Methods in Medical Research 8 (1), 17-36. http://dx.doi.org/10.1177/096228029900800103

Rubin, D.B. (1978). Multiple Imputations in Sample Surveys -- A Phenomenological Bayesian Approach to Nonresponse. The Proceedings of the Survey Research Methods Section of the American Statistical Association, 20-34.

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association 91 (434), 473-489. http://dx.doi.org/10.1080/01621459.1996.1047690. This is the special issue of JASA devoted to multiple imputation.

Rubin, D. B. (2004). Multiple Imputation for Nonresponse in Surveys (Wiley Classics Library). Wiley-Interscience.

White, I. R., P. Royston, and A. M. Wood (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine 30 (4), 377-399. http://dx.doi.org/10.1002/sim.4067

Related tags: missing-data.

464 questions

3 answers

Imputation before or after splitting into train and test?

I have a data set with N ~ 5000 and about 1/2 missing on at least one important variable. The main analytic method will be Cox proportional hazards. I plan to use multiple imputation. I will also be splitting into a train and test set. Should I…

asked Apr 24 '14 at 18:55

Peter Flom

4 answers

Multiple imputation and model selection

Multiple imputation is fairly straightforward when you have an a priori linear model that you want to estimate. However, things seem to be a bit trickier when you actually want to do some model selection (e.g. find the "best" set of predictor…

multiple-regression multiple-imputation

asked Dec 30 '12 at 19:48

D L Dahly

1 answer

Multiple Imputation by Chained Equations (MICE) Explained

I have seen Multiple Imputation by Chained Equations (MICE) used as a missing data handling method. Is anyone able to provide a simple explanation of how MICE works?

regression machine-learning classification missing-data multiple-imputation

asked Aug 10 '19 at 07:27

Mike Tauber

2 answers

Multiple imputation for outcome variables

I've got a dataset on agricultural trials. My response variable is a response ratio: log(treatment/control). I'm interested in what mediates the difference, so I'm running RE meta-regressions (unweighted, because is seems pretty clear that effect…

missing-data meta-analysis multiple-imputation meta-regression

asked Dec 19 '12 at 09:03

generic_user

1 answer

How do the number of imputations & the maximum iterations affect accuracy in multiple imputation?

The help page for MICE defines the function as: mice(data, m = 5, method = vector("character", length = ncol(data)), predictorMatrix = (1 - diag(1, ncol(data))), visitSequence = (1:ncol(data))[apply(is.na(data), 2, any)], form =…

r missing-data data-imputation multiple-imputation mice

asked Jun 15 '16 at 14:01

119631

2 answers

How to get pooled p-values on tests done in multiple imputed datasets?

Using Amelia in R, I obtained multiple imputed datasets. After that, I performed a repeated measures test in SPSS. Now, I want to pool test results. I know that I can use Rubin's rules (implemented through any multiple imputation package in R) to…

r spss p-value multiple-imputation pooling

asked Sep 04 '13 at 01:06

wisc88

1 answer

Pooling calibration plots after multiple imputation

I would like advice on pooling the calibration plots/statistics after multiple imputation. In the setting of developing statistical models in order to predict a future event (e.g. using data from hospital records to predict post hospital discharge…

data-visualization data-imputation multiple-imputation pooling calibration

asked Mar 07 '17 at 15:30

IWS

2 answers

lmer with multiply imputed data

How can I get pooled random effects for lmer after multiple imputation? I am using mice to multiple impute a dataframe. And lme4 for a mixed model with random intercept and random slope. Pooling lmer goes fine, except that it doesn't pool the random…

r lme4-nlme multiple-imputation

asked Oct 02 '14 at 10:49

Helgi Guðmundsson

5 answers

Multiple imputation for missing values

I would like to use imputation for replacing missing values in my data set under certain constraints. For example, I'd like the imputed variable x1 to be greater or equal to the sum of my two other variables, say x2 and x3. I also want x3 to be…

r spss missing-data multiple-imputation

asked Dec 05 '13 at 09:33

rose

2 answers

using neighbor information in imputing data or find off-data (in R)

I have dataset with assumption that nearest neighbors are best predictors. Just a perfect example of two-way gradient visualized- Suppose we have case where few values are missing, we can easily predict based on neighbors and trend.…

r prediction outliers data-imputation multiple-imputation

asked May 28 '14 at 02:55

rdorlearn

2 answers

How can I pool bootstrapped p-values across multiply imputed data sets?

I am concerned with the problem that I would like to bootstrap the p-value for an estimate of $\theta$ from multiply imputed (MI) data, but that it is unclear to me how to combine the p-values across MI sets. For MI data sets, the standard approach…

confidence-interval variance p-value bootstrap multiple-imputation

asked Dec 02 '13 at 12:30

tomka

1 answer

"the leading minor of order 1 is not positive definite" error using 2l.norm in mice

I am having a problem using the 2l.norm method of multilevel imputation in mice. Unfortunately I cannot post a reproducible example because of the size of my data - when I reduce the size, the problem vanishes. For a particular variable, mice…

r missing-data multiple-imputation mice

asked Apr 12 '13 at 11:59

Robert Long

5 answers

How to perform imputation of values in very large number of data points?

I have a very large dataset and about 5% random values are missing. These variables are correlated with each other. The following example R dataset is just a toy example with dummy correlated data. set.seed(123) # matrix of X variable xmat <-…

r random-forest missing-data data-imputation multiple-imputation

asked May 25 '14 at 18:27

John

2 answers

In a longitudinal study, should I impute the outcome Y, measured at time 2, for individuals who were lost to follow-up?

I have repeat measures at 2 times points in a sample of people. There are 18k people at time 1, and 13k at time 2 (5000 lost to follow-up). I want to regress an outcome Y measured at time 2 (and the outcome is not able to be measured at time 1) on…

panel-data multiple-imputation

asked Dec 20 '13 at 06:55

D L Dahly

2 answers

Applying Rubin's rule for combining multiply imputed datasets

I am hoping to pool the results of a pretty basic set of analysis performed on a multiply imputed data (e.g. multiple regression, ANOVA). Multiple imputation and the analyses have been completed in SPSS but SPSS does not provide pooled results for a…

spss missing-data multiple-imputation pooling

asked Jul 07 '15 at 21:35

user81715
