
I would like to do some imputation in a situation like this:

Knowing that a value is missing is itself highly informative: if a variable has a missing value, the true value must lie between zero and some positive constant c. This c is smaller (usually much smaller) than any of the observed values, and it differs from variable to variable. Because the missingness is this informative, I think uniformly distributed random numbers on [0, c] would suffice as imputations.

The problem is how to do this within the framework of the mice package, because 'runif imputation' is not built into mice. Even I (with my humble coding skills) could easily do the actual imputation without the package, but I would like to use its very convenient framework for the subsequent analyses.

I found a somewhat similar topic (Multiple imputation for missing values) where custom imputation methods are implemented, but they actually wrap a method already available in mice (PMM), so it does not seem very helpful here.

I also studied the post-processing feature (https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html), but the "squeeze" example is not quite what I am looking for, and I do not know how to adapt it to my case.

So, what would be the easiest way to do this 'runif imputation' in the mice package?

A toy version of the data and the corresponding constants c are below:

# Toy data: NA marks a value known to lie between 0 and the variable's constant c
v1 <- c(NA, NA, 196684.6, 8266.4, 12403.1, NA, 315621.8, 163686.0, 267788.9, 2818.6, 101087.4, NA, 49193.2, 178748.6, NA, 40129.9, 137476.2, 253865.2, NA, 13409.9)
v2 <- c(4786.9, NA, 4181.1, 10038.5, 13842.5, 21692.1, 15093.4, NA, 5662.9, 8000.2, 8426.2, 5160.7, NA, NA, 16904.2, 10409.6, 7379.6, 30973.0, NA, NA)
v3 <- c(NA, 2512989, 407434.2, 436502.7, 931959.3, 528485.7, 1319345.0, 826987.4, 258134.1, 413298.0, 2976947.4, 484316.1, NA, 205461.1, 1808292.0, NA, 374079.4, 2413572.0, 131438.5, 311361.4)

dat <- data.frame(v1, v2, v3)
# Upper limits c for v1, v2 and v3, in that order
c <- c(4000, 20, 25000)
Jacko
  • I am voting to leave this open because the underlying issue is statistical. Some might think it is a duplicate, but I don't think it is. – Peter Flom Jul 18 '19 at 11:19

1 Answer


Questions about R are off topic here, but the basic point is that MICE is not designed for this sort of situation. What you have is a special type of missingness: your data are censored. That is, you know each missing value lies between 0 and c, but you don't know exactly where.

MICE (and other general programs for multiple imputation) is designed for the general case of missing data, where the missing value could be almost anything, even a value larger than any in the data. These programs also rely on the observed data to inform the imputations.

So, you may be right to want uniform imputation for the data (some other distribution might be better, depending on the case, but it probably won't make much difference), but you are looking in the wrong place for a solution.
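To make the idea of uniform imputation concrete, here is a minimal base R sketch that does not use mice at all. It assumes the toy data from the question. The helper `impute_uniform`, the vector name `upper` (the question's `c`, renamed so it does not mask `base::c`), the seed and the choice of `m = 5` are illustrative assumptions, not part of any package.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Toy data copied from the question
v1 <- c(NA, NA, 196684.6, 8266.4, 12403.1, NA, 315621.8, 163686.0, 267788.9, 2818.6,
        101087.4, NA, 49193.2, 178748.6, NA, 40129.9, 137476.2, 253865.2, NA, 13409.9)
v2 <- c(4786.9, NA, 4181.1, 10038.5, 13842.5, 21692.1, 15093.4, NA, 5662.9, 8000.2,
        8426.2, 5160.7, NA, NA, 16904.2, 10409.6, 7379.6, 30973.0, NA, NA)
v3 <- c(NA, 2512989, 407434.2, 436502.7, 931959.3, 528485.7, 1319345.0, 826987.4,
        258134.1, 413298.0, 2976947.4, 484316.1, NA, 205461.1, 1808292.0, NA,
        374079.4, 2413572.0, 131438.5, 311361.4)
dat <- data.frame(v1, v2, v3)

# Upper limits per variable (called c in the question)
upper <- c(v1 = 4000, v2 = 20, v3 = 25000)

# Hypothetical helper: replace every NA in column j with a draw from U(0, upper[j])
impute_uniform <- function(data, upper) {
  for (j in names(upper)) {
    miss <- is.na(data[[j]])
    data[[j]][miss] <- runif(sum(miss), min = 0, max = upper[[j]])
  }
  data
}

# m independently completed copies of the data (m = 5 is an arbitrary choice)
m <- 5
completed <- lapply(seq_len(m), function(i) impute_uniform(dat, upper))
```

Each element of `completed` is a full data frame in which only the formerly missing cells differ between copies. The copies could then be analysed separately and the results combined in the usual multiple-imputation way, although whether that is adequate for censored data is a separate question.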

If you Google "censored data interval r minimum detectable" you will find some articles that I think will help.

Peter Flom