I would like to create an MCAR dataset in R from an existing complete one. Only some of the variables should have NA observations. Here's the code I used:

data <- master
set.seed(685) 
prop.m = .15  
mcar   = runif(length(data[,1]), min=0, max=1)
diabetes.mcar = ifelse(mcar<prop.m, NA, data$diabetes)
hypertension.mcar  = ifelse(mcar<prop.m, NA, data$hypertension)
antic_therapy.mcar = ifelse(mcar<prop.m, NA, data$antic_therapy)
years.mcar = ifelse(mcar<prop.m, NA, data$years)
data_mcar <- cbind(subset(data), diabetes.mcar, hypertension.mcar, 
                   antic_therapy.mcar, years.mcar)

Here's the problem: with the seed set, the NA values fall on exactly the same observations for every variable:

> diabetes.mcar
 [1]  0  0  0  0  0  0  0  0  0  0  0 NA NA  0  1  1  0  0  0  0  0  0 NA  0  0 NA  0  0  0  0
[31]  0  0  0 NA  0 NA  1  0  0  0  1

> hypertension.mcar
 [1]  1  1  0  1  0  1  1  0  0  1  0 NA NA  0  1  1  1  0  1  1  0  1 NA  0  0 NA  0  0  1  0
[31]  1  0  1 NA  1 NA  1  1  1  0  1

> antic_therapy.mcar
 [1]  0  1  0  1  0  0  0  0  1  0  0 NA NA  0  0  0  1  0  0  0  0  1 NA  0  1 NA  0  0  0  0
[31]  0  0  0 NA  0 NA  0  0  1  0  1

> years.mcar
 [1] 69 77 70 75 68 73 68 66 71 51 75 NA NA 74 71 71 71 70 55 80 74 73 NA 78 73 NA 70 69 74 76
[31] 70 78 72 NA 77 NA 78 72 75 67 79

And this is not MCAR at all! How can I fix it? I've also tried setting a different seed for each variable, but it doesn't work.

I've also tried it this way:

 mymatrix <- as.matrix(data)
 mcar   <- MCAR(db.complete = mymatrix, perc.miss = 0.15, setseed = 11)

but I get an S4 object, and I don't know how to convert it to a data frame or export it as a CSV.

gung - Reinstate Monica
ArTu

1 Answer

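A couple of things. First, what you have done is in fact MCAR: whether a record loses its values depends only on a random draw, not on the data. You have to understand the difference between total non-response, where a record is missing on all the affected variables at once, and item non-response, where each variable goes missing separately. You can have MCAR total non-response, just as you do here.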
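To make the first point concrete, here is a minimal sketch of record-level missingness on made-up data. The data frame `toy` and its columns are hypothetical stand-ins, not the asker's `master`. A single uniform draw per row decides both variables at once, so the NA patterns coincide, yet the draw never looks at the data values.

```r
set.seed(1)
n <- 1000
# Hypothetical complete data, standing in for the real dataset
toy <- data.frame(x = rbinom(n, 1, 0.3), y = rnorm(n, 70, 6))

prop.m <- 0.15
u <- runif(n)                          # ONE draw per record
x.rec <- ifelse(u < prop.m, NA, toy$x)
y.rec <- ifelse(u < prop.m, NA, toy$y)

# Both variables are missing on exactly the same rows
same_pattern <- identical(is.na(x.rec), is.na(y.rec))
same_pattern

# Share of records hit; should be close to prop.m for large n
mean(is.na(x.rec))
```

The missingness indicator here is a function of `u` alone, which is why this still counts as MCAR even though the pattern is shared across variables.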

If you want to simulate MCAR item non-response in a similar manner, you will need a matrix of random draws, so that each record has four independent draws, one per variable, with which to simulate the non-response.

I think something like this should work:

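n <- nrow(data)
# one independent uniform draw per record and per variable (n x 4)
mcar <- matrix(runif(n * 4), nrow = n, ncol = 4)
# column 1 drives diabetes, column 2 hypertension, and so on
diabetes.mcar <- ifelse(mcar[, 1] < prop.m, NA, data$diabetes)
hypertension.mcar <- ifelse(mcar[, 2] < prop.m, NA, data$hypertension)
# etc.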
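The same idea can be wrapped in a small helper so the four variables do not have to be handled one by one. This is only a sketch: `make_mcar` is a hypothetical name, the `master` data frame below is simulated as a stand-in for the real one, and the output path is arbitrary. Because the result stays an ordinary data frame, it can be written out with `write.csv` directly.

```r
# Hypothetical helper: set each listed column to NA independently
# with probability `prop` (item-level MCAR)
make_mcar <- function(data, cols, prop, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  out <- data
  for (v in cols) {
    miss <- runif(nrow(out)) < prop   # fresh draws for every variable
    out[[v]][miss] <- NA
  }
  out
}

# Simulated stand-in for the asker's complete dataset (assumption)
set.seed(1)
master <- data.frame(
  diabetes      = rbinom(200, 1, 0.2),
  hypertension  = rbinom(200, 1, 0.5),
  antic_therapy = rbinom(200, 1, 0.3),
  years         = round(rnorm(200, 70, 6)),
  sex           = rbinom(200, 1, 0.5)    # left complete on purpose
)

vars <- c("diabetes", "hypertension", "antic_therapy", "years")
data_mcar <- make_mcar(master, vars, prop = 0.15, seed = 685)

colMeans(is.na(data_mcar))   # observed missing proportion per column

# Export; the temporary path is just for illustration
out_file <- file.path(tempdir(), "data_mcar.csv")
write.csv(data_mcar, out_file, row.names = FALSE)
```

Unlike the original code, this overwrites the chosen columns in a copy rather than appending `.mcar` columns. Keeping both versions side by side would only need `cbind(master, data_mcar[vars])` with renamed columns.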
gung - Reinstate Monica
astel