Testing MAR assumption knowing the missing data

Question

I am creating artificial missing values in a dataset using the two well-known mechanisms, MAR and NMAR. I want to validate what I create, but I can't find any statistical test that, given the observed, missing, and complete data, can tell me whether MAR holds.

My idea is that with MAR the distributions of the complete and observed data would stay pretty much the same, so I tried the two-sample KS test and the t-test between the complete and observed data... but they always reject the MAR hypothesis.

Any help would be appreciated. Thanks.

P.S. I know that MAR is not testable, but in this case I do know the values of the missing data!

To be more precise: given a complete dataset, I am generating MAR clones with 10 to 20% of the values missing. The procedure is the following: I choose two random columns Y and K and a random evaluation method for each value in column Y (value < mean, value > threshold, or some other function). When the evaluation is positive for row R, I remove the entry in column K and row R with a certain probability p. In this way the "missingness" of the values in column K does not depend on K itself but on the observed column Y (which is the definition of MAR). I now need a statistical procedure to test this "missing-value generator". In other words, given a sample dataset Z (and the original dataset if needed), I want to be able to say (with a certain p-value, of course) whether the artificial MAR holds or not.
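
The generator described above can be sketched roughly as follows. This is only a minimal illustration in Python with NumPy, not the actual implementation: the function name `make_mar`, the choice of "value < mean" as the evaluation rule, and the simulated normal data are all assumptions made for the example.

```python
import numpy as np

def make_mar(data, y_col, k_col, p, rng):
    """Return a copy of `data` with some entries of column k_col set to NaN.

    A row is eligible when its value in y_col is below that column's mean
    (an assumed evaluation rule); each eligible row loses its k_col entry
    with probability p. The boolean mask of removed rows is returned too.
    """
    out = np.array(data, dtype=float)              # copy, so the original stays complete
    eligible = data[:, y_col] < data[:, y_col].mean()
    drop = eligible & (rng.random(data.shape[0]) < p)
    out[drop, k_col] = np.nan                      # missingness depends on Y only
    return out, drop

# Hypothetical complete dataset: 1000 rows, 3 columns.
rng = np.random.default_rng(0)
complete = rng.normal(size=(1000, 3))
masked, drop = make_mar(complete, y_col=0, k_col=2, p=0.3, rng=rng)
```

With about half of the rows eligible and p = 0.3, roughly 15% of column K ends up missing, which falls inside the range mentioned above.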

score 1 · Accepted Answer · edited Apr 13 '17 at 12:44

If you generate a small MAR subset from highly skewed data, it may not have the same distribution as the original dataset. You could bootstrap the original dataset to see how much of an issue that is. Alternatively, try to balance out the datasets by extracting more MAR data. In any case, evaluating the degree of departure between the two distributions may be more informative than just looking at the p-values. The significant p-values you are seeing may simply be due to large sample sizes (as discussed in more detail here). Finally, a basic t-test does not seem very helpful here unless you are only interested in equality of means.
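
One way to act on the bootstrap suggestion is sketched below, under stated assumptions: it measures the departure between the observed values and the complete column with the KS distance, then compares that distance with the distances obtained from random resamples of the same size. The helper names `ks_statistic` and `bootstrap_reference`, the lognormal data, and the random `keep` mask standing in for a generator's output are all hypothetical. The KS distance is computed by hand with NumPy here to keep the example self-contained.

```python
import numpy as np

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])                  # both ECDFs only jump at these points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.abs(cdf_a - cdf_b).max()

def bootstrap_reference(complete_col, n_obs, n_boot, rng):
    """KS distances between the full column and random resamples of size n_obs."""
    return np.array([
        ks_statistic(rng.choice(complete_col, size=n_obs, replace=True), complete_col)
        for _ in range(n_boot)
    ])

rng = np.random.default_rng(1)
complete_col = rng.lognormal(size=2000)            # skewed example column (assumed)
keep = rng.random(complete_col.size) > 0.15        # placeholder for the generator's observed rows
observed = complete_col[keep]

d_obs = ks_statistic(observed, complete_col)       # size of the departure, not a p-value
reference = bootstrap_reference(complete_col, observed.size, 500, rng)
share_at_least_as_large = (reference >= d_obs).mean()
```

Reading `d_obs` next to the spread of `reference` shows how large the departure is relative to what resampling alone produces, which is the kind of effect-size comparison suggested above rather than a formal test of MAR.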
