Most imputation methods require either MAR or MCAR. How do we check the MAR or MCAR assumption in general?

In "How to check missing data is missing at random or not?", Turgeon said that $H_0: \text{MCAR}$ vs. $H_1: \text{MAR}$ is tested by a logistic regression of missingness on the covariates. There is no particular reason to assume that $\log(f(M=1|X)/f(M=0|X))$ must be a linear function of the covariates $X$, where $M$ is the missingness indicator. In general, the form of $f(M|X)$ is not known.

Why, in the situation above, can one test with logistic regression?

user45765
  • Aside: This is a **causal** question: you are asking about how to check assumptions about what causes missingness. – Alexis Feb 12 '22 at 18:29
  • @Alexis Yes. It can be phrased that way. And I think they should be equivalent. – user45765 Feb 12 '22 at 18:33

1 Answer

You can't check the MAR assumption in general. More precisely, for any observable data distribution there is a complete data distribution and an MAR missingness mechanism that gives precisely that observable data distribution.
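
One way to see this in the simplest case, as a sketch under the assumption that $X$ is always observed, only $Y$ can be missing, and $M=1$ means "missing" as in the question: the observable quantities are $f(x, M)$ and $f(y \mid x, M=0)$. Define a complete data distribution $f^*$ that agrees with these and fills in the unobserved part by

$$f^*(y \mid x, M=1) := f(y \mid x, M=0).$$

Under $f^*$, $Y$ is independent of $M$ given $X$, so

$$f^*(M=1 \mid x, y) = f(M=1 \mid x),$$

which is MAR. Nothing observable has changed, so no test based on the observed data can distinguish $f^*$ from the true complete data distribution.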

If you have variables $(X,Y,Z)$ and you are fitting a model for $(X,Y)$, some people would use MAR to mean that missingness was independent of the missing values conditional on the observed values of just $X$ and $Y$. That is testable in the sense that you can see if missingness is independent of $Z$.
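
A minimal sketch of such a check, on simulated data rather than anything from the question: it compares the mean of $Z$ between rows where the value is missing and rows where it is observed, using a permutation test. The function names `permutation_pvalue` and `simulate`, the sample size, and the logistic form used to generate missingness are all assumptions made for illustration. A difference in means is only one way to detect dependence; a logistic regression of the missingness indicator on $Z$ would be another.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def permutation_pvalue(z, m, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a difference in the mean of z
    between rows with m == 1 (missing) and rows with m == 0 (observed).
    Assumes both groups are non-empty."""
    rng = random.Random(seed)

    def abs_mean_diff(labels):
        z_missing = [zi for zi, mi in zip(z, labels) if mi == 1]
        z_observed = [zi for zi, mi in zip(z, labels) if mi == 0]
        return abs(mean(z_missing) - mean(z_observed))

    observed = abs_mean_diff(m)
    labels = list(m)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between z and missingness
        if abs_mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def simulate(n, slope, seed):
    """Hypothetical data: z is standard normal and the probability of
    missingness is logistic in z (slope = 0 means no dependence on z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = [int(rng.random() < 1 / (1 + math.exp(-(-1.0 + slope * zi))))
         for zi in z]
    return z, m


z_dep, m_dep = simulate(500, slope=1.5, seed=1)  # missingness depends on z
z_ind, m_ind = simulate(500, slope=0.0, seed=2)  # missingness ignores z

p_dep = permutation_pvalue(z_dep, m_dep)
p_ind = permutation_pvalue(z_ind, m_ind)
print("p-value when missingness depends on z:", p_dep)
print("p-value when missingness ignores z:   ", p_ind)
```

A small p-value is evidence that missingness depends on $Z$. A large one does not establish MAR, since the check only looks at quantities that are observed.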

Thomas Lumley
  • I think you cannot test MAR or MCAR in any case. As you will never have information on the missing data, you cannot construct or estimate $f(M|X)$, where $X$ may include the outcome variables of interest. I always thought this missingness assumption is justified by the experimental design. Otherwise, you have to guess that it is MAR. Independence cannot be tested, as you do not have the missing data needed to estimate the empirical joint distribution of $Z$ and missingness. Or perhaps I have misunderstood what you mean by testability here. For the conditional case, I cannot see how one could estimate the joint distribution. – user45765 Feb 13 '22 at 02:41
  • You can certainly demonstrate that data don't satisfy MCAR; that's a testable condition on the *observed* data distributions. You can't demonstrate data don't satisfy MAR, because it's a condition on the data you don't have, which was my point. – Thomas Lumley Feb 13 '22 at 04:13
  • Yes. You are correct about MCAR, by marginalizing the joint distribution. – user45765 Feb 13 '22 at 04:26