How to programmatically differentiate between MCAR (Missing Completely at Random), MAR (Missing at Random), and MNAR (Missing Not at Random) in Python

Question

I found the following code in R. I'm not sure how well it serves this purpose, but I want to implement it in Python. How would it translate? I also want to differentiate between all three categories: MCAR, MAR, and MNAR.

Link:

statistical approach to determine if data are missing at random

The following code is from one of the answers to the question linked above:

#Load dataset
data(sleep, package = "VIM")

x <- as.data.frame(abs(is.na(sleep)))

#Elements of x are 1 if a value in the sleep data is missing and 0 if non-missing.
head(sleep)
head(x)

#Extracting variables that have some missing values.
y <- x[which(sapply(x, sd) > 0)]
cor(y)

#We see that variables Dream and NonD tend to be missing together. To a lesser extent, this is also true with Sleep and NonD, as well as Sleep and Dream.

#Now, looking at the relationship between the presence of missing values in each variable and the observed values in other variables:
cor(sleep, y, use="pairwise.complete.obs")

#NonD is more likely to be missing as Exp, BodyWgt, and Gest increase, suggesting that the missingness for NonD is likely MAR rather than MCAR.

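So far I have written the following code:

def chkIfDataMissingAtRandom(df):
    # 1 where a value is missing, 0 where it is observed
    df_binary = df.isnull().astype(int)
    # Keep columns with some (but not all) values missing; the std is taken
    # per column (axis=0), like sapply(x, sd) in the R code
    y = df_binary.loc[:, df_binary.std(axis=0) > 0]
    return y

I am not sure how to extend it further.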
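For reference, here is a minimal sketch of how the two `cor(...)` calls in the R snippet might map onto pandas. It relies on `DataFrame.corr()` excluding missing values pair by pair, which plays the role of `use="pairwise.complete.obs"`. The function name `missingness_correlations`, the `_missing` suffix, and the synthetic columns `age`, `score`, and `income` are illustrative assumptions, not part of the original R answer or the `sleep` dataset.

```python
import numpy as np
import pandas as pd


def missingness_correlations(df):
    """Return (indicator correlations, value-vs-missingness correlations)."""
    # 1 where a value is missing, 0 where observed (the R code's `x`)
    indicators = df.isna().astype(int)
    # Keep only columns whose missingness varies (the R code's `y`)
    indicators = indicators.loc[:, indicators.std() > 0].add_suffix("_missing")

    # R: cor(y)
    between_indicators = indicators.corr()

    # R: cor(sleep, y, use="pairwise.complete.obs")
    # DataFrame.corr() drops missing values pairwise for each column pair.
    numeric = df.select_dtypes(include="number")
    combined = pd.concat([numeric, indicators], axis=1)
    values_vs_missing = combined.corr().loc[numeric.columns, indicators.columns]
    return between_indicators, values_vs_missing


# Synthetic data (assumption, for illustration only):
# income is more likely to be missing for older people (MAR-like),
# score is missing for a random 10% of rows (MCAR-like).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(40, 10, n)
score = rng.normal(0, 1, n)
income = rng.normal(50, 5, n)
p_miss = 1 / (1 + np.exp(-(age - 45) / 5))
income[rng.random(n) < p_miss] = np.nan
score[rng.random(n) < 0.1] = np.nan
df = pd.DataFrame({"age": age, "score": score, "income": income})

between_indicators, values_vs_missing = missingness_correlations(df)
print(between_indicators)
print(values_vs_missing)
```

A clearly non-zero entry in the second table (here, `age` against `income_missing`) is evidence against MCAR for that column. The entry pairing a column with its own indicator is not meaningful, because the indicator is constant wherever the column is observed.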

I'm not set on implementing only the method above; I'm open to new, more robust ideas.

I also found another approach in the following link:

how-to-check-missing-data-is-missing-at-random-or-not

One of the answers (I am not sure how feasible this is):

"Here is one way to test the missingness-at-random assumption.

Suppose the question on participant's income has some missing entries. Run a logistic regression with income as your response and everything else as predictors. Your response would be 1 if it's missing, 0 otherwise. The p-value of the predictors should give you an idea whether this MAR assumption is any good."
