Adjusting n after using pairwise deletion

Question

I am using pairwise deletion to compute the correlation matrix of a data set. I think this approach is appropriate because:

I have well under 10% missing values (~2%)
I have only around 50% complete cases (so casewise deletion disregards too much data)
Missing values are distributed evenly across cases and evenly across variables. (I have had difficulty running a proper test for MCAR because I have too many variables.)

I am using the correlation matrix to perform a PCA, and while I know there are no major issues with the results, I am concerned that running significance tests based on the original n is not correct. I also feel that I should be reporting some sort of adjusted 'post-deletion' n.

Is there any way to measure how much "information" (for want of a better term) I lose by using pairwise deletion compared with having a complete data set? In my case it does not really affect my results, but I would like to know for future cases where the data have perhaps 9-10% missing values and are MCAR. Should I be looking at some kind of imputation-based method? Are there industry standards or rules of thumb?
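
To make the question concrete, here is a minimal sketch of the kind of summary I have in mind, assuming pandas and NumPy. The data are simulated (the 200 x 30 shape, the 2% missing rate, and the helper name pairwise_n_summary are illustrative assumptions, not my actual data). It counts the rows each pair of variables shares and summarises those counts next to the original n and the complete-case n. The minimum or harmonic mean of the pairwise counts could serve as a conservative stand-in for n, but whether either is appropriate is exactly what I am unsure about.

```python
import numpy as np
import pandas as pd


def pairwise_n_summary(df):
    """Summarise how many rows each pair of variables has in common.

    Hypothetical helper for illustration; not a standard library function.
    Assumes every pair of variables shares at least one row (otherwise the
    harmonic mean below would divide by zero).
    """
    obs = df.notna().to_numpy(dtype=float)
    # Entry (i, j) counts the rows where both variable i and j are observed.
    n_pair = obs.T @ obs
    off_diag = n_pair[~np.eye(n_pair.shape[0], dtype=bool)]
    summary = {
        "n_total": len(df),
        "n_complete": int(df.notna().all(axis=1).sum()),
        "n_pair_min": int(off_diag.min()),
        "n_pair_mean": float(off_diag.mean()),
        "n_pair_harmonic": float(len(off_diag) / (1.0 / off_diag).sum()),
    }
    counts = pd.DataFrame(n_pair.astype(int), index=df.columns, columns=df.columns)
    return summary, counts


# Simulated data: values go missing completely at random with probability 0.02.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 30))
data[rng.random(data.shape) < 0.02] = np.nan
df = pd.DataFrame(data, columns=[f"x{i}" for i in range(30)])

summary, n_pair = pairwise_n_summary(df)
corr = df.corr()  # pandas excludes missing values pairwise by default
print(summary)
```

With 30 variables and a 2% missing rate, the expected share of complete cases is about 0.98^30, or roughly 55%, while each pairwise count stays close to the original n. That gap is the "information" I would like to quantify.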

Happy to hear opinions or be referred to papers/textbooks on this topic.

Why can't you impute the missing values, perhaps by multiple imputation? — Michael R. Chernick, Jun 22 '17 at 00:40
I can, although in this case it is not necessary. I was interested in knowing if there are any methods for adjusting n/measuring how the lost data affects my calculations. I guess this is not the done thing so I might be barking up the wrong tree. — bmrn, Jun 27 '17 at 02:28

