I'm analyzing a complex sample design survey of health institutions, for which I've obtained about 70% of the overall planned sample. However, response rates by stratum range from as high as 95% to as low as 47%. To correct for unit non-response, I've applied a weight correction as discussed here. Nevertheless, I'd like to understand whether the resulting survey can be generalized to the whole population without severe restrictions. In other words, how much unit non-response is too much when I want to generalize any finding to the population? I would appreciate any approach to this as well as literature suggestions.
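For reference, the kind of adjustment I have in mind looks roughly like this; it is only a minimal sketch, the column names (`stratum`, `responded`, `base_weight`) are placeholders, and the details of the actual correction follow the linked discussion:

```python
# Minimal sketch of a weighting-class nonresponse adjustment within design strata.
# Assumes a pandas DataFrame `frame` with hypothetical columns:
#   stratum     - design stratum of each sampled institution
#   responded   - 1 if the institution responded, 0 otherwise
#   base_weight - design (base) weight
import pandas as pd

def adjust_for_nonresponse(frame: pd.DataFrame) -> pd.DataFrame:
    frame = frame.copy()
    # Weighted response rate within each stratum
    weighted_resp = (frame["base_weight"] * frame["responded"]).groupby(frame["stratum"]).sum()
    weighted_total = frame["base_weight"].groupby(frame["stratum"]).sum()
    resp_rate = weighted_resp / weighted_total
    # Respondents' weights are inflated by the inverse of their stratum's
    # response rate; nonrespondents end up with weight 0
    frame["adj_weight"] = frame["base_weight"] * frame["responded"] / frame["stratum"].map(resp_rate)
    return frame
```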
-
maybe this is a small contribution: http://stats.stackexchange.com/questions/226992/how-to-determine-which-of-the-important-missing-variables-to-ignore/227025#227025 – Aug 10 '16 at 07:33
2 Answers
The crucial issue is likely to be the missingness mechanism. If you, or the reader of your report, believe that the reason for missingness is related to the subject in which you are interested (so-called informative missingness), then even a small amount of missingness is going to be a problem. If, on the other hand, data are missing completely at random, then it is just an efficiency issue. There is a fairly full discussion here: https://en.wikipedia.org/wiki/Missing_data. If you have covariates which predict missingness, including them in your final model is useful.
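As a rough illustration of that last point, assuming hypothetical frame variables `region` and `bed_count` are thought to predict missingness and are therefore included alongside the substantive predictor in the analysis model:

```python
# Minimal sketch: include covariates that predict missingness in the final model.
# `respondents` is assumed to be a pandas DataFrame of responding institutions
# with hypothetical columns: outcome, region, bed_count, exposure.
import statsmodels.formula.api as smf

model = smf.ols("outcome ~ C(region) + bed_count + exposure", data=respondents).fit()
print(model.summary())
```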

There is no good answer to "how much missingness is too much". The issue of nonresponse and the biases it may cause has been the central concern of survey statisticians over the past 20 or so years. Groves (2006) is the most cited paper in Public Opinion Quarterly, one of about five journals on survey methodology. The basic consensus is consistent with what @mdewey pointed out: if missingness is correlated with the outcome, that's bad; if it is not, then you are OK. Well-designed and well-executed surveys can maintain relatively little bias even with response rates as low as 10% (Pew 2012).
The survey literature has developed a number of ways to assess nonresponse bias, although of course most of them are indirect. A preliminary exploratory analysis is to compare response rates across groups: if the response rates differ, there is a threat of nonresponse bias (that is, it is not a sure thing, but you are putting yourself at risk). A more sophisticated analysis is to build a response propensity model and correlate the predicted propensities with the outcomes in the sample. This is not as good as building such a model for the population, but at least it is a step in the desirable direction.
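As a rough sketch of what these two checks might look like in Python (all column names, such as `region`, `size_class`, and `outcome`, are hypothetical):

```python
# Two nonresponse-bias diagnostics on the full selected sample.
# `sample` is assumed to be a pandas DataFrame with hypothetical columns:
#   responded          - 1 if the institution responded, 0 otherwise
#   region, size_class - frame variables known for respondents and nonrespondents
#   outcome            - the survey outcome, observed only for respondents
import pandas as pd
import statsmodels.formula.api as smf

# 1. Response rates by group: large differences signal a nonresponse-bias risk
print(sample.groupby(["region", "size_class"])["responded"].mean())

# 2. Response propensity model on frame variables, then correlate the predicted
#    propensity with the outcome among respondents
propensity_model = smf.logit("responded ~ C(region) + C(size_class)", data=sample).fit()
sample["propensity"] = propensity_model.predict(sample)

respondents = sample[sample["responded"] == 1]
print(respondents[["propensity", "outcome"]].corr())
```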
My own example of the framework and the steps is available here.
For nonresponse adjustments to the weights, I would rather refer you to the actual methodological guidance on the topic than to the CV posts (although @steve-samuels is an expert on this topic).
-
(+1) Just for your information; the link I refer to in this answer (and its references) are also often cited: http://stats.stackexchange.com/questions/226992/how-to-determine-which-of-the-important-missing-variables-to-ignore/227025#227025 – Aug 10 '16 at 07:37