self-selection bias due to nonresponse?

Question

I have administrative data on the whole population of new doctorate holders in my region for a given year. We also have survey data from a sample of this same population: the whole population was invited to participate in the survey, but only around 65% responded, and this is the sample I have to work with. If the non-response is not random, there will be self-selection bias.

I want to investigate possible wage gaps between different groups in my region (the dependent variable is the log of the ratio between two average wages).

Any input on how to tackle this issue? Is there literature on it?

score 2 · Accepted Answer · edited Apr 13 '17 at 12:44


A few points:

1) The log of a ratio is simply the difference of the logs. Is this really what you want?

2) As to your main point: although the setup is not completely clear, the two methods that come to mind are multiple imputation and propensity scores. There is a huge literature on both. You could start by searching for both terms right here on CrossValidated; that should lead you to the wider literature.

Here are 28 posts about propensity scores.

Here are 53 posts about multiple imputation.

And here is one post about using both (with links to several articles).
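To make the two points concrete, here is a minimal sketch of the weighting idea, not a full analysis. It assumes the administrative data give a covariate (here a hypothetical `field`) for everyone, respondent or not, and it estimates the response propensity as the plain response rate within each field, which is the simplest weighting-class form of propensity adjustment. A real analysis would more likely fit a model such as logistic regression on several administrative covariates. All names and numbers below are made up for illustration. The gap is computed as a difference of logs, which is the same as the log of the ratio mentioned in 1).

```python
import math
from collections import defaultdict

# Hypothetical population frame. "field" and "group" stand in for covariates
# known from administrative data; "wage" is None for nonrespondents.
population = [
    {"field": "science", "group": "A", "wage": 4000.0},
    {"field": "science", "group": "A", "wage": None},
    {"field": "science", "group": "B", "wage": 3600.0},
    {"field": "science", "group": "B", "wage": None},
    {"field": "humanities", "group": "A", "wage": 2500.0},
    {"field": "humanities", "group": "A", "wage": 2500.0},
    {"field": "humanities", "group": "B", "wage": 2000.0},
    {"field": "humanities", "group": "B", "wage": 2000.0},
]


def response_rates(records):
    """Estimate the response propensity in each field as the share who answered."""
    contacted = defaultdict(int)
    responded = defaultdict(int)
    for r in records:
        contacted[r["field"]] += 1
        if r["wage"] is not None:
            responded[r["field"]] += 1
    return {field: responded[field] / contacted[field] for field in contacted}


def weighted_mean_wage(records, rates, group):
    """Mean wage of one group, weighting each respondent by 1 / response propensity."""
    total = 0.0
    weight_sum = 0.0
    for r in records:
        if r["group"] == group and r["wage"] is not None:
            w = 1.0 / rates[r["field"]]
            total += w * r["wage"]
            weight_sum += w
    return total / weight_sum


def log_wage_gap(records, rates, group_a, group_b):
    """Log of the ratio of mean wages, written as a difference of logs."""
    return math.log(weighted_mean_wage(records, rates, group_a)) - math.log(
        weighted_mean_wage(records, rates, group_b)
    )


rates = response_rates(population)
naive = {field: 1.0 for field in rates}  # equal weights, i.e. ignoring nonresponse
print("weighted gap:", log_wage_gap(population, rates, "A", "B"))
print("unweighted gap:", log_wage_gap(population, naive, "A", "B"))
```

In this toy data the two gaps differ because the field with the lower response rate also has the higher wages, so ignoring nonresponse distorts both group means. The weights only correct for nonresponse that is explained by the covariates used to build them.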


answered Nov 02 '13 at 18:22

Peter Flom


Thank you very much for the answer! Yes, 1) is my research question. Thank you for 2) too; I will now take a look at the suggested literature. – Fuca26 Nov 02 '13 at 23:40
