I have administrative data on the whole population of new doctorate holders in my region for a given year. We also have survey data from a sample of this same population: everyone was invited to take part in the survey, but only around 65% responded, and this is the sample I have to work with. If the non-response is not random, the estimates will suffer from self-selection bias.
I want to investigate possible wage gaps between different groups in my region (the dependent variable is the log of the ratio of two groups' average wages).
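To make the dependent variable concrete, here is a minimal sketch of how I compute it. The records, group labels, and function names are made up for illustration and are not my real data; it is the plain unweighted version, so it does nothing about non-response.

```python
import math

# Hypothetical survey records: (group label, wage). Values are invented.
records = [
    ("A", 2400.0),
    ("A", 2600.0),
    ("B", 2000.0),
    ("B", 2000.0),
]

def mean_wage(rows, group):
    """Unweighted average wage of the respondents in one group."""
    wages = [w for g, w in rows if g == group]
    return sum(wages) / len(wages)

def log_wage_ratio(rows, group_num, group_den):
    """Log of the ratio between the average wages of two groups."""
    return math.log(mean_wage(rows, group_num) / mean_wage(rows, group_den))

gap = log_wage_ratio(records, "A", "B")
print(round(gap, 4))  # 0.2231, i.e. log(2500 / 2000)
```

A positive value means the first group earns more on average; swapping the two groups only flips the sign.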
Any suggestions on how to deal with this non-response problem? Any pointers to the relevant literature?