I need to collect a two-group sample for a comparison analysis (perhaps using logistic regression).
The population I need to sample from is all firms from country A with activities in country B. The firms fall into two categories: those with a subsidiary in country B (S) and those without a subsidiary in country B (NS). I expect the share of S firms to be small relative to NS firms, but I have no way of knowing for sure.
I already hold the entire population of S firms, because this data was available to me. However, data on NS firms is not readily available, so I have to collect it myself, and I will probably not be able to identify and collect all NS firms.
So my situation is this: I have the entire population of S firms and need to collect enough NS firms for the subsequent analysis to have enough statistical power. Most likely my final sample will consist of all S firms and some share of the population of NS firms. Without much experience in these kinds of studies, I can't help thinking that there is some kind of bias or reliability issue when sampling this way (one group: the entire group population; the other group: only part of the group population). I have learned that if the population of NS firms is indeed much larger than that of S firms (again, there is no way to know without data for the entire population of firms), and I end up with, e.g., similar-sized samples of each group, the minority group will be oversampled. However, I cannot find any remarks that consider this a problem for a comparison study, presumably because a sample that correctly represents the entire population matters less in that setting.
Is my concern justified? Or is it fine to do it that way for e.g. logistic regression? If not, how can I get around the issue?
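To make the design concrete, here is a rough simulation sketch of the sampling scheme I have in mind: keep every S firm, keep only a random fraction of NS firms, and fit a logistic regression on both the full population and the sample. Everything in it is made up for illustration: the population size, the single covariate `x` (think of a standardized firm characteristic), the coefficients, and the 10% NS sampling fraction. The small Newton-Raphson fitter only stands in for a proper library routine.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, iters=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical population (all values are assumptions, not real data).
N = 20000                      # number of firms
TRUE_B0, TRUE_B1 = -3.0, 1.0   # S is rare: only a small share of firms
NS_FRACTION = 0.10             # share of NS firms I manage to collect

x_pop = [random.gauss(0.0, 1.0) for _ in range(N)]
y_pop = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0
         for x in x_pop]       # y = 1 means S, y = 0 means NS

# Sampling scheme from the question: all S firms, a random subset of NS firms.
sample = [(x, y) for x, y in zip(x_pop, y_pop)
          if y == 1 or random.random() < NS_FRACTION]
x_smp, y_smp = zip(*sample)

int_full, slope_full = fit_logit(x_pop, y_pop)
int_sample, slope_sample = fit_logit(x_smp, y_smp)

print(f"full population: intercept={int_full:.2f}, slope={slope_full:.2f}")
print(f"S + partial NS:  intercept={int_sample:.2f}, slope={slope_sample:.2f}")
print(f"intercept shift: {int_sample - int_full:.2f}, "
      f"log(1/NS_FRACTION)={math.log(1 / NS_FRACTION):.2f}")
```

In a toy setup like this, I would expect the slope to be roughly the same in both fits, while the intercept moves by about `log(1 / NS_FRACTION)`, since sampling depends only on group membership. That offset can only be corrected if the NS sampling fraction is known, which is exactly what I do not have in my real data.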