Following the question here, the target population is made up of 600 units which are distributed as follows:
- white units (17%)
- black units (42%)
- green units (8%)
- yellow units (33%)
A representative sample would be one which broadly reflects this percentage distribution.
However, in this case one would end up with only a small number of green units (because they have the smallest share of the population), so analysing the green units on their own would not be reliable. For example, in a sample of 100 units, only about 8 would be green.
I want to analyse each colour's units on their own (as well as overall), and I understand that one way to overcome the above issue is to oversample the green units.
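To make the arithmetic concrete, here is a minimal sketch of a proportional allocation, assuming the percentages above are exact and a sample of 100. The variable names are only illustrative.

```python
# Population described in the question: 600 units split by colour (percentages).
population_size = 600
shares_pct = {"white": 17, "black": 42, "green": 8, "yellow": 33}

# Assumed sample size, taken from the example in the text.
sample_size = 100

# Units of each colour in the population (integer arithmetic, exact here).
population_counts = {c: population_size * p // 100 for c, p in shares_pct.items()}

# Proportional allocation: each colour gets its population share of the sample.
proportional_sample = {c: sample_size * p // 100 for c, p in shares_pct.items()}

print(population_counts)
print(proportional_sample)
```

Under these assumptions the population holds 48 green units, and a proportional sample of 100 contains only 8 of them.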
Question: What is the risk of oversampling the green units but keeping the number of other units fixed?
For example, in a sample of 100, I can have approximately 25 units of each colour.
This is for practical reasons, such as the cost of data collection.
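For reference, here is a rough sketch of what the equal allocation would imply, as far as I understand it. It assumes simple random sampling within each colour, population counts of 600 times the stated percentages, and the usual design weight of population count divided by sample count. The effective sample size uses Kish's approximation, which is my own choice of summary and only a rough guide.

```python
# Population counts implied by 600 units and the stated percentages.
population_counts = {"white": 102, "black": 252, "green": 48, "yellow": 198}

# Equal allocation from the example: 25 units per colour, 100 in total.
equal_sample = {colour: 25 for colour in population_counts}

# Design weight per sampled unit: how many population units it stands for.
weights = {c: population_counts[c] / equal_sample[c] for c in population_counts}

# Sampling fraction per colour (share of that colour's population sampled).
fractions = {c: equal_sample[c] / population_counts[c] for c in population_counts}

# Kish's approximate effective sample size for overall (weighted) estimates.
sum_w = sum(equal_sample[c] * weights[c] for c in population_counts)
sum_w2 = sum(equal_sample[c] * weights[c] ** 2 for c in population_counts)
effective_n = sum_w ** 2 / sum_w2

print(weights)
print(fractions)
print(round(effective_n, 1))
```

With these numbers each green unit in the sample stands for about 2 population units while each black unit stands for about 10, so any overall estimate would have to be weighted, and the unequal weights reduce the effective sample size for overall estimates to below 100.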