Today we started arguing at work and couldn't come to a conclusion. Let's say we have a population of 1000 observations about various people. 50 of these people went bankrupt (1 = went bankrupt, 0 = did not go bankrupt). Can we take a sample of 100 people (50 bankrupt, 50 not bankrupt) and use it to build a model of bankruptcy (using linear regression or MDA)? Or must we take a random sample of 100 people, which should include around 5 people who went bankrupt?
Do we have to keep the same proportions in the modelling sample as in the population in order to use the model on the population?
What problems would occur with a 50-50 sample?
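To make the two options concrete, here is a minimal Python sketch of the two sampling schemes. It simulates only the 0/1 outcome (no predictors, no model), and the variable names and the seed are made up for illustration.

```python
import random

random.seed(0)  # arbitrary seed, only so the draw is reproducible

# Population from the question: 1000 people, 50 bankrupt (1) and 950 not (0).
population = [1] * 50 + [0] * 950

# Option A: balanced sample, all 50 bankrupt plus 50 drawn from the non-bankrupt.
bankrupt = [y for y in population if y == 1]
not_bankrupt = [y for y in population if y == 0]
balanced_sample = bankrupt + random.sample(not_bankrupt, 50)

# Option B: simple random sample of 100 people from the whole population.
random_sample = random.sample(population, 100)

# Share of bankrupt people in the population and in each sample.
population_rate = sum(population) / len(population)
balanced_rate = sum(balanced_sample) / len(balanced_sample)
random_rate = sum(random_sample) / len(random_sample)

print("population:", population_rate)
print("balanced sample:", balanced_rate)
print("random sample:", random_rate)  # around 0.05, varies from draw to draw
```

The balanced sample has a bankruptcy rate of 0.5 by construction, while the population rate is 0.05; that mismatch is what we were arguing about.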
Thanks!