I would like to build an ensemble classifier (possibly boosting) on a huge training dataset (>> 1e7 examples) in which about 5% of the examples are positive. The metrics I care about are precision and recall on the positive class.
If I train on the whole dataset, will the small proportion of positive examples hurt the classifier's performance? I'm not sure, because the absolute number of positive examples is still large. Or do I need to randomly subsample the negative examples so that the two classes are roughly the same size?
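To make the second option concrete, here is a minimal sketch of the kind of random undersampling I have in mind. The function name `undersample_negatives`, the `neg_per_pos` parameter, and the synthetic data are all made up for illustration; labels are assumed to be 0 (negative) and 1 (positive).

```python
import numpy as np

def undersample_negatives(X, y, neg_per_pos=1.0, seed=0):
    """Keep every positive and a random subset of the negatives.

    Hypothetical helper: neg_per_pos=1.0 gives roughly balanced classes.
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # Number of negatives to keep, capped at the number available.
    n_keep = min(len(neg_idx), int(round(neg_per_pos * len(pos_idx))))
    kept_neg = rng.choice(neg_idx, size=n_keep, replace=False)
    idx = np.concatenate([pos_idx, kept_neg])
    rng.shuffle(idx)  # avoid having all positives first
    return X[idx], y[idx]

# Synthetic stand-in for the real data: about 5% positives.
rng = np.random.default_rng(42)
n = 100_000
y = (rng.random(n) < 0.05).astype(int)
X = rng.normal(size=(n, 3))

X_bal, y_bal = undersample_negatives(X, y)
print("positive rate before:", y.mean())
print("positive rate after: ", y_bal.mean())
```

Presumably precision and recall would still have to be measured on a held-out set with the original 5% proportion, since precision depends on the class ratio.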