I have data with "valid" and "invalid" classes and a lot of predictors (over 15). Only 5% of the data set is valid (the success class, 1); 95% is invalid (class 0). The number of invalids is skewing my model: it classifies the invalids accurately but is bad at classifying the valids.
I oversampled the valids to get a logistic model that doesn't have too many false negatives, but it produced too many false positives. So I raised the probability cutoff to 0.65. This lowered the false positives, but now I have too many false negatives.
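Roughly what I did, as a minimal sketch on synthetic stand-in data (scikit-learn; my real data, column names, and exact class rate are only approximated here):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Synthetic stand-in: ~5% positives ("valid"), 16 predictors.
X, y = make_classification(n_samples=10_000, n_features=16, n_informative=8,
                           n_redundant=0, weights=[0.95], random_state=0)
df = pd.DataFrame(X)
df["valid"] = y

# Oversample the minority (valid) class so the training data is balanced.
valids = df[df["valid"] == 1]
invalids = df[df["valid"] == 0]
valids_up = resample(valids, replace=True, n_samples=len(invalids), random_state=0)
train = pd.concat([invalids, valids_up])

model = LogisticRegression(max_iter=1000)
model.fit(train.drop(columns="valid"), train["valid"])

# Score and apply a 0.65 cutoff instead of the default 0.5.
probs = model.predict_proba(df.drop(columns="valid"))[:, 1]
preds = (probs >= 0.65).astype(int)
```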
I've found that a probability cutoff of about 0.65 is the pivot point between too many false negatives and too many false positives. Does it make sense to use different probability cutoffs with the same model, i.e. use the model with a 0.5 cutoff when I need to accurately classify 1's and a 0.65 cutoff when I need to accurately classify 0's? Does this make sense? Any other ideas to classify better? I tried other types of classifiers and had the same issue.
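For comparing cutoffs, I'm just sweeping the threshold on the scored data and counting false positives/negatives at each value (continuing the sketch above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

probs = model.predict_proba(df.drop(columns="valid"))[:, 1]
for cutoff in np.arange(0.40, 0.71, 0.05):
    preds = (probs >= cutoff).astype(int)
    tn, fp, fn, tp = confusion_matrix(df["valid"], preds).ravel()
    print(f"cutoff={cutoff:.2f}  FP={fp}  FN={fn}")
```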
I trimmed the many predictors down to a few using p-values and best subsets.
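The p-value part is along these lines (statsmodels on the same synthetic frame; the best-subsets search isn't shown):

```python
import statsmodels.api as sm

# Fit a plain logit and look at the per-predictor p-values.
X_sm = sm.add_constant(df.drop(columns="valid"))
fit = sm.Logit(df["valid"], X_sm).fit(disp=0)
print(fit.pvalues.sort_values())  # candidates to drop: large p-values
```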
One clarification: I've trained on portions of the dataset and then validated on the full dataset to get the false positive/negative counts.
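Concretely, the evaluation looks like this (continuing the sketch; the 70% split size is just an example and the oversampling step is omitted for brevity): fit on a random portion, then score the full dataset, which includes the training rows.

```python
from sklearn.model_selection import train_test_split

# Train on a portion of the data...
train_part, _ = train_test_split(df, train_size=0.7, stratify=df["valid"],
                                 random_state=0)
model = LogisticRegression(max_iter=1000)
model.fit(train_part.drop(columns="valid"), train_part["valid"])

# ...then get false positive/negative counts on the FULL dataset.
full_probs = model.predict_proba(df.drop(columns="valid"))[:, 1]
full_preds = (full_probs >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(df["valid"], full_preds).ravel()
print(f"FP={fp}  FN={fn}")
```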