I'm working on a classification problem where I expect
$\text{True Positive Rate} = 0.999$
$\text{True Negative Rate} = 0.001$
To model this data, I have created a training set with an equal proportion of true positives and true negatives. I am fitting a logistic regression model to this data, which gives me probabilistic classifications. These predicted probabilities, however, reflect the 50/50 class balance of the training set rather than the class proportions of the unbiased data. Is there a way to correct this bias without creating an unbiased data set and refitting?
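To make concrete what kind of correction I have in mind, here is a minimal sketch of one adjustment that seems to fit this setup: shifting the model's log-odds by the difference between the population log-odds and the training log-odds. It assumes the training set differs from the population only in its class proportions (that is, examples were sampled based on their label alone), and it treats 0.999 as the population proportion of positives. The function names and example scores are hypothetical, not taken from my actual model.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def correct_probability(p_model, train_prior, true_prior):
    """Re-express a probability from a model trained at `train_prior`
    (fraction of positives in the training set) under `true_prior`
    (assumed fraction of positives in the population).

    Assumption: only the class proportions differ between the two.
    p_model must be strictly between 0 and 1.
    """
    # Constant shift on the log-odds scale; this is equivalent to
    # adjusting only the intercept of the fitted logistic regression.
    offset = logit(true_prior) - logit(train_prior)
    return sigmoid(logit(p_model) + offset)


# Hypothetical scores from the model trained on the balanced (50/50) set.
balanced_scores = [0.10, 0.50, 0.90]
corrected = [correct_probability(p, 0.5, 0.999) for p in balanced_scores]
print(corrected)
```

Because the shift is a constant on the log-odds scale, the ranking of the examples is unchanged; only the calibration of the probabilities moves.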
Thank you!
EDIT: After doing some research, I have come across this reference as a starting point: Heckman's "Sample Selection Bias as a Specification Error".
EDIT2: The above paper and its associated Wikipedia page provide a means of correcting sample selection bias by including a term derived from a learned model of the selection process in the regression. The exact implementation, however, assumes that the error terms are jointly normally distributed. I'm not sure if this assumption holds for logistic regression.
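For reference, here is my understanding of the two-step correction, written in my own notation rather than the paper's. The symbols $z$, $\gamma$, $u$, $\rho$ and $\sigma_\varepsilon$ are labels I have chosen for this sketch.

$$
\begin{aligned}
s &= \mathbf{1}[\, z\gamma + u > 0 \,] && \text{(selection equation)} \\
y &= x\beta + \varepsilon && \text{(outcome, observed only when } s = 1\text{)} \\
E[\, y \mid x, s = 1 \,] &= x\beta + \rho\,\sigma_\varepsilon\,\lambda(z\gamma), && \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
$$

Here $(u, \varepsilon)$ are assumed bivariate normal with $\operatorname{Var}(u) = 1$ and correlation $\rho$, and $\phi$ and $\Phi$ are the standard normal density and distribution function. The extra term $\lambda(z\gamma)$ (the inverse Mills ratio) is estimated from a first-stage probit model of selection and then added as a regressor in the second stage, which is where the normality assumption enters.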
EDIT3: The assumption of normality for the error terms does not hold in logistic regression, because there is in fact no error term in logistic regression. For an explanation, see "Logistic Regression - Error Term and its Distribution".
Side note: I'm not sure what the etiquette here is regarding answering your own question, but I suppose I'll do that and mark it as accepted.