
The data include 3 equally sized subsets A, B and C, belonging to two classes:

  • A belongs to class 1.
  • B and C belong to class 2.

The prior probabilities of an observation coming from class 1 and class 2 are thus 1/3 and 2/3 (about 0.33 and 0.67), respectively.

Next, a logistic regression model is fitted on all 3 subsets.
The predicted value of this model is the probability of an observation belonging to class 2, given its predictor values.

In reality, I know for sure that I will never see observations from subset C. New observations will therefore always originate from either subset A or subset B, and since both subsets are equally sized, I can assume that the prior probability of a new observation belonging to class 1 or class 2 changes to 0.5 each.
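To make the two sets of priors concrete, here is a minimal sketch of the arithmetic in Python. The helper name `class_priors` and the subset size `n = 100` are placeholders I made up for illustration; only the fact that the subsets are equally sized matters.

```python
from fractions import Fraction

def class_priors(subset_sizes, subset_class):
    """Return the prior probability of each class implied by the subset sizes."""
    total = sum(subset_sizes.values())
    priors = {}
    for name, size in subset_sizes.items():
        cls = subset_class[name]
        priors[cls] = priors.get(cls, Fraction(0)) + Fraction(size, total)
    return priors

# A is class 1; B and C are class 2 (as described above).
subset_class = {"A": 1, "B": 2, "C": 2}

n = 100  # hypothetical common subset size; any positive value gives the same priors

# Priors in the data the model is fitted on (A, B and C).
training_priors = class_priors({"A": n, "B": n, "C": n}, subset_class)

# Priors among the observations I actually expect to see (A and B only).
deployment_priors = class_priors({"A": n, "B": n}, subset_class)

print(training_priors)
print(deployment_priors)
```

So the class 2 prior is 2/3 in the fitting data and 1/2 among future observations.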

My questions are:

  1. Given the knowledge that all observations are from either A or B but not C, can you still interpret the predicted values as the probability of being in class 2 with the logistic regression model fitted on all 3 subsets?
  2. Are these probabilities biased because of the changed prior probability of being in class 1 and 2?
  3. If so, how can I correct for this?
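For reference, the kind of correction I have in mind for question 3 is the usual prior-shift adjustment, which rescales the predicted odds by the ratio of new to old prior odds. The sketch below is only that, a sketch: the function name `adjust_for_new_prior` is my own, and the adjustment assumes the distribution of the predictors within each class is the same in both populations. I am not sure that assumption holds here, because class 2 would consist of B only instead of B and C.

```python
def adjust_for_new_prior(p, old_prior, new_prior):
    """Rescale a predicted class 2 probability from one class 2 prior to another.

    Assumption (may not hold in my case): the predictor distribution within
    each class is unchanged between the fitting data and the new data.
    """
    if not (0.0 < old_prior < 1.0 and 0.0 < new_prior < 1.0):
        raise ValueError("priors must lie strictly between 0 and 1")
    # Reweight the class 2 and class 1 terms by new_prior / old_prior each.
    class2_term = p * new_prior / old_prior
    class1_term = (1.0 - p) * (1.0 - new_prior) / (1.0 - old_prior)
    return class2_term / (class2_term + class1_term)

# Class 2 prior is 2/3 in the fitting data and 1/2 for new observations.
for p in (0.2, 0.5, 2.0 / 3.0, 0.9):
    print(p, adjust_for_new_prior(p, old_prior=2.0 / 3.0, new_prior=0.5))
```

On the logit scale, with these two priors, this amounts to subtracting log(2) from the fitted intercept while leaving the slopes unchanged.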
statastic
  • As long as you specify the outcome variable the same, where 1=belong to class 2, and 0=belong to class 1, for both models, I believe the interpretation will be the same. – robin.datadrivers Feb 22 '15 at 23:17
  • Linked: http://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression/6072#6072 – Zhubarb Feb 23 '15 at 09:29
  • @Zhubarb Did I understand correctly from the link above, that the probabilities are indeed biased and a possible solution would be to perform weighted logistic regression? – statastic Feb 23 '15 at 10:22
  • It's not clear to me -- how did your priors arise? Please explain the second one in detail. – Glen_b Mar 05 '15 at 02:52
  • @Glen_b I hope it's more clear now :) – statastic Mar 05 '15 at 11:28
