I need to predict (estimate) probabilities of (rare) events when the training data only contains the yes/no indicator.

I.e., my target (dependent) variable is binary (logical).

What I need is not just to predict yes/no, but estimate the probabilities of yes/no for each observation.

If I use logistic regression, then the model output is, indeed, an estimate of the probability. But what if I am using a different model, e.g., Vowpal Wabbit (because, say, it is faster and outperforms logistic regression as a binary classifier)?

So, I have a model which produces a score for each observation and I want to convert the score to probability.

It is natural to use total variation distance to evaluate the probability prediction, which motivated my previous question. The accepted answer there suggests LIBLINEAR with L1 loss, but that produces a binary classifier, not a probability estimator.

So, how do I calibrate model scores so that they actually estimate the event probabilities?

Currently I train a single-independent-variable logistic regression (i.e., Platt scaling) to map the scores to probabilities. Can I do better?
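For reference, here is a minimal sketch of the two standard approaches: Platt scaling (the one-variable logistic regression described above) and isotonic regression, a monotone non-parametric alternative that can fit non-sigmoid score-to-probability mappings given enough data. The scores and labels below are synthetic, just to make the example self-contained; in practice you would use held-out scores from your own model (fitting the calibrator on the training scores themselves gives biased estimates).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

# Synthetic stand-in for real model output: raw scores and binary labels
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
labels = (rng.random(1000) < 1 / (1 + np.exp(-2 * scores))).astype(int)

# Platt scaling: fit a one-variable logistic regression score -> probability
platt = LogisticRegression()
platt.fit(scores.reshape(-1, 1), labels)
platt_probs = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression: monotone, non-parametric calibration map
iso = IsotonicRegression(out_of_bounds="clip")
iso_probs = iso.fit_transform(scores, labels)
```

Isotonic regression tends to "do better" than Platt scaling only when the calibration set is reasonably large (it can overfit on small samples); with scarce data the sigmoid's parametric assumption acts as useful regularization.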

  • If you check among my questions you can find how to extract a probability from a decision-tree classifier. It is described in the book The Elements of Statistical Learning. Sorry, I'm on a tablet and can't give you the link – Donbeo May 23 '14 at 21:36
  • I think you're referring to http://stats.stackexchange.com/questions/93202/odds-ratio-from-decision-tree-and-random-forest – ThatGuy May 24 '14 at 06:57
  • Yes, it is that one. Let me know if it helps – Donbeo May 24 '14 at 10:00
