
Other than a calibration plot, is there a way to decide how good one model's predictive probabilities are compared to another model's?

I'm not interested in error rates as I find them ineffective for the level of precision I'm looking for.

The only quantity of interest is the predictive probability distribution, as I am pricing contracts with these probabilities.

EDIT:

Based on the experience below with several different classifiers, I have no faith in scoring rules.

I simulated data from a known model, then trained the known model and a worse model on the simulated data, and the Brier and log scoring rules don't agree that the known model is superior, even though the class probabilities from the two models are materially different.
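For concreteness, a rough sketch (not the actual code from my experiment) of this kind of setup, assuming a logistic data-generating process, scikit-learn fits, and a single train/test split in place of the LOOCV I actually used, would look something like this:

```python
# Sketch only: simulate from a known logistic model, fit a "full" model and a model
# with one input deliberately dropped, and compare them with the Brier and log scores.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(0)

# Inputs from normal distributions, outcomes from a known linear predictor.
n = 5000
X = rng.normal(size=(n, 3))
true_beta = np.array([1.0, -2.0, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, p_true)

X_tr, X_te, y_tr, y_te = X[:4000], X[4000:], y[:4000], y[4000:]

# "Known" model: correct parametric form with all inputs (large C ~ no regularisation).
full = LogisticRegression(C=1e6).fit(X_tr, y_tr)
# "Worse" model: same form but with the last input deliberately omitted.
reduced = LogisticRegression(C=1e6).fit(X_tr[:, :2], y_tr)

p_full = full.predict_proba(X_te)[:, 1]
p_reduced = reduced.predict_proba(X_te[:, :2])[:, 1]

# Lower is better for both rules.
print("Brier  full: %.4f  reduced: %.4f" % (brier_score_loss(y_te, p_full),
                                            brier_score_loss(y_te, p_reduced)))
print("Log    full: %.4f  reduced: %.4f" % (log_loss(y_te, p_full),
                                            log_loss(y_te, p_reduced)))
```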

Yoda
  • possible duplicate of [Metric for probability based classification](http://stats.stackexchange.com/questions/15745/metric-for-probability-based-classification) – sds Apr 24 '15 at 20:24

1 Answer


You can use [proper scoring rules](http://en.wikipedia.org/wiki/Scoring_rule).
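For binary outcomes, two common proper scoring rules are the Brier score and the logarithmic score. A minimal illustration (the probabilities below are made up purely to show the mechanics, not taken from the question):

```python
# Brier and logarithmic scores computed by hand for two hypothetical models'
# predicted probabilities of class 1; lower is better for both.
import numpy as np

y = np.array([1, 0, 1, 1, 0])                    # observed outcomes
p_model_a = np.array([0.9, 0.2, 0.8, 0.7, 0.1])  # hypothetical forecasts, model A
p_model_b = np.array([0.6, 0.4, 0.6, 0.5, 0.4])  # hypothetical forecasts, model B

def brier(y, p):
    # Mean squared difference between forecast probability and outcome.
    return np.mean((p - y) ** 2)

def log_score(y, p):
    # Negative mean log-likelihood of the observed outcomes.
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

for name, p in [("A", p_model_a), ("B", p_model_b)]:
    print(f"model {name}: Brier = {brier(y, p):.3f}, log score = {log_score(y, p):.3f}")
```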

Momo
  • Tried that before and it didn't work. I've changed the question above to highlight my experience. – Yoda Aug 23 '12 at 16:24
  • That's interesting. Would you mind sharing more of that experience? I think that if the scoring rule says that the "worse model" predicts better, it shouldn't be called the worse model anymore. – Momo Aug 23 '12 at 16:36
  • 1
    Right, except I know how the data was generated. It was a simulation of the inputs using normal distributions, a mapping to the class probabilities using the binomial distribution with known linear weights for the linear predictor. I then simulated each class using the class probabilities. The models were trained, one with the correct parametric form, another with one of the input variables missing. The scoring rule gave preference to the model with the missing variable. The calibration plot showed that the model trained with all inputs was more accurately estimating class probs. I used LOOCV. – Yoda Aug 23 '12 at 17:06
  • I think it has something to do with the uncertainty in the outcome class that comes from the probability distribution itself being conflated with the uncertainty in the specification of that probability distribution in the first place. I've yet to find a way to disentangle the two. – Yoda Aug 23 '12 at 17:24
  • To obtain your predictive distribution, do you: 1) Compute an estimate $\hat{\theta}$ and just "plug it" getting $f(x_{n+1}\mid\hat{\theta})$; 2) Use a full Bayesian posterior $f(x_{n+1}\mid x_1,\dots,x_n)=\int f(x_{n+1}\mid\theta)\pi(\theta\mid x_1,\dots,x_n)\,d\theta$? – Zen Aug 26 '12 at 03:36
  • I ask that because in the first case you are discarding any information about the uncertainty of your estimate $\hat{\theta}$, and maybe that is screwing your predictive criterion. That's my guess. – Zen Aug 26 '12 at 03:39
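A toy sketch of the distinction Zen raises in the last two comments, for a single Bernoulli probability with a conjugate Beta(1, 1) prior (the prior and data below are illustrative assumptions, not from the question):

```python
# Plug-in predictive vs. full Bayesian posterior predictive for Bernoulli data.
# The plug-in version substitutes the point estimate and ignores estimation
# uncertainty; the posterior predictive integrates over the Beta posterior,
# which here has a closed form: P(x_{n+1} = 1 | data) = (a + k) / (a + b + n).
import numpy as np

x = np.array([1, 1, 0, 1, 0, 1, 1, 1])  # observed Bernoulli data (illustrative)
n, k = len(x), int(x.sum())

# 1) Plug-in predictive: substitute the MLE theta_hat = k / n.
plug_in = k / n

# 2) Posterior predictive with a Beta(a, b) prior; posterior is Beta(a + k, b + n - k).
a, b = 1.0, 1.0
posterior_predictive = (a + k) / (a + b + n)

print(f"plug-in predictive:   {plug_in:.3f}")
print(f"posterior predictive: {posterior_predictive:.3f}")
```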