
I am trying to estimate the probability that a tennis player will win a match, based on several predictors (such as skill, form, surface, weather, etc.).

Can any classification method be used to estimate such a probability, for example the following?

  1. Logistic Regression
  2. Linear Discriminant Analysis
  3. SVM
  4. Neural Networks
  5. KNN
  6. Bagged Trees
  7. Random Forest

Or are some methods better suited to identifying a specific class and less suited to estimating a precise probability?

kjetil b halvorsen
user2165379
  • 2
    Many classifiers give you uncalibrated probability estimates http://scikit-learn.org/stable/modules/calibration.html – Tim Dec 26 '18 at 15:19
  • 1
    This question. Right off the bat, one can note that SVMs do not return probabilities natively and have to rely on techniques like Platt scaling or isotonic regression to produce any probability estimates. Similarly, kNN does not really output probabilities. Yes, we could count the fraction of each label among the neighbours, but that is really an approximation that is highly dependent on the choice of $k$. – usεr11852 Dec 26 '18 at 15:20
  • @Tim That is an interesting blog. I think calibrating the outcomes is a bridge too far currently for me. I conclude that Logistic regression is the model providing the best probabilities. Do you know any other classification methods which provide accurate probabilities in general? – user2165379 Dec 26 '18 at 15:40
  • @usεr11852 Thanks. Do you know which classification methods would generate the most accurate probabilities without the use of additional techniques? – user2165379 Dec 27 '18 at 15:54
  • Given that we would not use any additional calibration step, logistic regression, where we would probably use splines for continuous predictors, is our best bet. That said, producing a calibration plot should not be horribly hard. You could fit an LR and then, say, an RF, and compare their calibration plots. – usεr11852 Dec 27 '18 at 17:04
  • 2
    **Classification is not probability estimation**. Have a look at https://stats.stackexchange.com/questions/127042/why-isnt-logistic-regression-called-logistic-classification – kjetil b halvorsen Dec 27 '18 at 18:12
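
The comparison suggested in the comments (a logistic regression against a random forest, plus an SVM that only gets probabilities through Platt scaling) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses scikit-learn, the data are synthetic from `make_classification` rather than real tennis matches, and the model settings (`n_estimators=200`, `cv=5`, ten bins) are arbitrary choices, not recommendations from the thread.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for match data (assumption: real predictors such as
# skill, form and surface would replace these columns; y = 1 means "win").
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # LinearSVC has no predict_proba of its own; wrapping it with
    # method="sigmoid" applies Platt scaling to its decision scores.
    "svm_platt": CalibratedClassifierCV(
        LinearSVC(dual=False), method="sigmoid", cv=5
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    p_win = model.predict_proba(X_test)[:, 1]  # estimated P(y = 1)

    # Points of the calibration plot: observed win fraction per bin
    # against the mean predicted probability in that bin.
    frac_pos, mean_pred = calibration_curve(y_test, p_win, n_bins=10)

    results[name] = {
        "brier": brier_score_loss(y_test, p_win),  # lower is better
        "frac_pos": frac_pos,
        "mean_pred": mean_pred,
    }
    print(f"{name}: Brier score = {results[name]['brier']:.3f}")
```

Plotting `mean_pred` against `frac_pos` for each model gives the calibration plot; a well-calibrated model lies close to the diagonal. The Brier score is a single-number summary of the same idea, evaluated here on held-out data.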

0 Answers