What to do with 0.5 class probabilities?

Question

I am currently training a random forest regressor (scikit-learn) on the Titanic dataset.

My question is related to this question (https://stackoverflow.com/questions/19984957/scikit-predict-default-threshold) on Stack Overflow.

I noticed that I did not get the same values as scikit-learn for measures such as precision, recall, and F1-score. After investigating, I found the reason: I assigned individuals with a predicted probability of exactly 0.5 to class 1, while scikit-learn assigns them to class 0.
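The discrepancy can be reproduced without any model. Below is a minimal sketch, assuming the predicted label is the arg max of the class probabilities and that ties go to the first class (this is how `numpy.argmax` behaves). The probabilities, labels, and the helper `precision_recall` are made up for illustration and are not taken from my Titanic run.

```python
import numpy as np

# Made-up probabilities of class 1 for six individuals; three are exactly 0.5.
p1 = np.array([0.2, 0.5, 0.5, 0.8, 0.5, 0.9])
y_true = np.array([0, 1, 0, 1, 1, 1])

# Two-column layout [P(class 0), P(class 1)].
proba = np.column_stack([1.0 - p1, p1])

# Convention A: arg max; on a tie the first column (class 0) wins.
pred_argmax = np.argmax(proba, axis=1)

# Convention B: my own rule, a probability of 0.5 counts as class 1.
pred_geq = (p1 >= 0.5).astype(int)


def precision_recall(y, pred):
    """Hypothetical helper: precision and recall for the positive class."""
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    return tp / (tp + fp), tp / (tp + fn)


prec_a, rec_a = precision_recall(y_true, pred_argmax)
prec_b, rec_b = precision_recall(y_true, pred_geq)
print("arg max rule:", prec_a, rec_a)
print(">= 0.5 rule: ", prec_b, rec_b)
```

Only the three tied individuals change class between the two rules, yet both precision and recall move.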

So here are my questions:

Is it better to assign individuals with a probability of 0.5 to class 0 or to class 1? On Titanic, for example, the choice can significantly change the values of these measures.
Would it be legitimate to exclude these individuals? I do not think so, because it biases the results and may tend to improve them.
What about classification with more than two classes? If an individual has probabilities 1/3, 1/3, 1/3, what should I do?
Is there any performance measure that is free of this problem?
Does scikit-learn always assign a 0.5 probability to class 0, or can the choice be random or depend on the model selected?
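To make the multi-class case and the question about tie-free measures concrete, here is a small sketch under the same assumption (labels come from an arg max, and `numpy.argmax` returns the first index on a tie). The data are made up, and the Brier score is only one candidate I am aware of that is computed from the probabilities directly; whether it is the right measure here is part of what I am asking.

```python
import numpy as np

# Three-class tie: every class has probability 1/3.
tied = np.array([1 / 3, 1 / 3, 1 / 3])
tie_winner = int(np.argmax(tied))  # first index wins on an exact tie

# Made-up binary example with three probabilities of exactly 0.5.
p1 = np.array([0.2, 0.5, 0.5, 0.8, 0.5, 0.9])
y_true = np.array([0, 1, 0, 1, 1, 1])

# Brier score: mean squared difference between probability and outcome.
# No hard labels are formed, so the tie-breaking rule never enters.
brier = float(np.mean((p1 - y_true) ** 2))

print("class chosen for the 1/3 tie:", tie_winner)
print("Brier score:", brier)
```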

@FrankHarrell makes arguments in this thread that bear directly on this question, namely whether such cutoffs are desirable. http://stats.stackexchange.com/questions/65382/adding-weights-for-highly-skewed-data-sets-in-logistic-regression#comment165335_65382 — Sycorax, Feb 04 '14 at 14:52
yes, my question is not that far from this link. However, I'm not limiting the context to highly unbalanced datasets. Imagine a balanced dataset where there are a lot of 0.5 probabilities for classification (in {0,1}). I need to know what to do with those. It is not that much about the tradeoff but rather how to derive a performance measure on the standard maximum likelihood prediction. — Scratch, Feb 04 '14 at 15:10
This may sound dumb but you can altogether avoid this by generating an ensemble with an odd number of trees e.g. ntree=1001 — JEquihua, Feb 05 '14 at 15:06
This is not dumb but I'm not sure this would be correct as not all the individuals are in each tree because of the bootstrap part... — Scratch, Feb 05 '14 at 16:10

0 Answers