I have a dataset with a binary outcome. I fitted a logistic regression model with a continuous predictor on the x-axis and a binary response (values 0 and 1) on the y-axis. Then I plotted the fitted model as a curve showing the probability of belonging to the '1' class. The probability curve seems fine, but how can I test the accuracy of this model?
-
The tool of choice is a proper scoring rule. More information in [the tag wiki](https://stats.stackexchange.com/tags/scoring-rules/info). – Stephan Kolassa Jan 28 '21 at 10:30
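To make the scoring-rule suggestion concrete, here is a minimal sketch of two common proper scoring rules, the Brier score and the log loss, in plain Python. The function names and the toy numbers are assumptions for illustration only, not taken from the question's data. Lower is better for both.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the observed 0/1 outcomes."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Toy example (made-up values): observed outcomes and two sets of predictions.
y_true = [0, 0, 1, 1]
p_good = [0.1, 0.2, 0.8, 0.9]   # confident and pointing the right way
p_flat = [0.5, 0.5, 0.5, 0.5]   # uninformative "coin flip" predictions

print("Brier, good:", brier_score(y_true, p_good))
print("Brier, flat:", brier_score(y_true, p_flat))
print("Log loss, good:", log_loss(y_true, p_good))
print("Log loss, flat:", log_loss(y_true, p_flat))
```

Both rules score the predicted probabilities themselves instead of thresholded 0/1 labels, which is why they suit a model whose output is a probability curve.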
-
@StephanKolassa yes it does, thank you. But I have another question. What is the perfect model for estimating the probability of two events? – atilla Jan 28 '21 at 10:33
-
I don't think there is a "perfect model". If there were one, then hordes of statisticians and data scientists, whose job is to find a *better* model than the current one, would be out of a job. Believe me, the people who pay my salary would love to know there is *one perfect model*, and they could stop spending so much money on me! Also see George Box's quote: "All models are wrong, but some are useful." – Stephan Kolassa Jan 28 '21 at 10:35
-
In that case, can I use logistic regression for predicting probabilities? – atilla Jan 28 '21 at 10:36
-
Sure! Or a CART, or a Random Forest, or any flavor of NN. But starting with a very simple logistic regression should definitely be the first step! (The even earlier *zeroth* step should be to just take the average incidence of your target class in the training sample, which is equivalent to a logistic regression without any predictors. It may well be that your models can't even improve on that. Assess everything on a holdout sample.) – Stephan Kolassa Jan 28 '21 at 10:38
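The "zeroth step" versus logistic regression comparison on a holdout sample could look roughly like the sketch below. Everything here is an assumption for illustration: the data are synthetic, the 300/100 split is arbitrary, and `fit_logistic` is a hypothetical helper that fits a one-predictor logistic regression by plain gradient descent (a library fitting routine would normally be used instead).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def brier(y, p):
    """Brier score: mean squared error of predicted probabilities."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def fit_logistic(x, y, lr=0.5, steps=500):
    """Hypothetical helper: fit intercept b and slope w by gradient descent on the log loss."""
    b, w = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_b = grad_w = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b + w * xi) - yi
            grad_b += err
            grad_w += err * xi
        b -= lr * grad_b / n
        w -= lr * grad_w / n
    return b, w

# Synthetic data (assumed): one continuous predictor, binary outcome.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(400)]
y = [1 if rng.random() < sigmoid(2 * xi) else 0 for xi in x]

# Holdout split: fit on the first 300 points, assess on the last 100.
x_train, y_train = x[:300], y[:300]
x_test, y_test = x[300:], y[300:]

# Zeroth step: predict the training incidence of class 1 for every test point.
base_rate = sum(y_train) / len(y_train)
p_baseline = [base_rate] * len(y_test)

# First step: simple logistic regression on the single predictor.
b, w = fit_logistic(x_train, y_train)
p_model = [sigmoid(b + w * xi) for xi in x_test]

print("baseline Brier:", brier(y_test, p_baseline))
print("logistic Brier:", brier(y_test, p_model))
```

If the logistic regression does not achieve a lower holdout score than the constant base rate, the predictor is adding little or nothing, which is exactly the check described above.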
-
Thank you so much! I'm a high schooler, so I'm new to this and am a little overwhelmed by tons of different models. – atilla Jan 28 '21 at 10:40
-
No problem whatsoever! I find it great that you deal with logistic regression in high school, so keep this up! (And it is better to understand a logistic regression in some depth than to just plug black boxes together without understanding; that's [cargo cult](https://en.wikipedia.org/wiki/Cargo_cult) data science.) – Stephan Kolassa Jan 28 '21 at 10:43