
I have approximately 3,000 rows of data. I want a model that can say whether a row should be labeled A or B.

I used logistic regression and trained a model on every subset of the features I have, but the results are very poor. The best I could achieve was 56% accuracy, with the following confusion matrix:

TN: 446 FP: 6

FN: 349 TP: 12

Does this mean that I can't predict anything from the features I have? Should I try something different?
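
To put the 56% in context, here is a minimal sketch that recomputes the usual summary figures from the four counts above and compares them with the accuracy of always predicting the larger class. It assumes the four counts cover all scored rows (they sum to 813, not 3,000, so presumably a held-out set) and it does not know whether "positive" means A or B.

```python
# Counts copied from the confusion matrix in the question.
tn, fp, fn, tp = 446, 6, 349, 12

total = tn + fp + fn + tp                  # number of rows that were scored
accuracy = (tn + tp) / total
sensitivity = tp / (tp + fn)               # share of actual positives that were found
specificity = tn / (tn + fp)               # share of actual negatives that were found

# Accuracy obtained by always predicting the larger class, ignoring all features.
baseline = max(tn + fp, tp + fn) / total

print(f"rows scored = {total}")
print(f"accuracy    = {accuracy:.3f}")
print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"baseline    = {baseline:.3f}")
```

With these counts the model predicts the negative class for 795 of the 813 rows, so its accuracy (about 0.563) is barely above the always-negative baseline (about 0.556).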

kjetil b halvorsen
  • How many events and non-events do you have? Have you done a global statistical test within the logistic model to see if there is any evidence for _any_ predictor being associated with Y? – Frank Harrell Oct 26 '20 at 11:36
  • See also: https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models and https://stats.stackexchange.com/questions/222179/how-to-know-that-your-machine-learning-problem-is-hopeless – Sycorax Oct 26 '20 at 13:22
  • @FrankHarrell I haven't done any tests. Can you point me to a source on how to do that? My data is approximately half A and half B. – Aleksander Nuszel Oct 26 '20 at 17:29
  • It is not appropriate to be using logistic regression without studying the subject first. Many materials are available, including hbiostat.org/rms and hbiostat.org/bbr – Frank Harrell Oct 27 '20 at 11:02
