Logistic regression on all subsets does not work well

Asked Oct 26 '20 at 11:03

Active Oct 27 '20 at 13:13

I have approximately 3k rows of data. I want a model that can say whether a row should be labeled A or B.

I used logistic regression and trained a model on every subset of the features I have, and the results are very poor. The best I could achieve was 56% accuracy, with the following confusion matrix:

TN: 446 FP: 6

FN: 349 TP: 12
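
As a quick check of the arithmetic behind these numbers, the sketch below recomputes accuracy and a few related rates from the four counts. Which of A or B is the "positive" class is not stated, so the names `sensitivity` and `specificity` are only labels relative to whatever class the TP/FN row refers to; all variable names are illustrative.

```python
# Counts copied from the confusion matrix above.
tn, fp, fn, tp = 446, 6, 349, 12

total = tn + fp + fn + tp                # number of rows that were evaluated
accuracy = (tp + tn) / total             # share of rows labeled correctly
sensitivity = tp / (tp + fn)             # share of actual positives that were found
specificity = tn / (tn + fp)             # share of actual negatives that were found
predicted_positive = (tp + fp) / total   # how often the model predicts the positive class

# Accuracy of a trivial rule that always predicts the larger actual class.
majority_baseline = max(tn + fp, tp + fn) / total

print(f"rows evaluated     = {total}")
print(f"accuracy           = {accuracy:.3f}")
print(f"sensitivity        = {sensitivity:.3f}")
print(f"specificity        = {specificity:.3f}")
print(f"predicted positive = {predicted_positive:.3f}")
print(f"majority baseline  = {majority_baseline:.3f}")
```

The counts give an accuracy of about 0.563 against a majority-class baseline of about 0.556, and the model predicts the positive class for only 18 of 813 rows, so it is behaving almost like a constant predictor.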

Does this mean that I can't predict anything from the features I have? Should I try something different?

edited Oct 27 '20 at 13:13

kjetil b halvorsen

asked Oct 26 '20 at 11:03

Aleksander Nuszel

How many events and non-events do you have? Have you done a global statistical test within the logistic model to see if there is any evidence for _any_ predictor being associated with Y? – Frank Harrell Oct 26 '20 at 11:36
See also: https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models and https://stats.stackexchange.com/questions/222179/how-to-know-that-your-machine-learning-problem-is-hopeless – Sycorax Oct 26 '20 at 13:22
@FrankHarrell I haven't done any tests. Can you point me to a source on how to do it? My data is approximately half A and half B. – Aleksander Nuszel Oct 26 '20 at 17:29
It is not appropriate to be using logistic regression without studying the subject first. Many materials are available including hbiostat.org/rms and hbiostat.org/bbr – Frank Harrell Oct 27 '20 at 11:02
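
The global test mentioned in the comments is commonly done as a likelihood ratio test: fit the model with all predictors, fit an intercept-only model, and compare twice the difference in log-likelihood to a chi-square distribution. The sketch below is one possible illustration, not the commenter's own procedure. It assumes NumPy is available and uses synthetic stand-in data (`x1`, `x2`, and the effect size are hypothetical) rather than the asker's data. The helper `fit_logistic` is written here for illustration.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik

# Synthetic stand-in data (an assumption): 3000 rows, two candidate predictors,
# of which only x1 is actually related to the label.
rng = np.random.default_rng(0)
n = 3000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
true_prob = 1.0 / (1.0 + np.exp(-0.5 * x1))
y = (rng.random(n) < true_prob).astype(float)

X_full = np.column_stack([np.ones(n), x1, x2])  # intercept plus both predictors
X_null = np.ones((n, 1))                        # intercept only

_, ll_full = fit_logistic(X_full, y)
_, ll_null = fit_logistic(X_null, y)

# Likelihood ratio statistic and its degrees of freedom (number of predictors tested).
lr_stat = 2.0 * (ll_full - ll_null)
df = X_full.shape[1] - X_null.shape[1]

# For df = 2 the chi-square tail probability is exactly exp(-x / 2).
# This shortcut does NOT hold for other df values.
p_value = math.exp(-lr_stat / 2.0)

print(f"LR chi-square = {lr_stat:.2f} on {df} df, p = {p_value:.3g}")
```

With a different number of predictors the closed-form tail used here no longer applies; `scipy.stats.chi2.sf(lr_stat, df)` gives the p-value for any `df`. A small p-value only says that some predictor appears associated with the label. It does not say that classification accuracy will be high.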

0 Answers