
I want to predict the probabilities of sales opportunities using a binary classification algorithm. However, after fitting a logistic regression, my results do not seem realistic.

This could be due to dropping rows with missing values (about 200 of the 1,500 observations are dropped). My question is: which binary classifier can I use that works with missing values?

Another problem with my prediction is that few features are available, and most of them are categorical.

kjetil b halvorsen
M_Polifke
  • For missing data, look into imputation, for instance https://stats.stackexchange.com/questions/78632/multiple-imputation-for-missing-values and search this site! Really, to get an answer you need to give us some more context: how many variables are there? What do the variables measure? Show us some plots or tables, ... – kjetil b halvorsen Jun 29 '18 at 09:29

1 Answer


You started out on the wrong foot. This is not a classification problem but a probability estimation problem: you need a continuous prediction that can give rise to a lift curve if used for selecting potential customers. See this. Consider ridge logistic regression (penalized logistic regression using an L2 norm). For missing predictors, consider multiple imputation. All this is detailed in my RMS course notes.
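To make the two suggestions concrete, here is a minimal sketch in plain Python, not the workflow from the course notes. Everything in it is an assumption for illustration: the toy data, the feature names (`region`, `channel`), the penalty `lam`, and the helper functions. In particular, `impute_once` is a deliberately crude stand-in for proper multiple imputation, which would draw each missing value conditional on the other predictors and the outcome. The sketch fits an L2-penalized logistic regression on each of `m` completed data sets, averages the coefficients across imputations, and returns predicted probabilities rather than class labels.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_hot(rows, levels):
    """Encode categorical rows as an intercept plus 0/1 indicators.

    The first level of each feature is the reference level and gets no column.
    """
    encoded = []
    for row in rows:
        x = [1.0]  # intercept term
        for value, feature_levels in zip(row, levels):
            x.extend(1.0 if value == lvl else 0.0 for lvl in feature_levels[1:])
        encoded.append(x)
    return encoded


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on the L2-penalized negative log-likelihood.

    The intercept (column 0) is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad[j] += err * xi[j]
        for j in range(p):
            penalty = lam * w[j] if j > 0 else 0.0
            w[j] -= lr * (grad[j] + penalty) / n
    return w


def predict_proba(w, X):
    """Predicted probability of a won opportunity for each encoded row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def impute_once(rows, rng):
    """Crude imputation: replace None with a random observed value of that column."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
        for r in rows
    ]


# Made-up toy data: (region, channel) -> won (1) or lost (0); None marks a missing value.
rows = [
    ["north", "web"], ["north", "phone"], ["south", None], ["south", "web"],
    [None, "phone"], ["north", "web"], ["south", "phone"], ["north", None],
]
y = [1, 0, 0, 1, 0, 1, 0, 1]
levels = [["north", "south"], ["phone", "web"]]

rng = random.Random(0)
m = 5  # number of imputed data sets
fits = []
for _ in range(m):
    completed = impute_once(rows, rng)
    fits.append(fit_ridge_logistic(one_hot(completed, levels), y, lam=1.0))

# Pool the point estimates by averaging coefficients across imputations.
pooled_w = [sum(w[j] for w in fits) / m for j in range(len(fits[0]))]

# Predicted probabilities for two hypothetical new opportunities.
new_rows = [["north", "web"], ["north", "phone"]]
probs = predict_proba(pooled_w, one_hot(new_rows, levels))
print(probs)
```

No rows are dropped here: every observation contributes to each fit, and the penalty keeps the coefficients of sparse categorical levels from becoming extreme. For real data, an established imputation and penalized-regression implementation would be preferable to these hand-rolled helpers, and the penalty strength would need to be chosen from the data, for example by cross-validation.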

Frank Harrell