How best to use continuous-valued features together with discrete-valued features in logistic regression for a binary classification problem

Question

This is related to "Minimisation algorithm for a mix of discreet and continuous parameters?"

I am trying out logistic regression on a binary classification problem. Although I am feature-scaling all my features, the resulting model comes out biased towards positive examples, which makes me think I am doing something wrong.

For example, one of my 100 features is annual_income, which can range from 0 to a million, and another is a state code (1 to 52). What would be the best way to use both features?

Need expert advice: 1% of my labeled data has no annual_income, either because we don't have the person's details or because the person is not employed. Should I just throw away that 1% of the data, or still use it, since the other 99 features are not null?

score 2 · Answer 1 · answered Nov 11 '14 at 01:30

You need to do a lot of background reading before proceeding. You have significant issues in understanding why you don't use scaling with logistic regression and why logistic regression is not a classification method but is a probability estimation procedure.

Maximum likelihood easily handles mixtures of categorical and continuous predictors.

Frank Harrell

Thanks for the information. Just to clarify my question: I am not yet mixing the two kinds of features. Right now I am using only continuous-valued features, but I want to add the scalar features and wanted to know the best way to do it. I plan to try maximum likelihood and SVM as well, and to test all of that with learning curves and cross-validation. – Watt Nov 12 '14 at 08:47

score 1 · Answer 2 · edited Apr 13 '17 at 12:44

A single numeric state code cannot be used as a feature as-is, because the codes are labels with no meaningful order or magnitude. If state code needs to be used, it should be broken into 52 features with value 0.0 or 1.0; however, the samples then become much sparser and the resulting model could perform worse. You can try it and test. Alternatively, you can break it into several features, each of which indicates whether the state belongs to a given region.
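
The two encodings described above might look like the following minimal sketch. The helper names and the grouping of codes into four equal-sized regions are made up for illustration; a real region mapping would come from domain knowledge.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_state(state_code, n_states=52):
    """One indicator per state; codes are assumed to run from 1 to n_states."""
    return one_hot(state_code - 1, n_states)


def encode_region(state_code, region_of_state, n_regions):
    """Coarser alternative: one indicator per region instead of per state."""
    return one_hot(region_of_state[state_code], n_regions)


# Hypothetical grouping of the 52 codes into 4 regions of 13 codes each.
REGION_OF_STATE = {code: (code - 1) // 13 for code in range(1, 53)}

state_features = encode_state(5)                                   # 52 values
region_features = encode_region(5, REGION_OF_STATE, n_regions=4)   # 4 values
```

The region encoding trades detail for density: each indicator is backed by more samples, at the cost of treating all states in a region alike.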

Annual income is not a good feature in its raw form, because the raw value does not behave like a well-scaled numeric feature. You can take its log and then scale the result to between 0 and 1, so that the feature is a better indicator of wealth. Or, if you sense that high income is a good predictor, you can transform annual_income into a binary feature: 1.0 if the instance is quite rich, with annual income over 0.2M, and 0.0 otherwise.
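
Both income transformations can be sketched as below. Two assumptions are mine rather than the answer's: `math.log1p` (log of 1 + x) is used so that an income of 0 does not break the logarithm, and the scaling bounds are taken from the data passed in. The function names are hypothetical.

```python
import math


def log_scaled_income(incomes):
    """Log-transform incomes, then min-max scale the logs into [0, 1]."""
    logs = [math.log1p(x) for x in incomes]  # log1p handles income == 0
    lo, hi = min(logs), max(logs)
    if hi == lo:
        # All incomes identical: no spread to scale, return zeros.
        return [0.0] * len(logs)
    return [(v - lo) / (hi - lo) for v in logs]


def is_high_income(income, threshold=200_000):
    """Binary alternative: 1.0 if income exceeds the threshold (0.2M here)."""
    return 1.0 if income > threshold else 0.0


scaled = log_scaled_income([0, 50_000, 1_000_000])
flags = [is_high_income(x) for x in [0, 50_000, 1_000_000]]
```

In practice the minimum and maximum should be computed on the training set only and reused for new data.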

Besides, this question shows that mixing continuous and categorical features is OK, and no other answers try to refute the answer.

In addition, regarding the 1% of data without annual income, you can either simply "predict" the feature values with certain other effective features using linear regression or use a more sophisticated method like expectation maximization to predict missing feature values iteratively.
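
As a rough sketch of the simpler option, the missing incomes can be filled in from a least-squares line fitted on the rows where income is known. For brevity this uses a single predictor; `years_of_education` and the toy numbers are invented for illustration, and a real setup would likely use several of the other features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_income(rows):
    """rows: list of (predictor, income) pairs, income is None when missing."""
    known = [(x, y) for x, y in rows if y is not None]
    intercept, slope = fit_line([x for x, _ in known], [y for _, y in known])
    # Keep observed incomes; replace missing ones with the fitted value.
    return [(x, y if y is not None else intercept + slope * x) for x, y in rows]


# Made-up rows: (years_of_education, annual_income); None marks a missing income.
rows = [(10, 30_000.0), (12, 40_000.0), (16, 60_000.0), (14, None)]
completed = impute_income(rows)
```

Since a missing income may itself be informative (for example, the person is not employed), it can also be worth keeping a separate 0/1 "income missing" indicator alongside the imputed value.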

Thanks Tom! I like the idea of using log of income. I will test it out. Making state 52 features also occurred to me, I wasn't sure if I need to do that for logistic regression. — Watt, Nov 12 '14 at 08:51
I'm not sure of the statistical properties of that approach. If a categorical predictor such as state has too many levels to be supported by the sample size, penalization is a better solution. A quadratic penalty is akin to treating state as a random effect. For income, I would model this as a regression spline in the square root of income. — Frank Harrell, Nov 12 '14 at 12:13
Hi Frank, I browsed a few Internet tutorials on penalization and cannot understand it. Can you briefly illustrate how to use quadratic penalty in this case? — Tom, Nov 12 '14 at 13:26
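
One way to picture the quadratic penalty asked about here: fit the logistic model by minimizing the mean negative log-likelihood plus (penalty / 2) times the sum of squared state coefficients, leaving the intercept unpenalized. The sketch below does this with plain gradient descent on made-up data with three states. It is only an illustration under those assumptions, not necessarily the procedure Frank Harrell has in mind, and the function name, step size, and penalty value are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_penalized_logistic(states, labels, n_states, penalty, lr=0.5, n_iter=2000):
    """Fit logit P(y=1) = intercept + gamma[state] by gradient descent.

    Minimizes mean negative log-likelihood + (penalty / 2) * sum(gamma_j ** 2).
    The intercept is not penalized; states are coded 0 .. n_states - 1.
    """
    n = len(labels)
    intercept = 0.0
    gamma = [0.0] * n_states
    for _ in range(n_iter):
        grad_intercept = 0.0
        grad_gamma = [0.0] * n_states
        for s, y in zip(states, labels):
            resid = sigmoid(intercept + gamma[s]) - y
            grad_intercept += resid / n
            grad_gamma[s] += resid / n
        intercept -= lr * grad_intercept
        for j in range(n_states):
            # The quadratic penalty adds penalty * gamma_j to the gradient,
            # pulling each state effect towards zero.
            gamma[j] -= lr * (grad_gamma[j] + penalty * gamma[j])
    return intercept, gamma


# Made-up toy data: 3 states with 4 observations each.
states = [0] * 4 + [1] * 4 + [2] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]

_, gamma_mle = fit_penalized_logistic(states, labels, 3, penalty=0.0)
_, gamma_pen = fit_penalized_logistic(states, labels, 3, penalty=0.1)
```

With the penalty switched on, the state effects are shrunk towards zero relative to the unpenalized fit, and states with few observations are shrunk the most. That shrinkage is what makes the penalty resemble a random effect for state. The penalty strength would normally be chosen by cross-validation or a similar criterion.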
