This is related to "Minimisation algorithm for a mix of discreet and continuous parameters?"
I am trying out logistic regression to solve a binary classification problem. Although I am feature-scaling all of my features, the resulting model comes out biased towards positive examples, which makes me think I am doing something wrong.
For example, one of the 100 features is annual_income, which can range from 0 to a million, and another feature is the state code (which could be 1 to 52). What would be the best way to use both features?
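To make the two kinds of feature concrete, here is a minimal sketch of one common treatment, not necessarily the right one for this data: the state code is treated as a categorical label and one-hot encoded (so that code 52 is not read as "52 times" code 1), while annual_income is log-transformed and standardized. The toy values, the helper names `standardize` and `one_hot`, and the choice of `log1p` are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def one_hot(codes, categories):
    """Return one 0/1 column per category, so no ordering between codes is implied."""
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for code in codes:
        row = [0] * len(categories)
        row[index[code]] = 1
        rows.append(row)
    return rows

# Hypothetical toy data standing in for the real columns.
incomes = [0, 25_000, 60_000, 1_000_000]
state_codes = [1, 52, 7, 52]

# log1p compresses the long right tail of income before scaling (an assumption).
income_scaled = standardize([math.log1p(x) for x in incomes])
state_onehot = one_hot(state_codes, categories=list(range(1, 53)))

# Final design-matrix rows: 1 income column followed by 52 state indicator columns.
features = [[inc] + oh for inc, oh in zip(income_scaled, state_onehot)]
```

With this layout, scaling only touches the continuous column; the 52 indicator columns stay as 0/1 and each state gets its own coefficient in the logistic regression.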
Need expert advice: about 1% of my labeled data has no annual_income, either because we don't have the person's details or because the person is not employed. Should I throw away that 1% of the data, or still use it, since the other 99 features are not null?
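For reference, one alternative to dropping those rows is to fill in the missing income and add a separate 0/1 column recording that it was missing, so the model can still learn from the fact that the value was absent. The sketch below is only an illustration under assumptions: missing values are represented as `None`, the median is used as the fill value, and the helper name `impute_with_indicator` and the toy numbers are hypothetical.

```python
from statistics import median

def impute_with_indicator(values):
    """Fill None entries with the median of observed values and flag which were missing."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return filled, missing_flag

# Hypothetical toy column: None means "income not recorded"; 0 is a real reported value.
incomes = [40_000, None, 85_000, 0, 120_000]

filled, income_missing = impute_with_indicator(incomes)
# `filled` would replace annual_income, and `income_missing` would be added as an extra feature.
```

Note that this keeps a recorded income of 0 distinct from an unrecorded income, which matters here because the missing values may mix both cases.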