
I am using 3 features (x1, x2, x3) for binary classification. All my feature values are in 0 to 1 range (unit range).

I computed how important each feature was for classification (i.e., its feature importance), as follows:

x1 --> 0.1
x2 --> 0.5
x3 --> 0.7

It is clear that feature 3 (x3) contributes the most, x2 the second most, and x1 the least to the classification.

I also performed a correlation analysis to check whether my features are positively or negatively correlated with the target (y):

x1 --> positively correlated
x2 --> positively correlated
x3 --> negatively correlated

I am wondering whether it is possible to convert my classification features into a ranking function using feature importance and correlation.

For instance, my suggestion looks as follows:

ranking_score = 0.1*x1 + 0.5*x2 + 0.7*(1/x3)

The reason for using (1/x3) in the above equation is that x3 is negatively correlated with the target (y). Is my ranking_score equation statistically sound? If not, please let me know your suggestions.

EDIT: Why is ranking important to me?

My features (x1, x2, x3) are related to employee details. At first I used these 3 features to classify employees as efficient or inefficient. Now I want to rank the efficient employees based on these 3 features. The equation I proposed above is meant to facilitate this task.

I am happy to provide more details if needed.

EmJ
  • What is the purpose of creating this ranking score? What are you planning to rank and what is the ranking meant to represent/measure? – AlexK May 13 '19 at 05:22
  • @AlexK Thank you for the comment. I edited my question based on your questions. Please let me know your thoughts on my equation. I look forward to hearing from you. Thank you very much :) – EmJ May 13 '19 at 05:32

1 Answer


There are a number of other questions/issues here: What model did you estimate? Did your model classify everyone accurately? Did you perform feature scaling/standardization before estimating the model? What kind of feature importance did you compute (or just what package/commands did you use), since there is more than one way to obtain feature importances? Beyond that, there are several problems with the proposed equation:

- You are treating feature importances as marginal effects (like coefficients/betas in a linear regression), and that is not what they are.
- You are assuming a linear/additive effect of these features on efficiency, and classification algorithms don't assume/model that kind of relationship.
- Taking the inverse of a feature changes its relationship with the dependent variable entirely. If the goal is just to flip the sign of the correlation, the values should simply be multiplied by -1.

So overall this is just not a sensible approach, in my opinion.
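To see the inverse-vs-negation point concretely: multiplying a negatively correlated feature by -1 flips the sign of the correlation exactly, while taking its inverse (1/x3) changes the strength of the relationship as well. A minimal numpy sketch with invented data:

```python
import numpy as np

# Invented unit-range feature that is negatively correlated with the target y.
rng = np.random.default_rng(0)
x3 = rng.uniform(0.05, 1.0, 200)            # bounded away from 0 so 1/x3 is finite
y = 1.0 - x3 + rng.normal(0.0, 0.1, 200)    # y decreases as x3 grows

r_orig = np.corrcoef(x3, y)[0, 1]       # negative
r_neg = np.corrcoef(-x3, y)[0, 1]       # sign flipped, magnitude unchanged
r_inv = np.corrcoef(1.0 / x3, y)[0, 1]  # positive, but the magnitude changes too
```

With -1 the ordering induced by the feature is simply reversed; with 1/x3 the spacing between individuals is distorted as well.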

Instead, I would recommend that you simply compute, for everyone, the probability of being classified as efficient (using the same algorithm you used to perform the classification) and rank individuals by that estimated probability.
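A minimal sketch of this with scikit-learn (the data, labels, and choice of logistic regression below are invented for illustration; substitute your actual fitted classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: 100 employees, features x1, x2, x3 in [0, 1].
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, (100, 3))
# Toy "efficient" label, roughly matching the stated correlations
# (x1, x2 positive, x3 negative).
y = (0.1 * X[:, 0] + 0.5 * X[:, 1] - 0.7 * X[:, 2] > 0.0).astype(int)

clf = LogisticRegression().fit(X, y)   # stand-in for your classifier

# Probability of the "efficient" class (column 1) for every employee.
proba = clf.predict_proba(X)[:, 1]

# Employee indices, ranked from most to least likely to be efficient.
ranking = np.argsort(-proba)
```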

AlexK
  • Thanks a lot for the great answer. Did you mean something like `predict_proba` https://stats.stackexchange.com/questions/329857/what-is-the-difference-between-decision-function-predict-proba-and-predict-fun to measure the probability? Looking forward to hearing from you. Thank you very much :) – EmJ May 13 '19 at 06:26
  • @Emi, right, that's what I had in mind. – AlexK May 13 '19 at 06:30
  • Thanks a lot for your interesting suggestion. One last question: if I get the same probability for, say, 10 employees, is there a way to give them distinct ranks rather than all the same rank? I look forward to hearing from you. Thank you very much once again :) – EmJ May 13 '19 at 06:42
  • That would mean that they have the exact same values for all three features and the response variable. One way or another, given only this data and model, I don't think you'll be able to do anything more about that. So you would need to think about re-specifying your model (adding other features, transforming some of the features if it makes sense, or running a different type of classification model). – AlexK May 13 '19 at 06:56
  • Hi, I am in the process of implementing the idea you suggested and encountered the following question. Currently, I am using 75% of my data for 10-fold cross-validation with grid-search parameter tuning and 25% of the data for testing. In that case, `predict_proba` can only be applied to the test data. Is there a better way of applying `predict_proba` to the whole dataset? Please let me know if my description is not clear. I look forward to hearing from you. Thank you very much :) – EmJ May 28 '19 at 06:57
  • That method can be applied to any dataset you specify, and it should not matter whether you apply it separately to train and test data or all at once on a combined dataset. This is more of a programming question. I recommend that you post it on https://stackoverflow.com if you need additional guidance. – AlexK May 28 '19 at 07:31
  • Thanks a lot for your comment. Sure, I will. Thank you very much :) – EmJ May 28 '19 at 09:39
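The point from the last exchange can be sketched in code (model and data invented for illustration; any fitted scikit-learn classifier behaves the same way): fit on the training split, then pass whatever feature matrix you want to `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented data: 200 employees, features x1, x2, x3 in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (200, 3))
y = (0.1 * X[:, 0] + 0.5 * X[:, 1] - 0.7 * X[:, 2] > 0.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit (and, in practice, cross-validate/tune) on the training split only.
clf = LogisticRegression().fit(X_train, y_train)

# predict_proba is not tied to the test split: score the whole dataset.
all_probs = clf.predict_proba(X)[:, 1]
```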