Is it good practice to take the absolute value of the sum of each feature's weights into the first hidden layer of a neural network as a measure of feature importance?

Harshit Mehta

1 Answer

Say that all of our features have value 1. Give features one and two weights 3 and 1, respectively; they feed node A, which activates with 1*3 + 1*1 = 4. Features three and four have weights 2 each; they feed node B, which also activates with 1*2 + 1*2 = 4. In the next layer, node A has weight 0.4 and node B has weight 0.6. Is feature one more important than both features three and four?

What if there are 7 more layers?
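
To make that concrete, here is a minimal sketch of the toy network above in NumPy (linear, with no biases or activations, which is enough to show the point):

```python
import numpy as np

x = np.ones(4)  # all four features have value 1

# First layer: features one and two feed node A; three and four feed node B.
W1 = np.array([[3.0, 0.0],   # feature one
               [1.0, 0.0],   # feature two
               [0.0, 2.0],   # feature three
               [0.0, 2.0]])  # feature four
w2 = np.array([0.4, 0.6])    # next layer: node A -> 0.4, node B -> 0.6

print(x @ W1)                  # [4. 4.]  both nodes activate with 4
print(x @ W1 @ w2)             # 4.0      network output

# First-layer weights alone say feature one dominates...
print(np.abs(W1).sum(axis=1))  # [3. 1. 2. 2.]
# ...but the weight products along each path to the output disagree:
print(W1 @ w2)                 # [1.2 0.4 1.2 1.2]
```

Feature one's larger first-layer weight is exactly offset by node A's smaller downstream weight, so its total effect on the output equals that of features three and four.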

Often, neural networks are used in settings where features interact so much that the concept of importance is not well defined (e.g., pixel data). There is, however, a lot of work on interpreting neural networks.

As for feature importance: if the features truly have distinct importances, it might be worth fitting a different model that exposes them directly (e.g., LASSO).
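
For example, here is a minimal sketch of that idea using scikit-learn; since the task here is classification, it uses L1-penalized logistic regression, the classification analogue of LASSO, on a purely illustrative synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 8 features, only 3 of which carry signal.
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=3, n_redundant=0, random_state=0)

# The L1 penalty drives coefficients of unhelpful features to exactly zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

print(model.coef_.ravel())  # nonzero entries mark the features the model kept
```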

With a neural network, one possibility is to shuffle each feature and see what happens to predictive performance; this is one way importance is computed for random forests (permutation importance). I have also seen some recent papers where the authors mask features and check the effect on the output. Another suggested option is to calculate the gradient of the output with respect to the inputs.
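
A rough sketch of the shuffling idea, assuming a scikit-learn-style model and a held-out test set; the data and network here are placeholders, not a recommendation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=3, n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

rng = np.random.default_rng(0)
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, j])  # break feature j's link with the target
    # the larger the drop in accuracy, the more the model relied on feature j
    print(f"feature {j}: importance ~ {baseline - model.score(X_perm, y_test):.3f}")
```

scikit-learn also ships this as `sklearn.inspection.permutation_importance`, which repeats the shuffle several times and averages the drops.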

sjw
  • But nodes A and B are combinations of the original features rather than the features themselves, so why would the weights of those nodes determine the importance of the actual features? – Harshit Mehta Jul 12 '17 at 16:24
  • A feature is more important if it has a substantial effect on the output of the network (and therefore a stronger relationship with the target). Those combinations are closer to the output than the original features are. – sjw Jul 12 '17 at 16:28
  • Okay, thanks. Is there another way of getting feature importance from a neural network? How about the cumulative weights of each feature across all the layers? – Harshit Mehta Jul 12 '17 at 16:32
  • Perhaps see `Garson, G. David. "Interpreting neural-network connection weights." AI Expert 6.4 (1991): 46-51.` In my opinion, cumulative weights would be difficult because the features are transformed at each layer, so I am not sure how one could "track" a feature from input to output. – sjw Jul 12 '17 at 16:42
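
For reference, a rough sketch of Garson's weight-partitioning idea for a network with a single hidden layer; this is my own paraphrase of the cited procedure, not code from the paper, and `W1`/`w2` simply stand for the input-to-hidden and hidden-to-output weights:

```python
import numpy as np

def garson_importance(W1, w2):
    """Garson-style importance for a single hidden layer.

    W1: (n_features, n_hidden) input-to-hidden weights
    w2: (n_hidden,) hidden-to-output weights
    """
    # share of each hidden node's output-weighted activity owed to each input
    contrib = np.abs(W1) * np.abs(w2)              # (n_features, n_hidden)
    contrib /= contrib.sum(axis=0, keepdims=True)  # normalize within each node
    importance = contrib.sum(axis=1)
    return importance / importance.sum()           # fractions summing to 1

# On the toy network from the answer: feature one gets 0.375, feature two
# 0.125, and features three and four 0.25 each.
W1 = np.array([[3.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 2.0]])
w2 = np.array([0.4, 0.6])
print(garson_importance(W1, w2))
```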