9

I'm trying to solve a binary classification problem by using an artificial neural network implemented in Torch. My neural network has 82 input features (=neurons).

After implementing a plain version that gives all 82 input neurons the same importance, I need to design a new version of my algorithm in which a user can highlight (give more importance to) a single feature among the 82.

How could I do this in statistical terms? How could I edit my algorithm to give more importance to a single feature?

DavideChicco.it
  • A question that comes up right away is: how much "importance" do you give it, and how do you know that's not too much or too little? The thing about NN modeling is that it's supposed to learn this from the data, so you don't have to worry about deciding a weight for a particular feature. – horaceT Aug 31 '16 at 20:27
  • Can you expand on *why* you want to highlight an input feature? Without context I am not sure any answer you get will be useful. (e.g. I can imagine doing something like purposely adding noise to all the other features, to make them unreliable ... would this be useful though?) – GeoMatt22 Sep 04 '16 at 03:48
  • Sure. In my bioinformatics problem, I have 82 input neurons. Each of them represents a cell type. In the previous setup of my computational machinery, the user can make global predictions that are valid for any cell type. Now, instead, I want the user to be able to make predictions for a specific cell type. The user should be able to say: "Make predictions only for the IMR90 cell type". So I have to highlight one cell type in the input layer, to differentiate it from the others. – DavideChicco.it Sep 07 '16 at 15:09
  • By "Each [feature] represents a cell type[,]" do you mean that you have 82 boolean features? Are they exclusive? Does "Make predictions only for the IMR90 cell type" mean to predict with only that feature true? (I thought I understood your question, but your comment makes me doubt my initial interpretation.) – Sean Easter Sep 07 '16 at 18:12
  • @SeanEaster The input neurons are real values, not boolean. Sorry if I explained badly. – DavideChicco.it Sep 07 '16 at 21:54
  • Thanks, that's helpful. But if IMR90 is some real value, what does it mean to "make a prediction for only [that] cell type"? Is it a prediction based only on that value, holding that value constant, assigning some prior perceived predictive ability to that feature, or something else? – Sean Easter Sep 07 '16 at 21:57
  • I've hazarded an answer, but am still not confident I precisely understand your question. If I've misinterpreted it, please let me know and I'll just go ahead and delete it. Good luck! – Sean Easter Sep 10 '16 at 18:44

4 Answers

2

Try the wide & deep network architecture [1]: link the "important" features directly to the output neuron, in addition to feeding all features through the deep part of the network.

[1]. https://arxiv.org/abs/1606.07792
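
As a rough illustration of the direct link (a minimal sketch, not the architecture from the paper): the forward pass below sends all features through one hidden layer and also gives one chosen feature its own weight straight to the output neuron. The function names, the single tanh hidden layer, and the parameter names (`W1`, `v`, `w_direct`) are assumptions made up for this example, and it uses plain Python rather than Torch so that it is self-contained.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_inputs, n_hidden, seed=0):
    """Small random weights for the deep path; the direct link starts at zero."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "v": [rng.gauss(0.0, 0.1) for _ in range(n_hidden)],
        "w_direct": 0.0,  # hypothetical weight of the direct ("wide") link
        "b_out": 0.0,
    }


def wide_and_deep_forward(x, important_idx, params):
    """Probability of the positive class for one input vector x."""
    # Deep path: every feature goes through one tanh hidden layer.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    deep_logit = sum(v * h for v, h in zip(params["v"], hidden))
    # Wide path: the important feature reaches the output neuron directly.
    wide_logit = params["w_direct"] * x[important_idx]
    return sigmoid(deep_logit + wide_logit + params["b_out"])
```

In training, `w_direct` would be learned together with the other weights; the point of the extra link is only that the highlighted feature has a path to the output that the hidden layer cannot dilute.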

1

{1} explored one way to take prior knowledge on features into account when training a neural network. Abstract:

Different features have different relevance to a particular learning problem. Some features are less relevant; while some very important. Instead of selecting the most relevant features using feature selection, an algorithm can be given this knowledge of feature importance based on expert opinion or prior learning. Learning can be faster and more accurate if learners take feature importance into account. Correlation aided Neural Networks (CANN) is presented which is such an algorithm. CANN treats feature importance as the correlation coefficient between the target attribute and the features. CANN modifies normal feedforward Neural Network to fit both correlation values and training data. Empirical evaluation shows that CANN is faster and more accurate than applying the two step approach of feature selection and then using normal learning algorithms.

I haven't read the paper carefully and am unsure how sound it is, so I'd be quite cautious. The same author published a few other papers on the same topic, e.g. {2}. Personally I rely on backpropagation to do the job.

Perhaps another way could be to change the weight update rule and/or the weight initialization rule for this feature, so as to bias the weights connected to your important feature toward a larger absolute value than the weights connected to the other features.
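
To make that concrete, here is a minimal sketch of both variants in plain Python. Everything in it is an assumption chosen for illustration: the `boost` factor, the Gaussian initialization, and the choice of weight decay as the "modified update rule" (the decay shrinks every first-layer weight except those attached to the important feature). `W[i][j]` is the weight from input `j` to hidden unit `i`.

```python
import random


def init_first_layer(n_inputs, n_hidden, important_idx, boost=3.0, scale=0.1, seed=0):
    """Gaussian init; the column of the important feature is scaled up by `boost`."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, scale) for _ in range(n_inputs)] for _ in range(n_hidden)]
    for row in W:
        row[important_idx] *= boost
    return W


def sgd_step(W, grad, important_idx, lr=0.01, decay=1e-3):
    """In-place SGD step with weight decay on every input column except the important one."""
    for row, grad_row in zip(W, grad):
        for j in range(len(row)):
            shrink = 0.0 if j == important_idx else decay * row[j]
            row[j] -= lr * (grad_row[j] + shrink)
```

Note that nothing here guarantees the important weights stay large: the data gradient can still shrink them. How large `boost` and `decay` should be is exactly the "how much importance" question raised in the comments, and would have to be tuned.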

A last idea would be to connect your most important feature to layers other than the first layer.


Franck Dernoncourt
0

In neural networks, the "importance" of each signal is established during the learning phase. It ends up encoded in the learned weights of the model, rather than exposed as a nice numeric parameter. I'm afraid you may not be able to manually alter the importance of a feature.

One way of forcing it, if you are using dropout, is to exempt the signal the user judges "important" from dropout. Other than this, I really can't see how to force it. Please note that I say "forcing" because what you want to do is counterintuitive for almost any machine learning classifier.
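
A minimal sketch of the dropout idea, assuming dropout is applied to the input layer and using the common "inverted dropout" rescaling (both are my assumptions; the function name and arguments are made up for the example). The important feature is never dropped, so the network can always rely on it during training, while the other inputs are randomly zeroed.

```python
import random


def input_dropout(x, important_idx, p_drop=0.5, rng=random):
    """Inverted dropout on an input vector that never drops the important feature.

    Requires 0 <= p_drop < 1. `rng` is anything with a .random() method.
    """
    keep = 1.0 - p_drop
    out = []
    for j, xj in enumerate(x):
        if j == important_idx:
            out.append(xj)          # always kept, not rescaled
        elif rng.random() < keep:
            out.append(xj / keep)   # kept and rescaled so its expected value is unchanged
        else:
            out.append(0.0)         # dropped for this training step
    return out
```

This would be applied only during training; at prediction time the input is passed through unchanged.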

user_1177868
  • I think this might work, thanks! Another idea might be to apply L1 regularization to all the input feature nodes except the "important" one. What do you think? – DavideChicco.it Sep 07 '16 at 20:57
0

You might consider interpreting your neural network as a probabilistic graphical model. From "An Introduction to Variational Methods for Graphical Models" by Jordan et al.:

Neural networks are layered graphs endowed with a nonlinear "activation" function at each node (see figure 5). Let us consider activation functions that are bounded between zero and one, such as those obtained from the logistic function $f(z) = 1/(1 + e^{-z})$. We can treat such a neural network as a graphical model by associating a binary variable $S_i$ with each node and interpreting the activation of the node as the probability that the associated binary variable takes one of its two values. [...] The advantages of treating a neural network in this manner include the ability to perform diagnostic calculations, to handle missing data, and to treat unsupervised learning on the same footing as supervised learning. Realizing these benefits, however, requires that the inference problem be solved in an efficient way.

Later portions of the paper discuss how to do this efficiently. It would seem you could "highlight" a feature by changing the prior placed on its associated parameters.
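
One concrete reading of "changing the prior", as a sketch under assumptions that are mine and not taken from the paper: put an independent zero-mean Gaussian prior on the first-layer weights, with a separate variance $\sigma_j^2$ for each input feature $j$, and a flat prior on all other parameters. The MAP estimate then minimizes the negative log-likelihood plus a per-feature quadratic penalty:

$$
\hat{W} = \arg\min_{W} \left[ -\sum_{n} \log p(y_n \mid x_n, W) + \sum_{j=1}^{82} \frac{1}{2\sigma_j^2} \sum_{i} W_{ij}^2 \right]
$$

where $W_{ij}$ is the weight from input $j$ to hidden unit $i$. Choosing a larger $\sigma_k$ for the highlighted feature $k$ than for the others weakens the penalty on its weights, so the fit is freer to rely on that feature. In practice this amounts to weight decay with a smaller coefficient on the important feature's weights.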

Sean Easter