This is not my area of expertise, but my understanding is that it's much harder to learn on a sparse representation: each input dimension is active in only a small fraction of examples, so the weights attached to it receive gradient updates from just a few training examples.
The canonical example for RNNs is NLP, where it's fairly standard to transform the input text from a sparse representation (e.g., a one-hot encoding of word IDs) into a dense vector embedding (e.g., word2vec). This page is pretty good:
https://www.tensorflow.org/versions/r0.8/tutorials/word2vec/index.html
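To make the sparse-vs-dense distinction concrete, here's a minimal numpy sketch (the vocabulary size and embedding width are made-up illustrative numbers): an embedding lookup is just selecting a row of a weight matrix, which is exactly what multiplying a one-hot vector by that matrix computes, minus the wasted arithmetic.

```python
import numpy as np

vocab_size, embed_dim = 10000, 128  # illustrative sizes
W = np.random.randn(vocab_size, embed_dim)  # embedding matrix (e.g., from word2vec)

word_id = 42
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

dense_via_matmul = one_hot @ W   # sparse route: mostly multiplications by zero
dense_via_lookup = W[word_id]    # dense route: same vector, one row read

assert np.allclose(dense_via_matmul, dense_via_lookup)
```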
In your case it's hard to say what would work best without knowing more about your data, but you might be able to find (or learn) a vector embedding that captures its essential properties, and then train your RNN on that dense representation.
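If no pretrained embedding fits your data, one common option is to learn the embedding jointly with the task itself. Here's a rough sketch assuming TensorFlow's Keras API and inputs that are sequences of integer IDs; the layer sizes and the binary target are placeholders, not recommendations:

```python
import numpy as np
import tensorflow as tf

vocab_size = 10000  # number of distinct input IDs (placeholder)
seq_len = 20        # time steps per example (placeholder)

model = tf.keras.Sequential([
    # Trainable lookup table: learns a dense 128-d vector per input ID,
    # trained jointly with the rest of the network.
    tf.keras.layers.Embedding(vocab_size, 128),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy data: the network consumes integer IDs directly, never one-hot vectors.
x = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, 2, size=(32, 1))
model.fit(x, y, epochs=1, verbose=0)
```

The nice part of this setup is that the embedding matrix is just another weight matrix, so backprop shapes the dense vectors to whatever the downstream task needs, even if your data isn't text.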