
I have a question that I've thought about a lot, but I can't figure out how to approach it.

I want to create a neural network in the Encog framework. The network is about shopping. I have a database of shopping lists with products; every user has their own private lists. I need to create a neural network that will generate a new list based on the lists in the database, for example based on the lists from the last 2 months.

But in Encog, the input has to be an array of doubles. I read about normalization in the documentation and searched the internet for ideas, but found nothing.

How can I "convert" the products to doubles so I can use them in a neural network?

Many people just use an equation that scales a value to between 0 and 1, or -1 and 1. But for my data, that's hard.

Thanks for any help!

Caran

1 Answer


Products can be thought of as categorical variables, e.g. "red shirt," "blue shirt," "black pants," etc.

  • For a categorical variable with $k$ levels, you can use one-hot (binary) encoding to make $k$ indicator inputs with values $\{0,1\}$, or perhaps $\{-1,1\}$, indicating which category is present (see the sketch after this list). If you include bias neurons, this strategy introduces some redundancy to your model (cf. the "dummy variable trap" from regression analysis textbooks). However, since neural networks are not identified in general, this is not an inherent obstacle to model estimation.

  • You can keep only $k-1$ of the indicators. This is the standard regression encoding for categorical variables, and it avoids the dummy variable trap.

  • You can use entity embedding, which is a more sophisticated network structure. It adds between 1 and $k-1$ hidden, linear neurons between the categorical input and the first fully-connected layer. This approach has some nice empirical results behind it; see the sketch after the quoted abstract below.
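Here is a minimal sketch of the first two options, assuming a fixed product catalog (the catalog, class, and method names are hypothetical; in your case they would come from your database). Since a shopping list contains several products, the $k$ indicators simply become a multi-hot vector, one slot per product that appears on the list:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: turning products into the double[] inputs Encog expects.
public class ProductEncoder {

    private final List<String> catalog; // all k known products, in a fixed order

    public ProductEncoder(List<String> catalog) {
        this.catalog = catalog;
    }

    /** Full one-hot/multi-hot encoding: k inputs for k products. */
    public double[] encodeOneHot(List<String> shoppingList) {
        double[] vector = new double[catalog.size()];
        for (String product : shoppingList) {
            int index = catalog.indexOf(product);
            if (index >= 0) {
                vector[index] = 1.0; // this product is present on the list
            }
        }
        return vector;
    }

    /** k-1 variant: drop the last indicator to avoid the dummy variable trap. */
    public double[] encodeDropLast(List<String> shoppingList) {
        double[] full = encodeOneHot(shoppingList);
        return Arrays.copyOf(full, full.length - 1);
    }

    public static void main(String[] args) {
        ProductEncoder encoder = new ProductEncoder(
                Arrays.asList("milk", "bread", "eggs", "butter"));
        double[] input = encoder.encodeOneHot(Arrays.asList("milk", "eggs"));
        System.out.println(Arrays.toString(input)); // [1.0, 0.0, 1.0, 0.0]
        // In Encog this array can be wrapped directly, e.g.
        // new org.encog.ml.data.basic.BasicMLData(input)
    }
}
```

Using `encodeDropLast` gives you the $k-1$ encoding from the second bullet; everything else stays the same.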

"Entity Embeddings of Categorical Variables" by Cheng Guo, Felix Berkhahn

We map categorical variables in a function approximation problem into Euclidean spaces, which are the entity embeddings of the categorical variables. The mapping is learned by a neural network during the standard supervised training process. Entity embedding not only reduces memory usage and speeds up neural networks compared with one-hot encoding, but more importantly by mapping similar values close to each other in the embedding space it reveals the intrinsic properties of the categorical variables. We applied it successfully in a recent Kaggle competition and were able to reach the third position with relative simple features. We further demonstrate in this paper that entity embedding helps the neural network to generalize better when the data is sparse and statistics is unknown. Thus it is especially useful for datasets with lots of high cardinality features, where other methods tend to overfit. We also demonstrate that the embeddings obtained from the trained neural network boost the performance of all tested machine learning methods considerably when used as the input features instead. As entity embedding defines a distance measure for categorical variables it can be used for visualizing categorical data and for data clustering.
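To make the entity-embedding idea concrete: looking up a row in a $k \times d$ table is mathematically the same as multiplying the one-hot vector by a $k \times d$ weight matrix, which is exactly the hidden layer of 1 to $k-1$ linear neurons described above. Encog has no dedicated embedding layer, so the sketch below (all names hypothetical) only illustrates the lookup step; in a real model the table entries would be weights learned by backpropagation along with the rest of the network:

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch of an entity embedding as a lookup table.
// Looking up row i is equivalent to oneHot(i) times the k x d weight matrix,
// i.e. a linear hidden layer with no bias fed by the one-hot input.
public class EntityEmbedding {

    private final double[][] table; // k categories x d embedding dimensions

    public EntityEmbedding(int numCategories, int dimensions, long seed) {
        Random rng = new Random(seed);
        table = new double[numCategories][dimensions];
        for (double[] row : table) {
            for (int j = 0; j < row.length; j++) {
                // Small random initialization; in training these values
                // would be updated by backpropagation, not fixed.
                row[j] = rng.nextGaussian() * 0.1;
            }
        }
    }

    /** Return the dense d-dimensional vector for a category index. */
    public double[] embed(int categoryIndex) {
        return table[categoryIndex].clone();
    }

    public static void main(String[] args) {
        EntityEmbedding products = new EntityEmbedding(4, 2, 42L);
        System.out.println(Arrays.toString(products.embed(2))); // e.g. [0.11, -0.02]
    }
}
```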

Sycorax