To keep all the inputs to a network on the same scale, they are usually standardized: the mean is subtracted and the result is divided by the standard deviation, so that each value is expressed as a number of standard deviations from the mean.
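To make the normalization concrete, here is a minimal sketch of what I mean, using only the Python standard library. The function name `standardize` and the sample values are made up for illustration, and I am assuming the population standard deviation is the one to use.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Return z-scores, (x - mean) / std, for each value in one feature column."""
    mu = fmean(column)
    sigma = pstdev(column)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant feature carries no scale information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical feature column, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)
```

After this step the column has mean 0 and standard deviation 1, whatever its original units were.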

Does this also need to be done for pretrained word embeddings, e.g. vectors from GloVe or word2vec? If so, should I compute the mean and variance from the embeddings alone, or from all the samples in my data set?
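One possible reading of the two options, sketched with a toy example. The embedding table, the token list, and the helper `per_dimension_stats` are all hypothetical; real GloVe or word2vec vectors would have many more dimensions.

```python
from statistics import fmean, pstdev

# Hypothetical 2-dimensional embedding table standing in for pretrained vectors.
embeddings = {
    "the": [0.0, 2.0],
    "cat": [4.0, 2.0],
    "sat": [2.0, 8.0],
}
# Hypothetical tokenized data set; "the" occurs more often than the other words.
tokens = ["the", "cat", "sat", "the", "the", "cat"]

def per_dimension_stats(vectors):
    """Return (means, stds), computed independently for each embedding dimension."""
    columns = list(zip(*vectors))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]

# Option A: statistics over the embedding table, each vocabulary word counted once.
vocab_mean, vocab_std = per_dimension_stats(list(embeddings.values()))

# Option B: statistics over the data set, each token occurrence counted once,
# so frequent words pull the mean toward their own vectors.
data_mean, data_std = per_dimension_stats([embeddings[t] for t in tokens])
```

The two options generally give different statistics, because the second weights each word by how often it appears in the data.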

wweber
