I'm currently working on a Seq2Seq model for a chatbot, and I convert every sentence to numerical vectors using pre-trained word embeddings (GloVe).
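For context, the conversion is essentially an embedding lookup. Here is a simplified sketch of that step (not my exact code; the tokenization here is just whitespace splitting, and unknown words get a zero vector for illustration):

```python
import numpy as np

EMBEDDING_DIM = 200  # matches glove.6B.200d.txt

def load_glove(path):
    """Load GloVe vectors into a dict: word -> 200-d numpy array."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

def sentence_to_vectors(sentence, embeddings):
    """Convert a sentence to a list of 200-d vectors; zeros for unknown words."""
    tokens = sentence.lower().split()
    return [embeddings.get(tok, np.zeros(EMBEDDING_DIM, dtype="float32"))
            for tok in tokens]

glove = load_glove("glove.6B.200d.txt")
vectors = sentence_to_vectors("hello , how are you ?", glove)
```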
My problem is that training doesn't progress: with mean squared error as the loss function, the loss starts at around 0.0055 and is still essentially the same at the end of training (e.g. 0.0054).
I was suspicious of the vocabulary used in the dataset, so I checked the first 20,000 sentences for non-conventional words (names, jumbled words like "whaaat", and sound effects like "mmmmmm"). It turns out that around 1,960 of the 14,320 unique words in those sentences are not in the GloVe dictionary (the check I ran is sketched below).
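This is roughly the coverage check I ran (a sketch, reusing `load_glove` from the snippet above; `sentences` stands for the first 20,000 sentences from the corpus, and the real tokenization may differ):

```python
# Count how many unique words in the first 20000 sentences are missing from GloVe.
glove = load_glove("glove.6B.200d.txt")

unique_words = set()
for sentence in sentences[:20000]:
    unique_words.update(sentence.lower().split())

oov = sorted(w for w in unique_words if w not in glove)
print(f"{len(oov)} of {len(unique_words)} unique words are not in GloVe")
# For my data this printed roughly 1960 out of 14320.
```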
Does this ratio (1960:14320) have a significant effect on training, e.g. could it explain the model's inability to learn in my situation?
Also, how do I compensate for such a large amount of out-of-vocabulary words?
Here are some details on the dataset and word-embedding vocabulary I'm using:
- Dataset: Cornell Movie Dialogs Corpus
- Word embedding: glove.6B.200d.txt in the zip file downloaded from this link