I have a collection of sentences from different communities. Each community uses language differently, so each would yield a different word embedding. I could concatenate the sentences from all communities into one corpus, but I fear I would lose the nuances in how each community uses language.
Do you have any recommendations on what level I should train the word embedding at? Should I simply train it on all sentences together, at the risk of overgeneralizing? Or is there a way to factor in the differences between documents?
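To make the two options concrete, here is a toy sketch of what I mean. Everything in it is made up for illustration: the community names, the sentences, and the helper functions `cooccurrence_vectors` and `cosine`. Raw co-occurrence counts stand in for a real embedding model, so this only shows the pooled-versus-separate setup, not an actual training pipeline.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: two made-up communities that use "pitch" differently.
corpora = {
    "music": [
        ["the", "singer", "held", "a", "high", "pitch"],
        ["her", "pitch", "was", "slightly", "flat"],
    ],
    "startup": [
        ["the", "founder", "gave", "a", "sales", "pitch"],
        ["investors", "liked", "the", "pitch", "deck"],
    ],
}


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Option 1: concatenate everything into one corpus and build one set of vectors.
pooled = cooccurrence_vectors(
    [sentence for sentences in corpora.values() for sentence in sentences]
)

# Option 2: build a separate set of vectors for each community.
per_community = {
    name: cooccurrence_vectors(sentences) for name, sentences in corpora.items()
}

# How similar is "pitch" across the two communities, and how does the
# pooled vector relate to each community-specific one?
across = cosine(per_community["music"]["pitch"], per_community["startup"]["pitch"])
pooled_vs_music = cosine(pooled["pitch"], per_community["music"]["pitch"])
pooled_vs_startup = cosine(pooled["pitch"], per_community["startup"]["pitch"])

print(f"music vs startup:  {across:.2f}")
print(f"pooled vs music:   {pooled_vs_music:.2f}")
print(f"pooled vs startup: {pooled_vs_startup:.2f}")
```

In this toy case the pooled vector for "pitch" sits between the two community-specific vectors, which is the kind of blending I am worried about.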
I'm relatively new to this. Any feedback will be helpful.