I'm trying to apply latent Dirichlet allocation (LDA) to a name disambiguation project. My data set is a corpus of documents, each of which looks like:
Author, co-author, title, institution
I understand that the input for LDA should be a document-term matrix. But how do I take advantage of the structure of the data set (the separate author, co-author, title, and institution fields)? Or should I just generate a document-term matrix that disregards the structure?
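To make the two options concrete, here is a minimal sketch of what I mean. The records, field names, and helper functions are made up for illustration, and it uses plain Python with no particular LDA library. The first variant pools all fields into one bag of words; the second tags each word with its field, which is just one possible way to keep the structure, not necessarily the right one.

```python
from collections import Counter

# Hypothetical records in the (author, co-author, title, institution) layout.
docs = [
    {"author": "J Smith", "coauthor": "A Lee",
     "title": "Topic models for text", "institution": "MIT"},
    {"author": "J Smith", "coauthor": "B Chen",
     "title": "Protein folding models", "institution": "Stanford"},
]

def flat_tokens(doc):
    """Ignore structure: pool the words of every field into one bag."""
    return [w.lower() for value in doc.values() for w in value.split()]

def field_tokens(doc):
    """Keep structure: tag each word with the field it came from."""
    return [f"{field}={w.lower()}"
            for field, value in doc.items()
            for w in value.split()]

def doc_term_matrix(token_lists):
    """Return (vocab, matrix); matrix[i][j] counts term j in document i."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: j for j, t in enumerate(vocab)}
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix

# Option 1: document-term matrix that disregards the structure.
flat_vocab, flat_dtm = doc_term_matrix([flat_tokens(d) for d in docs])

# Option 2: document-term matrix whose terms remember their field.
field_vocab, field_dtm = doc_term_matrix([field_tokens(d) for d in docs])
```

In the second matrix, "smith" as an author and "smith" as a co-author would be separate columns, whereas the first treats them as the same term.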
Sorry if the question seems vague. I'm happy to clarify any ambiguities.
Thank you