I looked here: How it's better to include non-word features into text classification model? — but there aren't any useful answers.
I have a possibly naive question: I'd like to incorporate metadata into a text classification model, but I'm not sure how to proceed.
Assume that I have a dataset that is $N \times 3$, where the columns are:
- text document - for example, an amazon review or newspaper article
- some meta_data - for example, number of words of length > 5, or time article was published
- category - either A, B or C
The goal is to use the text document and the meta_data to classify the example in the correct category.
Typically one would perform text classification on the text document alone (tokenize, lemmatize, remove stopwords, etc.) and build a sparse matrix of word counts. A model (an SVM is a popular choice) would be trained on this sparse matrix and tested on unseen data, classifying each example as A, B or C.
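To make this concrete, here's a minimal sketch of that standard pipeline in scikit-learn (the toy documents and labels are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy data standing in for the N x 3 dataset's text and category columns.
docs = [
    "great product fast shipping",
    "terrible quality broke quickly",
    "excellent value would buy again",
    "awful experience never again",
]
labels = ["A", "B", "A", "B"]

# Tokenize, lowercase, drop stopwords, and build a sparse matrix of
# (TF-IDF weighted) word counts.
vectorizer = TfidfVectorizer(stop_words="english")
X_text = vectorizer.fit_transform(docs)  # sparse N x vocabulary matrix

# Train an SVM on the sparse matrix.
clf = LinearSVC()
clf.fit(X_text, labels)

# Classify an unseen document as A or B.
pred = clf.predict(vectorizer.transform(["fast shipping great value"]))
```

Note that nowhere in this pipeline is there an obvious slot for the metadata column — which is exactly my question.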
But what about the metadata? I'd like to incorporate it somehow, but in this paradigm it's unclear to me where I can inject it. I feel like what I want is a model of the form:
$y = \beta_0X_0 + \beta_1X_1$
where $X_0$ is the metadata and $X_1$ is the result of the NLP part. But how would I set up such a model? Can I reduce the text classification portion to a single coefficient? Or am I conflating two distinct approaches to modeling text?
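One thing I've considered (I'm not sure it's the right approach) is simply stacking the metadata column onto the sparse word-count matrix so a single model sees both. A rough sketch, again with made-up toy data:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    "great product fast shipping",
    "terrible quality broke quickly",
    "excellent value would buy again",
    "awful experience never again",
]
labels = ["A", "B", "A", "B"]

# One metadata column, e.g. number of words of length > 5.
meta = np.array([[1], [2], [1], [1]])

X_text = TfidfVectorizer().fit_transform(docs)

# Append the metadata as an extra column: N x (vocabulary + 1).
X = hstack([X_text, csr_matrix(meta)])

clf = LinearSVC().fit(X, labels)
```

But I don't know whether this is sound — the metadata is on a completely different scale from the TF-IDF values, and it's just one column against thousands of word features, which is part of why I wonder whether the text portion should instead be collapsed into a single $X_1$ as in the equation above.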