I'm building an MLP classification model, and one of my features is the product name. These names can be anything, so in theory the number of distinct names is unbounded. However, only a fairly small number of names appear frequently in our data, so I'd like to use the most common names as a categorical feature.
I'd like to use one-hot encoding to transform these names into something the model can use, but what should I do with samples that do not have a common product name? My understanding was that these could be encoded as all zeroes, since they don't fit into any of the one-hot encoded feature's buckets. Is that a valid thing to do with one-hot encoding? Neither Spark ML's nor scikit-learn's one-hot encoder seems to allow this.
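To make the all-zeroes idea concrete, here is a minimal sketch in plain Python rather than either library's encoder. The helper names `top_k_vocab` and `encode_all_zero` and the toy product names are made up for illustration.

```python
from collections import Counter

def top_k_vocab(names, k):
    """Return the k most frequent names, most frequent first."""
    return [name for name, _ in Counter(names).most_common(k)]

def encode_all_zero(name, vocab):
    """One-hot encode `name`; any name outside `vocab` becomes all zeroes."""
    return [1 if name == v else 0 for v in vocab]

# Toy training data, purely illustrative.
train_names = ["widget", "widget", "gadget", "widget", "gadget", "doohickey"]
vocab = top_k_vocab(train_names, k=2)  # keep only the 2 most common names

common_vec = encode_all_zero("gadget", vocab)    # a common name: one slot is set
rare_vec = encode_all_zero("doohickey", vocab)   # an uncommon name: no slot is set
```

With this scheme every uncommon name, including one never seen during training, maps to the same all-zero vector.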
An alternative is to put all the uncommon product names into their own shared bucket (an "everything else" bucket), but I'm unsure whether this will have unwanted effects on the model.
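For comparison, the shared-bucket alternative could look roughly like this. Again this is only a sketch under my own assumptions: the `__other__` label and the helper names `build_vocab_with_other` and `encode_with_other` are hypothetical, not taken from any library.

```python
from collections import Counter

OTHER = "__other__"  # hypothetical label for the "everything else" bucket

def build_vocab_with_other(names, k):
    """Keep the k most frequent names and append one shared bucket at the end."""
    common = [name for name, _ in Counter(names).most_common(k)]
    return common + [OTHER]

def encode_with_other(name, vocab):
    """One-hot encode `name`; uncommon names fall into the last (shared) slot."""
    key = name if name in vocab[:-1] else OTHER
    return [1 if key == v else 0 for v in vocab]

# Toy training data, purely illustrative.
names_seen = ["widget", "widget", "gadget", "widget", "gadget", "doohickey"]
vocab_other = build_vocab_with_other(names_seen, k=2)

widget_vec = encode_with_other("widget", vocab_other)      # common name
unseen_vec = encode_with_other("never-seen", vocab_other)  # lands in the shared bucket
```

Here every vector has exactly one slot set, at the cost of one extra input dimension compared with the all-zeroes scheme.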