5

Do deep learning algorithms run into trouble when tasked with classifying high-dimensional input into one of many categories? By many I mean thousands or millions. If they do, how could one deal with this problem? Any references?

kjetil b halvorsen
Taylor
  • by deep learning do you mean neural networks? – Franck Dernoncourt Dec 30 '16 at 17:25
  • @FranckDernoncourt yeah, sorry I'm not up to speed on the jargon with these things – Taylor Dec 30 '16 at 17:35
  • 1
    In science, when categorizing with millions of possible categories, usually some hierarchical system is built, as in biological systematics. There are two cases: building such a system, or categorizing into a known system. Which case is yours? – kjetil b halvorsen Dec 30 '16 at 18:47
  • 1
    @kjetilbhalvorsen thanks, that's helpful. I don't really have one; I'm just curious at the moment. I know the output of, say, a CNN gives you a vector of probabilities over your categories, so I'm curious whether and why things become hard to discern in examples beyond the MNIST dataset. The more bins you divide $1$ into, the smaller the differences in their volume. Just wondering – Taylor Dec 30 '16 at 19:02

1 Answer

4

If they do, how could one deal with this problem? Any references?

You can use hierarchical softmax, importance sampling, noise-contrastive estimation, or negative sampling. These techniques are commonly used in language modeling, for example, where the output layer has to cover a vocabulary of tens or hundreds of thousands of words.
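For concreteness, here is a minimal sketch of one of these techniques, negative sampling, written against PyTorch (assumed; the same idea carries over to other frameworks). Instead of normalizing over all classes with a full softmax, each training example contrasts its true label against a handful of randomly drawn negative labels, so the per-step cost scales with the number of samples rather than the number of classes. The sizes, the uniform sampling distribution, and names like `negative_sampling_loss` and `num_neg` are illustrative choices, not anything from the question.

```python
# Minimal sketch of negative sampling for a classifier with a very large label
# space (here 100,000 classes; the question asks about thousands or millions).
# Only the sampled class vectors are touched on each step, so the cost per
# example is O(num_neg) instead of O(num_classes).
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 100_000   # illustrative; scale up as needed
embed_dim = 128
num_neg = 20            # negatives drawn per example (hypothetical choice)

# One weight vector per class, acting as the output layer.
class_vectors = nn.Embedding(num_classes, embed_dim)

def negative_sampling_loss(hidden, targets):
    """hidden: (batch, embed_dim) features from the network body;
    targets: (batch,) true class indices."""
    batch = hidden.size(0)
    # Uniform negatives for simplicity; practical systems often sample from a
    # frequency-based proposal distribution (e.g. unigram^0.75 in word2vec).
    negatives = torch.randint(0, num_classes, (batch, num_neg))

    pos_w = class_vectors(targets)        # (batch, embed_dim)
    neg_w = class_vectors(negatives)      # (batch, num_neg, embed_dim)

    pos_score = (hidden * pos_w).sum(dim=1)                       # (batch,)
    neg_score = torch.bmm(neg_w, hidden.unsqueeze(2)).squeeze(2)  # (batch, num_neg)

    # Binary logistic losses: push the true class up, the sampled classes down.
    return -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1)).mean()

# Toy usage with random features standing in for the output of a CNN/RNN body.
features = torch.randn(32, embed_dim)
labels = torch.randint(0, num_classes, (32,))
loss = negative_sampling_loss(features, labels)
loss.backward()
print(loss.item())
```

At prediction time you still score all classes (or use an approximate nearest-neighbour search over the class vectors); the sampling only removes the full normalization from training.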


Franck Dernoncourt