pLSA - Probabilistic Latent Semantic Analysis, how to choose topic number?

Question

I am learning about pLSA (Probabilistic Latent Semantic Analysis) right now, in the hopes of being able to apply it to biomolecular annotation prediction.

I have a very simple question: how do you choose the number of topics / classes to use in the algorithm? I have also searched the literature, but I did not find anything useful enough.

Yevgeny · Accepted Answer · 2012-01-18T18:39:14.943

3

The number of topics / latent classes can be treated as a hyperparameter ("meta" parameter) of the model. It has to be tuned using resampling (e.g. cross-validation): pick the value that minimizes your loss/risk function on held-out data while keeping the run time of the algorithm reasonable.
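To make this concrete, here is a minimal sketch of one way such a tuning loop could look. Everything in it is an assumption for illustration, not a prescribed method: the loss is the negative held-out log-likelihood per token, the resampling scheme holds out a random fraction of the word counts in each document (so that P(z|d) is still estimated for every document), the data are synthetic, and the helper names (`fit_plsa`, `split_counts`, `heldout_loglik`, `choose_n_topics`) are made up. For annotation prediction you would probably swap in a loss that matches your task.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def fit_plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM. Returns P(w|z) with shape (K, W) and P(z|d) with shape (D, K)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # P(w|d) = sum_z P(w|z) P(z|d); ratio[d, w] = n(d, w) / P(w|d)
        ratio = counts / (p_z_d @ p_w_z + EPS)
        # E- and M-step combined: the posterior P(z|d,w) is never stored explicitly
        new_p_w_z = p_w_z * (p_z_d.T @ ratio)  # sum_d n(d,w) P(z|d,w)
        new_p_z_d = p_z_d * (ratio @ p_w_z.T)  # sum_w n(d,w) P(z|d,w)
        p_w_z = new_p_w_z / (new_p_w_z.sum(axis=1, keepdims=True) + EPS)
        p_z_d = new_p_z_d / (new_p_z_d.sum(axis=1, keepdims=True) + EPS)
    return p_w_z, p_z_d

def split_counts(counts, test_frac, rng):
    """Move each token to the test set independently with probability test_frac."""
    test = rng.binomial(counts, test_frac)
    return counts - test, test

def heldout_loglik(test_counts, p_w_z, p_z_d):
    """Average log P(w|d) over the held-out tokens (higher is better)."""
    log_p = np.log(p_z_d @ p_w_z + EPS)
    return float((test_counts * log_p).sum() / test_counts.sum())

def choose_n_topics(counts, candidates, n_splits=3, test_frac=0.2, seed=0):
    """Return the candidate K with the best mean held-out log-likelihood."""
    rng = np.random.default_rng(seed)
    scores = {k: 0.0 for k in candidates}
    for _ in range(n_splits):
        train, test = split_counts(counts, test_frac, rng)
        for k in candidates:
            p_w_z, p_z_d = fit_plsa(train, k, seed=seed)
            scores[k] += heldout_loglik(test, p_w_z, p_z_d) / n_splits
    return max(scores, key=scores.get), scores

# Synthetic document-word counts (assumption: 60 documents, 30 words, 3 true topics).
data_rng = np.random.default_rng(42)
true_topics = data_rng.dirichlet(np.full(30, 0.5), size=3)
doc_mix = data_rng.dirichlet(np.full(3, 0.3), size=60)
counts = np.array([data_rng.multinomial(200, p) for p in doc_mix @ true_topics])

candidates = [1, 2, 3, 5, 8]
best, scores = choose_n_topics(counts, candidates)
for k in candidates:
    print(f"K={k}: held-out log-likelihood per token = {scores[k]:.4f}")
print("selected K:", best)
```

In practice the held-out score often flattens out instead of showing a sharp peak. In that case, taking the smallest K on the plateau is one way to respect the run-time constraint mentioned above.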

edited Jan 18 '12 at 18:39

answered Jan 18 '12 at 18:32

