4

I am learning about pLSA (Probabilistic Latent Semantic Analysis) right now, in the hopes of being able to apply it to biomolecular annotation prediction.

I have a very simple question: how do you choose the number of topics / classes to use in the algorithm? I have also searched the literature, but I did not find anything useful enough.

gung - Reinstate Monica
DavideChicco.it

1 Answer

3

The number of topics / latent classes is a hyperparameter (a "meta" parameter) of the model. It has to be tuned using resampling (e.g. cross-validation): fit the model for several candidate values and keep the one that minimizes your loss/risk function on held-out data, while keeping the run time of the algorithm reasonable.
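As a rough illustration of that tuning loop, here is a minimal NumPy sketch. Everything in it is an assumption made for the example rather than a reference implementation: the function names (`fit_plsa`, `heldout_perplexity`, `choose_n_topics`) are hypothetical, the loss is taken to be held-out perplexity, and a single random hold-out of word counts stands in for full cross-validation. It expects `counts` to be an integer document-by-term matrix in which every document keeps some counts after the split.

```python
import numpy as np

def fit_plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. Returns P(z|d) (docs x topics) and P(w|z) (topics x words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Current mixture P(w|d) = sum_z P(z|d) P(w|z); epsilon avoids division by zero.
        p_w_d = p_z_d @ p_w_z + 1e-12
        ratio = counts / p_w_d
        # E-step and M-step combined: the posterior P(z|d,w) is absorbed into `ratio`.
        new_p_w_z = p_w_z * (p_z_d.T @ ratio)
        new_p_z_d = p_z_d * (ratio @ p_w_z.T)
        p_w_z = new_p_w_z / (new_p_w_z.sum(axis=1, keepdims=True) + 1e-12)
        p_z_d = new_p_z_d / (new_p_z_d.sum(axis=1, keepdims=True) + 1e-12)
    return p_z_d, p_w_z

def heldout_perplexity(test_counts, p_z_d, p_w_z):
    """Perplexity of held-out counts under the fitted model (lower is better)."""
    p_w_d = p_z_d @ p_w_z + 1e-12
    log_lik = np.sum(test_counts * np.log(p_w_d))
    return float(np.exp(-log_lik / test_counts.sum()))

def choose_n_topics(counts, candidates, holdout_frac=0.2, seed=0):
    """Hold out a random share of each count, fit on the rest, score each candidate."""
    rng = np.random.default_rng(seed)
    test = rng.binomial(counts, holdout_frac)  # binomial thinning of the count matrix
    train = counts - test
    scores = {}
    for k in candidates:
        p_z_d, p_w_z = fit_plsa(train, k, seed=seed)
        scores[k] = heldout_perplexity(test, p_z_d, p_w_z)
    best = min(scores, key=scores.get)
    return best, scores
```

A call such as `best, scores = choose_n_topics(counts, [2, 5, 10, 20])` returns the candidate with the lowest held-out perplexity together with all scores. In practice you would probably repeat the split several times and average, and swap in whatever loss matters for your annotation prediction task.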

Yevgeny