In the original LDA paper it is stated that:
> The parameters for a $k$-topic pLSI model are $k$ multinomial distributions of size $V$ and $M$ mixtures over the $k$ hidden topics. This gives $kV + kM$ parameters and therefore linear growth in $M$. The linear growth in parameters suggests that the model is prone to overfitting and, empirically, overfitting is indeed a serious problem.
Also:
> LDA is a well-defined generative model and generalizes easily to new documents. Furthermore, the $k + kV$ parameters in a $k$-topic LDA model do not grow with the size of the training corpus.
But as I understand it, LDA also has those $kV + kM$ parameters; they are simply not treated as hyperparameters. So this seems irrelevant to overfitting. That is, in pLSA the following distributions must be estimated ($M$ is the number of documents):
$p(z|d): kM$ parameters,
$p(w|z): kV$ parameters,
and in LDA the following posteriors have to be estimated:
the posterior over $\Theta_d$ (the per-document topic proportions, with Dirichlet prior $\alpha$): $kM$ parameters, since each $\Theta_d$ is $k$-dimensional,
$p(w|z): kV$ parameters,
and two additional parameters, $\alpha$ and $\eta$ (called hyperparameters).
Thus, the number of posteriors to be estimated is approximately the same. Why, then, is LDA claimed to have solved the overfitting problem of pLSA? I agree that a Dirichlet distribution with a low $\alpha$ tends to generate sparser distributions than a Dirichlet with $\alpha = 1$ (i.e., uniform), as in pLSA, and that this sparsity might help reduce overfitting a bit, but the number of parameters is still similar.
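To make the counting in my argument concrete, here is a small sketch (the function names and the exact bookkeeping are my own; the hyperparameters $\alpha$ and $\eta$ add only a constant, so I omit them from the last count):

```python
def plsa_param_count(k: int, V: int, M: int) -> int:
    """pLSA free (maximum-likelihood) parameters: p(w|z) and p(z|d)."""
    return k * V + k * M          # grows linearly with the corpus size M

def lda_model_param_count(k: int, V: int) -> int:
    """LDA parameters as counted in the paper (k + kV): topic-word
    distributions plus the Dirichlet hyperparameter; independent of M."""
    return k * V + k

def lda_inference_quantity_count(k: int, V: int, M: int) -> int:
    """Quantities actually estimated during LDA inference: topic-word
    distributions plus one k-dimensional posterior over Theta_d per document."""
    return k * V + k * M

if __name__ == "__main__":
    k, V = 100, 10_000
    for M in (1_000, 10_000, 100_000):
        print(M,
              plsa_param_count(k, V, M),
              lda_model_param_count(k, V),
              lda_inference_quantity_count(k, V, M))
```

As the sketch shows, only the count the paper reports for LDA stays fixed; the per-document quantities one actually estimates still scale as $kM$, which is exactly what puzzles me.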