Choosing $\alpha$ and $\beta$ is indeed tricky, since both affect the topic modeling results. The Gibbs sampling paper by Griffiths and Steyvers gives some insight into this:
> The value of $\beta$ thus affects the granularity of the model: a
> corpus of documents can be sensibly factorized into a set of topics at
> several different scales, and the particular scale assessed by the
> model will be set by $\beta$. With scientific documents, a large value
> of $\beta$ would lead the model to find a relatively small number of
> topics, perhaps at the level of scientific disciplines, whereas
> smaller values of $\beta$ will produce more topics that address
> specific areas of research.
For scientific documents, the authors ultimately chose the hyper-parameters $\beta=0.1$ and $\alpha=50/T$. Note, however, that they had a corpus of around $28K$ documents and a vocabulary of $20K$ words, and they tried several different values of $T$: $50, 100, 200, 300, 400, 500, 600, 1000$.
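To see what the $\alpha=50/T$ rule implies in practice, here is a small sketch that tabulates $\alpha$ for the values of $T$ listed above. The variable names are mine, not the paper's.

```python
# Values of T (number of topics) listed above.
T_values = [50, 100, 200, 300, 400, 500, 600, 1000]
beta = 0.1  # fixed across all runs

# alpha = 50 / T, so the document-topic prior gets sparser as T grows.
alphas = {T: 50 / T for T in T_values}

for T, alpha in alphas.items():
    print(f"T={T:>4}  alpha={alpha:.4f}  beta={beta}")
```

Under this rule $\alpha$ equals $1$ only at $T=50$ and is below $1$ for every larger $T$, so the prior on the per-document topic distribution is sparse for most of the settings they tried.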
Regarding your data: I have no experience with analyzing financial text data, but for the choice of $\alpha$ and $\beta$, I would ask myself the following questions:
- Given my word vocabulary, do I expect the resulting topics to be sparse, that is, does each topic put most of its probability on a small subset of the words? In most cases, this is true. Hence, the topic-word prior is typically chosen to be sparse, with $\beta < 1$.
- Given the topics, do I expect the distribution of topics in each document to be sparse? That is, does each document cover only a few topics? If yes, then choose $\alpha < 1$.
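To make "sparse" concrete, the following sketch draws samples from a symmetric Dirichlet prior with a small and a large concentration value and compares how much probability mass the largest component receives on average. It uses only the standard library. The dimension of 20, the sample count, and the function names are illustrative assumptions of mine.

```python
import random


def sample_symmetric_dirichlet(concentration, dim, rng):
    """Draw one symmetric Dirichlet sample by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_weight(concentration, dim=20, n_samples=2000, seed=0):
    """Average weight of the largest component over many samples."""
    rng = random.Random(seed)
    peaks = (
        max(sample_symmetric_dirichlet(concentration, dim, rng))
        for _ in range(n_samples)
    )
    return sum(peaks) / n_samples


sparse_peak = mean_largest_weight(0.1)   # concentration < 1: mass piles onto a few components
dense_peak = mean_largest_weight(10.0)   # concentration > 1: mass is spread almost evenly

print(f"concentration 0.1 : mean largest weight = {sparse_peak:.3f}")
print(f"concentration 10  : mean largest weight = {dense_peak:.3f}")
```

With a concentration below $1$, a typical draw is dominated by a handful of components. This is the behaviour you want from $\alpha$ if documents cover few topics, and from $\beta$ if topics use few words.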
Answering the above questions may not be straightforward with limited knowledge of the data. Since you have limited data, I would try multiple values of $\alpha$ and $\beta$, ranging from sparse to non-sparse priors, and find which combination suits the dataset by computing the perplexity on held-out data. To put it more concretely:
- Choose $\alpha_m$ from $[0.05, 0.1, 0.5, 1, 5, 10]$
- Choose $\beta_m$ from $[0.05, 0.1, 0.5, 1, 5, 10]$
- Run topic modeling on the training data with each $(\alpha_m, \beta_m)$ pair
- Compute the model perplexity on the held-out test data
- Choose the $(\alpha_m, \beta_m)$ pair with the minimum perplexity
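The steps above can be sketched with scikit-learn as follows. Note the assumptions: scikit-learn's `LatentDirichletAllocation` fits the model with variational Bayes rather than the Gibbs sampler used in the paper, its `doc_topic_prior` and `topic_word_prior` parameters play the roles of $\alpha$ and $\beta$, and the toy corpus, the number of topics, and the 75/25 split are placeholders I made up for illustration.

```python
from itertools import product

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus; replace with your own documents.
docs = [
    "stock market prices fell", "bond yields rose sharply",
    "market investors sold stock", "bank raised interest rates",
    "investors bought bond funds", "interest rates hit stock prices",
    "gene expression in cells", "protein folding in cells",
    "cells express gene protein", "dna sequence of gene",
    "protein binds dna sequence", "gene sequence and expression",
]

train_docs, test_docs = train_test_split(docs, test_size=0.25, random_state=0)

# Build the vocabulary from the training documents only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

grid = [0.05, 0.1, 0.5, 1, 5, 10]
n_topics = 2  # assumption for the toy corpus; tune this for real data

results = {}
for alpha, beta in product(grid, grid):
    lda = LatentDirichletAllocation(
        n_components=n_topics,
        doc_topic_prior=alpha,    # alpha: prior on per-document topic mixtures
        topic_word_prior=beta,    # beta: prior on per-topic word distributions
        learning_method="batch",
        max_iter=20,
        random_state=0,
    )
    lda.fit(X_train)
    # Lower held-out perplexity means a better fit to unseen documents.
    results[(alpha, beta)] = lda.perplexity(X_test)

best_alpha, best_beta = min(results, key=results.get)
print(f"best alpha={best_alpha}, best beta={best_beta}, "
      f"perplexity={results[(best_alpha, best_beta)]:.2f}")
```

On a corpus this small the perplexity differences are not meaningful. With real data you would likely also want to average over several random splits or seeds before committing to a pair.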
Resources: