In general, distributions in these sorts of graphical models are chosen for support that suits the data, along with simplifying assumptions for computational convenience. Here, the authors say this much:
Unlike other BNP factorization methods, our model is not composed of conjugate pairs of distributions—we chose our distributions to be appropriate for spectrogram data, not for computational convenience.
The gamma has support on the positive reals, making it reasonable for $\textbf{W}$ and $\textbf{H}$ (see paragraphs beginning paper section 2). As to how why they chose gamma over other distributions with the same support, say the lognormal, unsure. (For the differences between the two, see answers to this question.) But, the GIG later used in variational inference has parameters that are a superset of the gamma, and the authors say this:
Since both $y$ and $1/y$ are sufficient statistics of $GIG(y; \gamma, \rho, \tau )$, this will not pose a problem during inference, as it would if we were to use variational distributions from the gamma family.
Meaning, using GIG allows for convenient updates in the inference algorithm when applied to the $p(\textbf{X}|\textbf{W}, \textbf{H}, \theta)$ term. I'd wager that choosing gamma over lognormal also makes for much cleaner math when deriving the updates for $\mathcal{L}$; confirmation left as an exercise.