The general idea for computing an exact p-value over a finite set of possibilities is to run over all of them and count how many times the statistic is greater than or equal to your observed statistic.
Example for a binomial: you want to test whether a coin is biased, using 10 flips. Compute the statistic $h_0=|\text{heads}-5|$ for your dataset. Under the null (unbiased), all sequences of 0s and 1s have the same probability; for each of them compute the statistic and see how often it is greater than or equal to $h_0$. This proportion is the p-value. There are $2^{10}=1024$ sequences to generate.
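A minimal sketch of this enumeration in Python (the observed count of 8 heads is a made-up example; only the 10-flip setup comes from the text):

```python
from itertools import product

# Hypothetical observation: 8 heads out of 10 flips.
observed_heads = 8
h0 = abs(observed_heads - 5)

# Enumerate all 2**10 = 1024 equally likely head/tail sequences under the null.
count = 0
total = 0
for seq in product((0, 1), repeat=10):
    h = abs(sum(seq) - 5)
    if h >= h0:
        count += 1
    total += 1

p_value = count / total
print(p_value)  # exact p-value for |heads - 5| >= h0
```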
For Kruskal-Wallis: call $\omega_0$ your dataset and $h_0$ the value of the KW statistic for your dataset.
Call $\omega$ a permutation of the 37 possible values. For each $\omega$, assign the first 12 values to A, the next 9 to B, and so on: you get one virtual dataset, also called $\omega$. For each $\omega$, compute the value of the KW statistic, say $h(\omega)$, as in the sketch below. Then count how many times $h(\omega)$ is greater than or equal to $h_0$, and count the total number of permutations. Divide, and you get the p-value.
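Computing $h(\omega)$ for a single permutation might look like this (only the group sizes 12 and 9 come from the question; the remaining sizes, and the data, are placeholders chosen so the sizes sum to 37):

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Hypothetical group sizes: 12 and 9 are from the question, the rest
# are placeholders so the sizes sum to 37.
sizes = [12, 9, 6, 5, 5]
values = rng.standard_normal(37)  # stand-in for the 37 observed values

# One permutation omega: shuffle the pooled values, then cut into groups.
omega = rng.permutation(values)
groups = np.split(omega, np.cumsum(sizes)[:-1])

h_omega, _ = kruskal(*groups)  # the KW statistic h(omega) for this permutation
print(h_omega)
```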
How many $\omega$? Well, the number of permutations: $37!\approx 10^{43}$ (Stirling's formula, or direct computation). A typical computer does around $10^9$ atomic operations per second, and each permutation takes at least 1000 such operations, so you can count on roughly $10^6$ permutations per second. That makes about $10^{43}/10^6=10^{37}$ seconds. This is not feasible.
Now you can work directly on combinations instead of permutations: the KW statistic depends only on which group each value is assigned to, not on the order within groups. The number of assignments is $\binom{37}{12}\binom{37-12}{9}\cdots=\frac{37!}{12!\,9!\cdots}\approx10^{22}$. Still not feasible.
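A quick sanity check on these counts, again with the hypothetical group sizes from the sketch above:

```python
import math

sizes = [12, 9, 6, 5, 5]  # hypothetical group sizes summing to 37

n_permutations = math.factorial(37)
n_assignments = n_permutations
for s in sizes:
    n_assignments //= math.factorial(s)  # exact integer division

print(f"{n_permutations:.2e}")  # ~1.38e+43 permutations
print(f"{n_assignments:.2e}")   # ~7.6e+21 distinct group assignments
print(f"{n_permutations / 1e6:.2e} seconds at 1e6 permutations/s")
```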
This only describes the "naive" algorithms. But I think advanced algorithmic ideas for exact values become quite hard to understand as soon as the aim is to make them feasible even for values as small as 37, and they still need intensive computing resources. Further reading: http://faculty.virginia.edu/kruskal-wallis/
However, it's not necessary to have truly "exact" values. The usual method is a simple Monte Carlo method: instead of testing all permutations/combinations, test $N$ of them picked at random. This is very simple to implement. It converges to the real p-value at rate $1/\sqrt N$ in a way that can be controlled easily, so you get the precision you want in a reasonable time. Compared to the $\chi^2$ approximation, which can be poor for small sample sizes (37, 12, 9, ...), it is almost exact, since you don't assume $37\approx+\infty$ but use an $N$ as large as you want. This is what most statistical software does.
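A minimal sketch of this Monte Carlo scheme (the data are randomly generated placeholders, and `mc_kruskal_pvalue` is a made-up helper name, not a library function):

```python
import numpy as np
from scipy.stats import kruskal

def mc_kruskal_pvalue(groups, n_iter=10_000, seed=0):
    """Monte Carlo permutation p-value for the Kruskal-Wallis test.

    groups: list of 1-D arrays, one per treatment group.
    """
    rng = np.random.default_rng(seed)
    cuts = np.cumsum([len(g) for g in groups])[:-1]
    pooled = np.concatenate(groups)

    h0, _ = kruskal(*groups)  # observed statistic on the real grouping

    count = 0
    for _ in range(n_iter):
        perm = rng.permutation(pooled)
        h, _ = kruskal(*np.split(perm, cuts))
        if h >= h0:
            count += 1

    # (count + 1) / (n_iter + 1) is a common convention that avoids
    # reporting a p-value of exactly zero.
    return (count + 1) / (n_iter + 1)

# Hypothetical example: 37 values split into the group sizes used above.
rng = np.random.default_rng(1)
data = [rng.standard_normal(s) for s in [12, 9, 6, 5, 5]]
print(mc_kruskal_pvalue(data))
```

Recent SciPy versions also ship `scipy.stats.permutation_test`, which implements the same resampling idea in a generic way.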