Let's do this from scratch, using only basic principles, the rules of arithmetic and probability, and the (generalized) Binomial Theorem (first developed in the seventeenth century by James Gregory and Isaac Newton).
Let $p=1-q$ be the chance that any single observation is rare, so that $q$ is the chance of making a "non-rare" observation.
Since all observations are assumed independent, the chance of making a sequence of $i \ge 0$ non-rare observations followed by a rare observation is $a(i+1)=pq^i$. Note that the index $i+1$ counts the rare observation along with the $i$ preceding non-rare ones.
The probability generating function for the number of observations made in order to see one rare observation matches each $a(i+1)$ with the corresponding power of an "abstract variable" $t$, thus:
$$f(t) = a(0) + a(1) t + a(2)t^2 + \cdots + a(i)t^i + \cdots = p t + pq t^2 + pq^2 t^3 + \cdots.$$
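The constant term $a(0)$ is zero because at least one observation is always needed. The right-hand side is therefore a geometric series with starting value $pt$ and common ratio $qt$. In closed form (valid when $|qt| \lt 1$) this is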
$$f(t) = \frac{pt}{1-qt}.$$
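As a quick numerical sanity check of this closed form, here is a minimal Python sketch. The function names and the test values $p=0.1$, $t=0.9$ are illustrative choices, not part of the derivation.

```python
def geometric_pgf_partial(p, t, terms=200):
    """Partial sum p*t + p*q*t^2 + ... + p*q^(terms-1)*t^terms of the series for f(t)."""
    q = 1 - p
    return sum(p * q**i * t**(i + 1) for i in range(terms))


def geometric_pgf_closed(p, t):
    """Closed form p*t / (1 - q*t), valid when |q*t| < 1."""
    q = 1 - p
    return p * t / (1 - q * t)


# Illustrative values only; any 0 < p <= 1 and |q*t| < 1 should behave the same way.
p, t = 0.1, 0.9
print(geometric_pgf_partial(p, t), geometric_pgf_closed(p, t))
```

The two printed numbers should agree to many decimal places, because the neglected tail of the series is of order $(qt)^{200}$.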
The probability generating function for the number of observations made in order to see $m\ge 0$ rare observations is
$$f(t)^m = \left(\frac{pt}{1-qt}\right)^m = p^m t^m \left(1-qt\right)^{-m}$$
because, according to the rules of addition and multiplication, the coefficient of $t^n$ in that power is a sum over all the possible ways to make the first, second, ..., $m^\text{th}$ rare observation using $n$ observations in total. For each such way it takes the product of the chances. But that's precisely what the axioms of probability tell us to do when cases are disjoint (and no two ways are exactly the same)--you sum the chances--and independent (as assumed)--you multiply the chances.
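To see this "sum of products" concretely, the coefficients of $f(t)^m$ can be computed by brute-force polynomial multiplication. The following is a minimal sketch using exact fractions; the helper names `poly_mul` and `waiting_time_coeffs` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def poly_mul(a, b, max_degree):
    """Multiply two coefficient lists, discarding powers above t^max_degree."""
    out = [Fraction(0)] * (max_degree + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= max_degree:
                out[i + j] += ai * bj  # sum over ways, product of chances
    return out


def waiting_time_coeffs(p, m, max_degree):
    """Coefficients of f(t)^m up to t^max_degree, by repeated multiplication."""
    q = 1 - p
    # Coefficients of f(t): a(0) = 0 and a(n) = p * q^(n-1) for n >= 1.
    f = [Fraction(0)] + [p * q**(n - 1) for n in range(1, max_degree + 1)]
    power = [Fraction(1)] + [Fraction(0)] * max_degree  # the constant polynomial 1
    for _ in range(m):
        power = poly_mul(power, f, max_degree)
    return power


# Illustrative values: p = 1/4 and m = 2 rare observations.
print(waiting_time_coeffs(Fraction(1, 4), 2, 6))
```

Truncating at `max_degree` loses nothing for the coefficients that are kept, since the coefficient of $t^n$ in a product depends only on coefficients of degree at most $n$ in the factors.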
The Binomial Theorem asserts
$$(1 + x)^k = \binom{k}{0} + \binom{k}{1}x + \binom{k}{2}x^2 + \cdots + \binom{k}{i}x^i + \cdots.$$
The sum continues until the coefficients vanish (when $k$ is a natural number) or forever (in all other cases, assuming $|x| \lt 1$). Plugging $-qt$ in for $x$ and $-m$ for $k$ expands $f(t)^m$ automatically:
$$f(t)^m = p^m t^m (1-qt)^{-m} = p^m t^m \sum_{i=0}^\infty\binom{-m}{i}(-qt)^i = \sum_{i=0}^\infty (-1)^i\binom{-m}{i}p^m q^{i}\, t^{m+i}.$$
By construction, for any $n$ the coefficient of $t^n$ in this sum is the chance that exactly $n$ observations are needed to see $m$ rare ones, counting the last observation (when the $m^\text{th}$ rare one is seen). Therefore we must inspect the term where $m+i=n$, which (since obviously $i=n-m$) is
$$w_n = (-1)^{n-m}\binom{-m}{n-m}p^m q^{n-m}.\tag{1}$$
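Formula $(1)$ can be evaluated directly from the definition of the generalized binomial coefficient, $\binom{k}{i} = k(k-1)\cdots(k-i+1)/i!$. Here is a minimal sketch with exact arithmetic; the names `gen_binom` and `w_formula_1` are hypothetical.

```python
from fractions import Fraction


def gen_binom(k, i):
    """Generalized binomial coefficient k(k-1)...(k-i+1) / i! for an integer i >= 0."""
    result = Fraction(1)
    for j in range(i):
        result = result * (k - j) / (j + 1)
    return result


def w_formula_1(p, m, n):
    """Evaluate formula (1): (-1)^(n-m) * C(-m, n-m) * p^m * q^(n-m)."""
    if n < m:
        return Fraction(0)  # fewer than m observations cannot contain m rare ones
    q = 1 - p
    i = n - m
    return (-1)**i * gen_binom(-m, i) * p**m * q**i


# Illustrative values: p = 1/4, m = 2, n = 5.
print(gen_binom(-2, 3), w_formula_1(Fraction(1, 4), 2, 5))
```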
Formula $(1)$ is a perfectly fine answer (and explains why such distributions are often termed "negative binomial"), but it doesn't exactly match Haldane's expression. It's easy to see why the answers give the same values, though. The Binomial coefficients are computed as fractions. The denominator is just $(n-m)! = 1(2)(3)\cdots(n-m)$. The numerator is a product of $n-m$ factors: it starts with $-m$ and counts down, $-m-1, -m-2, \ldots$, ending at $-m-(n-m)+1 = 1-n$. Since all these factors are negative when $m \ge 1$, we may absorb the factor of $(-1)^{n-m}$ simply by negating each of these $n-m$ values. Thus, provided $m\ge 1$,
$$(-1)^{n-m}\binom{-m}{n-m} = \frac{(m)(m+1)(m+2)\cdots(n-1)}{(n-m)!} = \binom{n-1}{n-m} = \binom{n-1}{m-1}.$$
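As a worked instance of this identity (the values $m=2$, $n=5$ are chosen only for illustration), there are $n-m=3$ factors in the numerator:

$$(-1)^{3}\binom{-2}{3} = (-1)^3\,\frac{(-2)(-3)(-4)}{3!} = \frac{(2)(3)(4)}{3!} = 4 = \binom{4}{3} = \binom{4}{1}.$$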
Substituting this into $(1)$ yields
$$w_n = \binom{n-1}{m-1}p^m q^{n-m},$$
QED.
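For readers who like an empirical check, here is a minimal Monte Carlo sketch comparing the final formula with simulated waiting times. It assumes $m \ge 1$; the function names, the seed, and the values $m=3$, $p=0.2$ are illustrative assumptions rather than anything from the derivation.

```python
import random
from math import comb


def w(n, m, p):
    """Chance that the m-th rare observation occurs on observation n (requires m >= 1)."""
    if n < m:
        return 0.0
    return comb(n - 1, m - 1) * p**m * (1 - p)**(n - m)


def simulate_waiting_time(m, p, rng):
    """Count independent observations until m rare ones have been seen."""
    n = rare = 0
    while rare < m:
        n += 1
        if rng.random() < p:
            rare += 1
    return n


rng = random.Random(0)  # fixed seed so the run is repeatable
m, p, trials = 3, 0.2, 100_000
draws = [simulate_waiting_time(m, p, rng) for _ in range(trials)]

# Compare observed frequencies with the formula for the first few values of n.
for n in range(m, m + 5):
    print(n, draws.count(n) / trials, w(n, m, p))
```

The observed frequencies should match the formula up to sampling error, which for this many trials is on the order of $10^{-3}$.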