Benjamin Doerr gives (in the chapter "
Analyzing Randomized Search Heuristics: Tools from Probability Theory" in the book "Theory of Randomized Search Heuristics", see the link for an online PDF) a somewhat simple proof of
Proposition Let $T$ be the stopping time of the coupon collection process. Then $\Pr[T\le (1-\epsilon)(n-1)\ln n]\le e^{-n^{\epsilon}}$.
This seems to give the desired asymptotics (from @cardinal's second answer), but with the advantage of being true for all $n$ and $\epsilon$.
Here is a proof sketch.
Proof Sketch: Let $X_i$ be the event that the $i$-th coupon is collected in the first $t$ draws. Thus, $\Pr[X_i=1]=(1-1/n)^t$. The key fact is that the $X_i$ are negatively correlated, for any $I\subseteq[n]$, $\Pr[\forall i\in I, X_i=1]\le\prod_{i\in I}\Pr[X_i=1]$. Intuitively, this is fairly clear, as knowing that the $i$-th coupon in the first $t$ draws would make it less likely that the $j$-th coupon is also drawn in the first $t$ draws.
One can prove the claim, but enlarging the set $I$ by 1 at each step. Then it reduces to showing that $\Pr[\forall i\in I, X_i=1|X_j=1]\le\Pr[\forall i\in I,X_i=1]$, for $j\notin I$. Equivalently, by averaging, it reduces to showing that $\Pr[\forall i\in I, X_i=1|X_j=0]\ge\Pr[\forall i\in I,X_i=1]$. Doerr only gives an intuitive argument for this. One avenue to a proof is as follows. One can observe that conditioned on the $j$ coupon coming after all of the coupons in $I$, that the probability of drawing a new coupon from $I$ after drawing $k$ so far is now $\frac{|I|-k}{n-1}$, instead of the previous $\frac{|I|-k}{n}$. So decomposing the time to collect all coupons as a sum of geometric random variables, we can see that conditioning on the $j$-coupon coming after $I$ increases the success probabilities, and thus doing the conditioning only makes it more likely to collect the coupons earlier (by stochastic dominance: each geometric random variable is increased, in terms of stochastic dominance, by the conditioning, and this dominance can then be applied to the sum).
Given this negative correlation, it follows that $\Pr[T\le (1-\epsilon)(n-1)\ln n]\le (1-(1-1/n)^t)^n$, which gives the desired bound with $t=(1-\epsilon)(n-1)\ln n$.
Note added in proof: The link above is outdated. A new version of this result with a complete proof (that is, including the negative-correlation statement) can be found in Theorem 1.9.3 in B. Doerr. Probabilistic Tools for the Analysis of Randomized Optimization Heuristics. In B. Doerr and F. Neumann, editors, Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 1-87. Springer, 2020.