The reason is that the d.f. parameter is very hard to estimate well from data, particularly if you're also estimating the scale parameter. You can easily end up with silly estimates or unstable ones (e.g. because the likelihood has a ridge in parameter space, along which d.f. and scale trade off against each other).
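To get a feel for how weakly the data pin down the d.f., here is a minimal sketch that profiles the t log-likelihood over a grid of d.f. values, maximising over the scale at each one. Everything in it is an assumption for illustration: the data are simulated (true d.f. 5, scale 1, n = 500), the location is fixed at 0, and `profile_loglik_df` is just a helper name I made up.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar


def profile_loglik_df(x, df_grid):
    """For each candidate d.f., maximise the t log-likelihood over the scale.

    The location is held fixed at 0 (an assumption made to keep the sketch short).
    """
    out = []
    for df in df_grid:
        # Negative log-likelihood as a function of log(scale), so scale stays positive.
        def nll(log_s, df=df):
            return -stats.t.logpdf(x, df, loc=0.0, scale=np.exp(log_s)).sum()

        res = minimize_scalar(nll, bounds=(-5.0, 5.0), method="bounded")
        out.append(-res.fun)
    return np.array(out)


rng = np.random.default_rng(0)
# Simulated stand-in data: assumed true d.f. = 5, scale = 1.
x = stats.t.rvs(df=5, size=500, random_state=rng)

df_grid = np.arange(2.0, 40.5, 0.5)
pll = profile_loglik_df(x, df_grid)

# d.f. values whose profile log-likelihood is within 1.92 of the maximum
# (the usual chi-square(1) cutoff for an approximate 95% interval).
plausible = df_grid[pll >= pll.max() - 1.92]
print("profile MLE of d.f.:", df_grid[pll.argmax()])
print("approx. 95% range  :", plausible.min(), "to", plausible.max())
```

How wide the range comes out depends on the seed and the sample size, so try a few values of each; a flat profile over a long stretch of d.f. values is the ridge showing up.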
In practice you often get better properties by simply assuming some low d.f. (I've also seen 5, 7 and 8 used) rather than estimating it. That holds at least at the sample sizes typical of financial data, for example, which are often fairly large but not large enough to make the estimation problem easy or well-behaved.
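Fixing the d.f. is easy to do in most software. As a rough sketch in SciPy (the data here are simulated stand-ins, and the choice of 5 for the fixed d.f. and 0 for the location are assumptions, not recommendations), you hold the shape parameter fixed and estimate only the scale:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated stand-in data; in a real problem x would be e.g. a series of returns.
x = stats.t.rvs(df=6, scale=2.0, size=1000, random_state=rng)

FIXED_DF = 5  # assumed rather than estimated

# f0 holds the first shape parameter (the d.f.) fixed; floc fixes the location at 0.
# Only the scale is then estimated by maximum likelihood.
df_hat, loc_hat, scale_hat = stats.t.fit(x, f0=FIXED_DF, floc=0.0)
print("scale estimate with d.f. fixed at", FIXED_DF, ":", scale_hat)
```

With the d.f. pinned down, the fit reduces to a one-parameter problem, which is where the better behaviour comes from.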
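[Note that the usual (asymptotic) variance of the sample kurtosis needs a finite eighth moment, which for the t-distribution means d.f. strictly greater than 8. So 8 sits right at the boundary for the sample kurtosis to have finite variance, which may have been a factor in why it was used in the instance I saw.]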