I want to say that only 0.001% of the files are less than X bytes
long.
Count the number of files that are less than X bytes long. If you have N files, of which k are less than X bytes long, then you can claim: "$100 \cdot \frac{k}{N}$ percent of files are less than X bytes long".
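Something like this Python sketch would do the counting; the directory path and threshold are placeholders, not anything from your setup:

```python
import os

def fraction_below(directory, threshold_bytes):
    """Return the fraction of files under `directory` smaller than `threshold_bytes`."""
    sizes = [
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(directory)
        for name in names
    ]
    k = sum(size < threshold_bytes for size in sizes)
    return k / len(sizes)  # multiply by 100 for a percentage

# e.g. fraction_below("/data/files", 21 * 1024)
```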
Alternatively, compute the 0.001% percentile directly. For instance, if you have 1,000,000 files, find the 10th smallest file; say it is 21kB long, then claim "0.001% of files are less than 21kB long".
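The empirical percentile is just an order statistic; here is a sketch with NumPy, using simulated sizes as a stand-in for your real file sizes:

```python
import numpy as np

# Placeholder data; in practice `sizes` would be the file sizes collected from disk.
rng = np.random.default_rng(0)
sizes = rng.lognormal(mean=10, sigma=1, size=1_000_000)

# The 0.001% percentile: roughly the 10th smallest of 1,000,000 values.
x = np.percentile(sizes, 0.001)
print(f"0.001% of files are less than {x:.0f} bytes long")
```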
Two things to be aware of. First, if you have fewer than 100,000 files, you obviously can't compute the 0.001% percentile empirically. The workaround is to fit a normal distribution to your data and estimate the 0.001% percentile parametrically. For instance, say you find that the average file size is 30kB and the standard deviation is 3kB; now you can compute any percentile you wish, or go out 4 sigmas. Of course, anything close to 0.001% will not be reliable if you have fewer than 100,000 files :)
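A sketch of the parametric version with SciPy; the 30kB mean and 3kB standard deviation are just the example numbers above, and the data are simulated stand-ins:

```python
import numpy as np
from scipy import stats

# Placeholder data; in practice use the observed file sizes.
rng = np.random.default_rng(0)
sizes = rng.normal(loc=30_000, scale=3_000, size=50_000)

# Fit a normal distribution: just the sample mean and standard deviation.
mu, sigma = np.mean(sizes), np.std(sizes, ddof=1)

# Parametric 0.001% percentile (i.e. probability 0.00001) from the fitted normal.
x = stats.norm.ppf(0.00001, loc=mu, scale=sigma)
print(f"Estimated 0.001% percentile: {x:.0f} bytes "
      f"(about {(x - mu) / sigma:.1f} sigmas from the mean)")
```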
The second issue is how "normal" your data is. For the first two, empirical, methods it doesn't matter. For the last, parametric, method it matters a lot: since you're looking at the tails of the distribution, it's important that your distributional assumption is reasonable.
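One rough way to sanity-check that assumption is a goodness-of-fit test; the Kolmogorov-Smirnov test below is only one option among many, and the data are again simulated stand-ins:

```python
import numpy as np
from scipy import stats

# Placeholder data; in practice use the observed file sizes.
rng = np.random.default_rng(0)
sizes = rng.normal(loc=30_000, scale=3_000, size=50_000)
mu, sigma = np.mean(sizes), np.std(sizes, ddof=1)

# A small p-value suggests the normal fit is poor, which matters
# most in the tail you care about. (Rough check only: the parameters
# were estimated from the same data.)
statistic, p_value = stats.kstest(sizes, "norm", args=(mu, sigma))
print(f"KS statistic: {statistic:.3f}, p-value: {p_value:.3g}")
```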
In finance, by the way, there's a technique called value-at-risk (VaR) which is very similar in setup to your problem. Take a look at the link, and you'll find a ton of methods to answer your question. In finance they fall into a similar trap: calculating parametric VaR at a very high confidence level without a large enough sample, e.g. 0.1% with fewer than 1,000 observations.
Finally, in finance there's something called CVaR, or conditional VaR. The idea is to compute the average size of the files that fall below X, or below the $\alpha$-percentile. The claim then reads: "files at or below the 0.001% percentile have an average size of Y" or "files of size X or less are on average of size Y".
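A sketch of that conditional version, the tail average, again on simulated stand-in data:

```python
import numpy as np

# Placeholder data; in practice use the observed file sizes.
rng = np.random.default_rng(0)
sizes = rng.lognormal(mean=10, sigma=1, size=1_000_000)

alpha = 0.001  # percentile, in percent
cutoff = np.percentile(sizes, alpha)
tail = sizes[sizes <= cutoff]

# "files at or below the 0.001% percentile have an average size of Y"
print(f"Files at or below the {alpha}% percentile ({cutoff:.0f} bytes) "
      f"average {tail.mean():.0f} bytes")
```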