
Suppose I can sample outcomes from an unknown discrete probability distribution $P$ (the state space $\Omega$ is known). Let $Q$ be the empirical distribution obtained from $s$ samples of $P$. Clearly $Q \rightarrow P$ as $s \rightarrow \infty$, but what about the case of finite $s$? Can we find the total variation distance between $P$ and $Q$ for a given choice of $s$? (An upper bound on the total variation distance would do.)

I think that the solution should depend on the cardinality of the state space (if we want to obtain something realistic, then we need $s \geq |\Omega|$), and possibly on the probabilities $p_\omega$ of the individual outcomes, but I don't know how to put the pieces together.
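To make the question concrete, here is a small numerical sketch (assuming Python with NumPy, and a hypothetical example distribution $P$ over $|\Omega| = 6$ outcomes): it draws $s$ i.i.d. samples from $P$, forms the empirical distribution $Q$, and computes the total variation distance $\mathrm{TV}(P, Q) = \tfrac{1}{2}\sum_\omega |p_\omega - q_\omega|$ for increasing $s$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true distribution P over a state space of size 6.
p = np.array([0.4, 0.2, 0.15, 0.1, 0.1, 0.05])

def tv_distance(p, q):
    """Total variation distance between two pmfs: half the L1 distance."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

for s in [10, 100, 1000, 10000]:
    # Empirical distribution Q from s i.i.d. draws of P.
    counts = rng.multinomial(s, p)
    q = counts / s
    print(f"s = {s:6d}   TV(P, Q) = {tv_distance(p, q):.4f}")
```

Empirically the distance shrinks roughly like $1/\sqrt{s}$, which is the rate one would hope a finite-$s$ bound captures.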

I saw the post Estimate probability mass function from observed samples?, where confidence intervals for individual outcomes are discussed, but this is not quite what I need. Here, I am mostly interested in the total variation distance between the true and estimated distributions.

    Perhaps the result known as the [Dvoretzky-Kiefer-Wolfowitz inequality](https://en.m.wikipedia.org/wiki/Dvoretzky–Kiefer–Wolfowitz_inequality) in terms of empirical and true CDFs might be the kind of result you are looking for? It is a nonasymptotic, quantitative version of the [Glivenko-Cantelli theorem](https://en.m.wikipedia.org/wiki/Glivenko–Cantelli_theorem). – microhaus Jul 04 '21 at 21:07
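Following up on the comment: a quick sketch of what the DKW inequality gives in practice (assuming Python with NumPy, and the same hypothetical distribution as above). DKW states $\Pr\big(\sup_x |F_s(x) - F(x)| > \varepsilon\big) \le 2 e^{-2 s \varepsilon^2}$ for the empirical CDF $F_s$; inverting it yields a finite-sample confidence band on the Kolmogorov-Smirnov distance. Note this bounds a CDF-based distance, not total variation itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true distribution on the ordered support {0, ..., 5}.
p = np.array([0.4, 0.2, 0.15, 0.1, 0.1, 0.05])
cdf = np.cumsum(p)

s = 1000
counts = rng.multinomial(s, p)
ecdf = np.cumsum(counts / s)

# Kolmogorov-Smirnov distance: sup of |F_s - F| over the finite support.
ks = np.max(np.abs(ecdf - cdf))

# Invert DKW, 2*exp(-2*s*eps^2) = alpha, for a 95% confidence radius.
alpha = 0.05
eps_95 = np.sqrt(np.log(2 / alpha) / (2 * s))
print(f"observed KS distance: {ks:.4f}, DKW 95% bound: {eps_95:.4f}")
```

The DKW radius $\sqrt{\log(2/\alpha)/(2s)}$ is distribution-free, i.e. it does not depend on $|\Omega|$ or on the $p_\omega$, which is one reason it may be the kind of nonasymptotic result the question asks for.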
