Given a random variable $X$ taking values $x_i$ for $i=1,\dots,n$, with true distribution $p(x)$ and approximate distribution $q(x)$, the cross entropy is given by
$$H(p,q) = -\sum_{i=1}^np(x_i)\log q(x_i).$$
However, in practice the true distribution $p(x)$ is rarely if ever known, so instead we compute an "approximation" to the cross entropy, given by
$$-\sum_{i=1}^n\frac{1}{n}\log q(x_i).$$
Why is this a good approximation of the cross entropy?
I can't figure out why it would be. Even if we sample enough points from the true distribution to make $q(x)$ a really good approximation of $p(x)$, why would we want all the weights to be equal? In that case, wouldn't we just replace $\frac{1}{n}$ with $q(x_i)$? And if $q(x)$ is not known to be a good approximation, then setting $p(x_i)=\frac{1}{n}$ seems like a completely arbitrary choice.
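To make the comparison concrete, here is a small numeric sketch of the two quantities I mean (the distributions below are made up purely for illustration, and I'm treating the $x_i$ in the second sum as points sampled from the true distribution, as in my reasoning above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete distributions over 4 outcomes (made up for illustration).
p = np.array([0.50, 0.25, 0.15, 0.10])   # "true" distribution p(x)
q = np.array([0.40, 0.30, 0.20, 0.10])   # approximate distribution q(x)

# Exact cross entropy: H(p, q) = -sum_i p(x_i) * log q(x_i)
H_pq = -np.sum(p * np.log(q))

# The "approximation": average of -log q(x_i) over n points sampled from p,
# i.e. every sampled point gets the same weight 1/n.
n = 100_000
samples = rng.choice(len(p), size=n, p=p)
H_approx = -np.mean(np.log(q[samples]))

print(f"exact H(p,q)    = {H_pq:.4f}")
print(f"sampled average = {H_approx:.4f}")
```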