Section 3.9.5 of the Deep Learning Book says:
\begin{equation} \hat{p}(x) = \frac{1}{m} \sum_{i=1}^m \delta(x - x^{(i)}) \tag{3.25} \end{equation}
We can view the empirical distribution formed from a dataset of training examples as specifying the distribution that we sample from when we train a model on this dataset. Another important perspective on the empirical distribution is that it is the probability density that maximizes the likelihood of the training data (see Sec. 5.5).
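If I expand the definition, the sifting property of the Dirac $\delta$ turns any expectation under $\hat{p}$ into a plain average over the training points, which seems to be the link to Sec. 5.5:

\begin{equation} \mathbb{E}_{x \sim \hat{p}}\left[f(x)\right] = \int f(x)\, \frac{1}{m} \sum_{i=1}^m \delta(x - x^{(i)})\, dx = \frac{1}{m} \sum_{i=1}^m f(x^{(i)}) \end{equation}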
Sec. 5.5 of the Deep Learning Book covers Maximum Likelihood Estimation, but what is the empirical distribution actually good for? In the case of applying VGG to MNIST, how would one use the empirical distribution?
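To make the question concrete, here is my current understanding of where $\hat{p}$ appears when training on MNIST (a minimal sketch assuming PyTorch/torchvision; the variable names and the uniform-with-replacement sampling are my own illustration, not from the book):

```python
import torch
from torchvision import datasets, transforms

# Load MNIST; each of the m training images is one x^(i) in Eq. 3.25.
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
m = len(mnist)  # m = 60,000 training examples

# Sampling from p_hat: draw indices uniformly with replacement,
# so each x^(i) has probability 1/m, exactly as Eq. 3.25 specifies.
idx = torch.randint(0, m, (64,))
images = torch.stack([mnist[i][0] for i in idx])   # shape (64, 1, 28, 28)
labels = torch.tensor([mnist[i][1] for i in idx])

# A VGG-style model would then be trained by minimizing the average
# negative log-likelihood over such samples, i.e. the expectation of
# -log p_model(y | x) under p_hat, which Sec. 5.5 identifies with
# maximum likelihood estimation.
```

Is it correct to say that the empirical distribution is never used explicitly here, but is implicitly defined by the act of averaging the loss over minibatches drawn from the training set?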