Section 3.9.5 of the Deep Learning Book says:
\begin{equation} \hat{p}(x) = \frac{1}{m} \sum_{i=1}^m \delta(x - x^{(i)}) \tag{3.25} \end{equation}
We can view the empirical distribution formed from a dataset of training examples as specifying the distribution that we sample from when we train a model on this dataset. Another important perspective on the empirical distribution is that it is the probability density that maximizes the likelihood of the training data (see Sec. 5.5).
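If I expand the definition, the sifting property of the Dirac $\delta$ turns any expectation under $\hat{p}$ into a plain average over the training points, which seems to be the link to Sec. 5.5:

\begin{equation} \mathbb{E}_{x \sim \hat{p}}\left[f(x)\right] = \int f(x)\, \frac{1}{m} \sum_{i=1}^m \delta(x - x^{(i)})\, dx = \frac{1}{m} \sum_{i=1}^m f(x^{(i)}) \end{equation}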
Sec. 5.5 of the Deep Learning Book covers Maximum Likelihood Estimation, but what is the empirical distribution actually good for? In the case of applying VGG to MNIST, how would one use the empirical distribution?
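To make the question concrete, here is my current understanding of where $\hat{p}$ appears when training on MNIST (a minimal sketch assuming PyTorch/torchvision; the variable names and the uniform-with-replacement sampling are my own illustration, not from the book):

```python
import torch
from torchvision import datasets, transforms

# Load MNIST; each of the m training images is one x^(i) in Eq. 3.25.
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
m = len(mnist)  # m = 60,000 training examples

# Sampling from p_hat: draw indices uniformly with replacement,
# so each x^(i) has probability 1/m, exactly as Eq. 3.25 specifies.
idx = torch.randint(0, m, (64,))
images = torch.stack([mnist[i][0] for i in idx])   # shape (64, 1, 28, 28)
labels = torch.tensor([mnist[i][1] for i in idx])

# A VGG-style model would then be trained by minimizing the average
# negative log-likelihood over such samples, i.e. the expectation of
# -log p_model(y | x) under p_hat, which Sec. 5.5 identifies with
# maximum likelihood estimation.
```

Is it correct to say that the empirical distribution is never used explicitly here, but is implicitly defined by the act of averaging the loss over minibatches drawn from the training set?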