9

Proposition 1. The optimal discriminator is $$ D^{*}_G(x) = \frac{p_\text{data}(x)}{p_\text{data}(x) + p_g(x)} $$ In the proof, I couldn't understand the change of variables in the integral.

Why is the first line changed to the second line? $$ V(G,D) = \int_x p_\text{data}(x)\log(D(x))\,dx + \int_z p_Z(z)\log(1-D(g(z)))\,dz \\ = \int_x p_\text{data}(x)\log(D(x)) + p_g(x)\log(1-D(x))\,dx $$

I tried to calculate it myself.

But the condition below is needed to get from the first line of $V(G,D)$ to the second: $$ p_Z(z)\,\frac{1}{g'(z)}=p_g(x)$$
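For a quick sanity check of this condition (my own toy example, not from the paper): take $z \sim \mathrm{Uniform}(0,1)$, so $p_Z(z) = 1$, and the monotone generator $g(z) = 2z$, so that $x = g(z) \sim \mathrm{Uniform}(0,2)$. The condition then gives the correct density:

$$ p_g(x) = p_Z(z)\,\frac{1}{g'(z)} = 1 \cdot \frac{1}{2} = \frac{1}{2}, \quad x \in (0,2). $$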

In summary, my questions are:

  1. Why can the first line of $V(G,D)$ be changed to the second line?
  2. In my own attempt to make the change, the condition above was needed. Is it an appropriate condition?
  • You should properly distinguish between $Z$ and $z$ in expressions like $p_Z(z)$. Without that distinction, expressions like $p_Z(3)$, $p_z(3)$, and $\Pr(Z\le z)$ could not be understood. (I also changed $log$ to $\log$ and did some other routine copy-editing. Note that with $\log$ rather than $log$ you get proper spacing in things like $a\log b$ and $a\log(b)$ without having to add spaces manually.) – Michael Hardy Mar 11 '17 at 07:07

5 Answers

2

Q1: Why can the first line of $V(G,D)$ be changed to the second line?

The task is to find the maximum value of $V(G,D)$, so perhaps better notation for the first line would be

$$\max[V(G,D)] = \max\left[\int_x p_\text{data}(x)\log(D(x))\,dx + \int_z p_Z(z) \log(1-D(g(z)))\,dz\right]$$

Then the second line

$$\max[V(G,D)]= \max \left[ \int_x p_\text{data}(x)\log (D(x)) + p_g(x) \log(1-D(x)) \, dx\right]$$

has the form $y \mapsto a \log(y) + b \log(1 - y)$ inside the integral, which achieves its maximum on $[0, 1]$ at $y = \frac{a}{a+b}$. That implies that setting $z = x$ allows for the maximum sum of the integrals, which is what lets the first line lead to the second.
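As a quick numerical check (my own sketch, not part of the original proof; the values of $a$ and $b$ are arbitrary stand-ins for $p_\text{data}(x)$ and $p_g(x)$ at a fixed $x$):

```python
import numpy as np

# f(y) = a*log(y) + b*log(1 - y) should peak at y = a / (a + b)
a, b = 0.3, 0.7                            # arbitrary stand-ins for p_data(x), p_g(x)
y = np.linspace(1e-6, 1 - 1e-6, 1_000_001)
f = a * np.log(y) + b * np.log(1 - y)

print(y[np.argmax(f)])                     # ~0.3, the numerical argmax
print(a / (a + b))                         # 0.3, the closed-form argmax
```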

Q2: Is this appropriate?

Only if $g'(z)=1$ at $\max[p_g(x)]$. I think the problem suggests that $\max[V(G,D)] \neq V(G,D)$ except when $D^{*}_G(x) = \frac{p_\text{data}(x)}{p_\text{data}(x) + p_g(x)}$. The answer has the form $\frac a {a+b}$, where $p_\text{data}(x)=a$ and $p_g(x)=b$.

Carl
  • If the condition that I wrote is appropriate, what is the meaning of the condition? In the paper and tutorials, there is no mention of the condition, so I cannot be sure about it. – user3704652 Mar 01 '17 at 09:00
  • I don't know that it applies in this situation. It doesn't seem to. The way I am reading it, $p_g(x)$ is just notation used to prevent confusion arising from use of $\max[p_x(x)]$. That is, the max value is $p_g(x)$. – Carl Mar 01 '17 at 09:05
  • I still cannot understand why the first line of $V(G,D)$ is changed to the second line. I think your answer doesn't cover it in detail. Could you explain it again in more detail? – user3704652 Mar 01 '17 at 09:10
  • I am leaving stuff out, just showing the bare outlines of the proof. I don't have the supporting arguments etc., but those don't add much. Basically, when the max occurs $z$ is replaced by $x$; then, since both terms are integrated only over $x$, we can combine both under one integral sign. The only confusing part is that we don't know we solved the max problem until we examine the second line, which is maximized when $z=x$. – Carl Mar 01 '17 at 09:19
  • I think it would be easier to understand if it were written in reverse order. $y \mapsto a \log(y) + b \log(1 - y)$, then $\max[V(G,D)]= \max\left[\int_{x}p_\text{data}(x)\log (D(x)) + p_{g}(x)\log(1-D(x))\,dx\right]$, then $\max[V(G,D)] = \max\left[\int_{x}p_\text{data}(x)\log(D(x))\,dx + \int_{z}p_{Z}(z)\log(1-D(g(z)))\,dz\right]$ – Carl Mar 01 '17 at 17:17
  • @Carl I wonder what it means when the discriminator reaches its maximum state. I mean, what does $D=\frac{a}{a+b}$ mean? – Lerner Zhang Jul 30 '18 at 05:38
  • Could you please explain to me why the max of the function inside the integral is also the max of the whole functional? Am I allowed to differentiate under the integral sign because the supports of the distributions are compact (are they?)? – Symòn Jul 05 '20 at 11:22
  • @Symòn I am having trouble understanding your question. Can you rephrase it? – Carl Jul 05 '20 at 13:09
  • Yes, sure. My problem is the following: we know that the function $F(y) = a \log(y) + b \log(1-y)$ is maximized at $\frac{a}{a+b}$, but the function we want to maximize is $\int F \, dx$. I cannot understand why those two functions have the maximum at the same point. (It is also very much possible that I haven't understood the problem well.) – Symòn Jul 05 '20 at 13:23
  • @Symòn I think because $z=x$. Don't know if that helps. – Carl Jul 05 '20 at 14:32
2

You've basically gotten it. The definition of $p_g$ (see the first paragraph of Section 4, Theoretical Results) is the distribution of samples $G(z)$ obtained when $z$ comes from distribution $p_Z$. Thus

$$\int_z p_Z(z)\log(1-D(g(z)))\,dz=E_{p_Z}[\log(1-D(g(z)))]=E_{p_g}[\log(1-D(x))]$$
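To make this concrete, here is a minimal Monte Carlo sketch (my own; the generator $g(z) = 2z + 1$ and the sigmoid discriminator are toy stand-ins, not from the paper). With $z \sim \mathcal{N}(0,1)$, the law of $g(z)$ is $p_g = \mathcal{N}(1, 2^2)$, so both expectations below estimate the same number:

```python
import numpy as np

rng = np.random.default_rng(0)

g = lambda z: 2 * z + 1                      # toy generator
D = lambda x: 1 / (1 + np.exp(-x))           # toy fixed discriminator (sigmoid)

n = 1_000_000
z = rng.normal(size=n)                       # z ~ p_Z = N(0, 1)
x = rng.normal(loc=1, scale=2, size=n)       # x ~ p_g = N(1, 4), the law of g(z)

print(np.log(1 - D(g(z))).mean())            # E_{p_Z}[log(1 - D(g(z)))]
print(np.log(1 - D(x)).mean())               # E_{p_g}[log(1 - D(x))], same value
```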

Alex R.
  • How did you change the equation from $E_{p_Z}[\cdot]$ to $E_{p_g}[\cdot]$? – user3704652 Mar 02 '17 at 07:28
  • @user3704652: Think of it this way: when you're calculating the expectation $E[g(z)]$ with respect to the random variable $Z$ (and therefore the density $p_Z(z)$), it's equivalent to instead treat $g(z)$ as a random variable, so that the probability density of $g(z)$ being equal to $x$ is $p_g(x)$. – Alex R. Mar 02 '17 at 17:50
  • See for example proposition 1 of this: http://math.bard.edu/belk/math461/Probability.pdf – Alex R. Mar 02 '17 at 17:51
2

To understand the change of variables, we can first take a look at Figure 1 in Generative Adversarial Networks, Goodfellow et al. (2014), arXiv:1406.2661.

According to the paper:

The lower horizontal line is the domain from which $z$ is sampled and the horizontal line above is part of the domain of $x$. The upward arrows show the transformation $x = g(z)$.

Back to the equation, it's clear that:

$$\int_z p_Z(z)\log(1-D(g(z)))\,dz=E_{p_Z}[\log(1-D(g(z)))]$$

Since $x = g(z)$, we can replace $g(z)$ with the variable $x$. Also notice that, in this case, $p_g$ is the distribution of $x$. As a result, we have:

$$E_{p_Z}[\log(1-D(g(z)))] = E_{p_g}[\log(1-D(x))]$$

Then we expand the expectation to an integral form:

$$E_{p_g}[\log(1-D(x))] = \int_x p_g(x)\log(1-D(x))\,dx$$
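As a sanity check (my own sketch, reusing a toy setup: $g(z) = 2z + 1$ with $z \sim \mathcal{N}(0,1)$, so $p_g = \mathcal{N}(1, 2^2)$, and a sigmoid stand-in for $D$), the integral form and the expectation form agree numerically:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

D = lambda x: 1 / (1 + np.exp(-x))     # toy discriminator

# Integral form: ∫ p_g(x) log(1 - D(x)) dx with p_g = N(1, 2^2)
# (finite limits cover ±10 standard deviations; the tails are negligible)
val, _ = quad(lambda x: norm.pdf(x, loc=1, scale=2) * np.log(1 - D(x)), -20, 22)
print(val)

# Expectation form: E_{p_Z}[log(1 - D(g(z)))] by Monte Carlo
rng = np.random.default_rng(0)
z = rng.normal(size=1_000_000)
print(np.log(1 - D(2 * z + 1)).mean())
```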

Charles
  • Thanks for your efforts. I've attempted to provide a reference that satisfies the requirement to "provide the name of the original author" at the [help on referencing](https://stats.stackexchange.com/help/referencing). – Glen_b Mar 11 '17 at 08:54
1

Since $z \mapsto G(z)$ is a deterministic mapping from $\mathcal{Z}$ to $\mathcal{X}$, let $y = G(z)$, then $p(y|z) = \delta(y - G(z))$. Therefore

$$\begin{split} \int_{\mathcal{X}} p_g(y)\log(1 - D(y))\,dy & = \int_{\mathcal{X}} \left[\int_{\mathcal{Z}}p(z,y)\,dz\right]\log(1-D(y))\,dy \\ & = \int_{\mathcal{X}} \left[\int_{\mathcal{Z}}p(z)\,p(y|z)\,dz\right]\log(1-D(y))\,dy \\ & = \int_{\mathcal{Z}}p(z)\left[\int_{\mathcal{X}}p(y|z)\log(1-D(y))\,dy\right]dz \\ & = \int_{\mathcal{Z}}p(z)\left[\int_{\mathcal{X}}\delta(y - G(z))\log(1 - D(y))\,dy\right]dz \\ & = \int_{\mathcal{Z}}p(z)\log(1 - D(G(z)))\,dz. \end{split}$$

The third equality swaps the order of integration, and the last step uses the sifting (convolution) property of the Dirac delta function: $\int_{\mathcal{X}}\delta(y - G(z))\,f(y)\,dy = f(G(z))$.
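As a small numerical illustration of the sifting property (my own sketch; the delta is approximated by an ever-narrower Gaussian, and $\log(1 - D(y))$ with a sigmoid $D$ plays the role of the test function):

```python
import numpy as np
from scipy.stats import norm

Gz = 0.7                                         # a fixed value of G(z)
f = lambda y: np.log(1 - 1 / (1 + np.exp(-y)))   # test function log(1 - D(y))

ys = np.linspace(Gz - 1, Gz + 1, 400_001)
dy = ys[1] - ys[0]

# delta(y - G(z)) ≈ N(G(z), eps^2): the integral converges to f(G(z))
for eps in [0.1, 0.01, 0.001]:
    approx = np.sum(norm.pdf(ys, loc=Gz, scale=eps) * f(ys)) * dy
    print(eps, approx)

print("f(G(z)) =", f(Gz))                        # the sifted value
```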

Yilin He
0

The only thing you seem to be missing is the change of variables formula for probabilities, which states that the density of $Y = f(X)$, the transformation of a random variable $X$ by an invertible function $f$, is given by

$$p_Y(y) = p_X(f^{-1}(y)) \left|f'(f^{-1}(y))\right|^{-1}.$$

Therefore, if we write out the substitution $x = g(z)$ in the integral, this change of variables formula magically appears:

$$\int_z p_Z(z)\log(1-D(g(z)))\,dz = \int_x \underbrace{\frac{1}{g'(g^{-1}(x))}\, p_Z(g^{-1}(x))}_{=\,p_g(x)} \log(1 - D(x))\,dx$$

Note that this ignores the absolute value. I am not quite sure whether/how much it matters in this case (since the derivative of the generator is certainly not guaranteed to be positive).
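As a numerical sanity check of the formula (my own sketch with a monotone toy generator $g(z) = e^z$ and $z \sim \mathcal{N}(0,1)$, so $x = g(z)$ is log-normal), the change-of-variables density matches a histogram of samples:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

g = np.exp                       # monotone toy generator
g_inv = np.log                   # its inverse
g_prime = np.exp                 # its derivative, g'(z) = e^z

z = rng.normal(size=1_000_000)   # z ~ p_Z = N(0, 1)
x = g(z)                         # x follows the log-normal density

# Change-of-variables density: p_g(x) = p_Z(g^{-1}(x)) / |g'(g^{-1}(x))|
hist, edges = np.histogram(x, bins=100, range=(0.1, 5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
p_g = norm.pdf(g_inv(centers)) / np.abs(g_prime(g_inv(centers)))

print(np.max(np.abs(p_g - hist)))   # small: the formula matches the samples
```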

Mr Tsjolder