
I am reading Goodfellow's original paper on GANs.

What I struggle to understand is his notation of the subscript in expected values.

$$ \mathbb{E}_{\boldsymbol{x} \sim p_{data}(\boldsymbol{x})}\ldots $$

If I understand it correctly then $\boldsymbol{x}$ is a realization of some random variable $\mathbf{x}$, but how can there be an expectation wrt. $\boldsymbol{x}$?

Or is $\boldsymbol{x}$ a random variable?

Thank you.

Edit: I do not think this is a duplicate, as the referenced question does not answer what $\boldsymbol{x}$ means.

Avraham
pixelneo
  • I think it is a duplicate question, but the answer to the duplicate (at https://stats.stackexchange.com/questions/297158) is not a useful answer! Therefore I have voted to reopen this question, believing it focuses on understanding what the *random variable* and/or its distribution are, as opposed to understanding what an expectation is. – whuber Apr 06 '19 at 17:40

1 Answer


$E_{x\sim p(x)}[f(X)]$ means the expected value of $f(X)$ when $X$ is assumed to be distributed according to $p(x)$. For example, for a continuous distribution we have: $$E_{x\sim p(x)}[f(X)]=\int f(x)p(x)\,dx$$
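As a numerical illustration of this definition, here is a minimal sketch. The choices $f(x)=x^2$ and $p$ as the standard normal density are my own assumptions for the example, not taken from the paper. It compares the average of $f$ over samples drawn from $p$ with a direct numerical approximation of $\int f(x)p(x)\,dx$:

```python
import numpy as np

def f(x):
    # Illustrative function (an assumption for this example)
    return x ** 2

rng = np.random.default_rng(0)

# "x ~ p(x)": draw samples of the random variable from p = N(0, 1),
# then average f over those samples (Monte Carlo estimate).
samples = rng.standard_normal(200_000)
mc_estimate = f(samples).mean()

# Riemann-sum approximation of the integral of f(x) * p(x) dx.
grid = np.linspace(-10.0, 10.0, 200_001)
dx = grid[1] - grid[0]
pdf = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)
integral = np.sum(f(grid) * pdf) * dx

print(mc_estimate, integral)  # both should be close to 1, the variance of N(0, 1)
```

The subscript only says which distribution the samples (or the density in the integral) come from; individual draws in `samples` are realizations, while the expectation is taken over the random variable they are drawn from.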

This notation is used when the distribution of $x$ is subject to change in an optimization problem. Specifically, in the paper, the authors have two distributions (on page 5), $p_g$ and $p_{data}$.
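To make the role of the subscript explicit, take a generic function $f$ (my placeholder, not notation from the paper) and write the expectation under each of the two distributions, assuming both have densities: $$E_{x\sim p_{data}(x)}[f(x)]=\int f(x)\,p_{data}(x)\,dx, \qquad E_{x\sim p_g(x)}[f(x)]=\int f(x)\,p_g(x)\,dx$$ The integrand $f$ is the same in both; only the density changes, and the subscript records which one is meant.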

Edit: The $x$ in the subscript of the expected value notation is not a realization. It is the random variable; or more specifically, in the paper it is the random vector $\mathbf{x}$ (it is also in bold on page 5).

gunes
  • Thank you. However, I need to specifically know what $x$ is. Is it a random variable or its realization (a sample from distribution)? – pixelneo Apr 06 '19 at 10:27
  • It's definitely not a realization. Realizations have no meaning in that notation, since they are just constants and do not have such varying distributions. And in the paper it is written in bold: it is the random vector $\mathbf{x}$. – gunes Apr 06 '19 at 10:31
  • I strongly suspect the original notation does *not* refer to a continuous distribution. The subscripted expression "$x\sim p_{\text{data}}$" most likely refers to the random variable determined by sampling from the empirical distribution of the data (with replacement). That is never a continuous distribution. This interpretation also explains the potential for confusion: $\mathbf x$ starts out as the *data* -- just a collection of numbers -- but it is then used to define a random variable. – whuber Apr 06 '19 at 17:38
  • I've chosen the continuous case example arbitrarily to explain the notation. But, in the original paper, $p_g(x)$ and $p_{data}(x)$ indeed refer to continuous distributions (section 4). – gunes Apr 06 '19 at 18:24
  • What does `assumed to be distributed wrt ()` mean? I'm finding it hard to wrap my head around that. Is p(x) the standard deviation or the variance here? – DollarAkshay Apr 11 '19 at 00:38
  • $p(x)$ is the distribution (PDF or PMF) of $x$ – gunes Apr 11 '19 at 04:49