I've been building a Wasserstein GAN in Keras, following the original PyTorch implementation by Arjovsky et al., and I've run across an issue I have yet to understand.
To my knowledge, the critic network is first trained on a batch of real data, then on a batch of data produced by the generator from a noise prior. The critic's loss is arranged so that the critic estimates the EM (Wasserstein-1) distance by maximizing the difference between its mean scores on the two distributions, and after each update its weights are clipped to a small range to keep it (approximately) Lipschitz. Then the generator produces a new batch of images from the noise prior and passes them through the critic, which "informs" the generator, via the critic's loss, of the estimated Wasserstein-1 distance between the true distribution and the distribution of the images it just created. The critic's weights are frozen, the error propagates all the way back to the generator, and the generator updates its parameters to minimize that Wasserstein estimate. This repeats until the loss (hopefully) converges to near zero and the two distributions are approximately equal.
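
To check my mental model, here's a minimal sketch of that loop as I would write it (assuming tf.keras; the toy Dense architectures, hyperparameters, and the random placeholder dataset are mine, not anything from the Arjovsky code):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100
clip_value = 0.01      # weight-clipping range from the WGAN paper
n_critic = 5           # critic updates per generator update
batch_size = 64

def wasserstein_loss(y_true, y_pred):
    # With labels -1 for real and +1 for fake, minimizing this trains the
    # critic to maximize E[critic(real)] - E[critic(fake)], the EM estimate.
    return tf.reduce_mean(y_true * y_pred)

# Toy stand-ins for the real generator / critic architectures.
generator = keras.Sequential([
    layers.Dense(128, activation="relu", input_dim=latent_dim),
    layers.Dense(784, activation="tanh"),
])
critic = keras.Sequential([
    layers.Dense(128, activation="relu", input_dim=784),
    layers.Dense(1),   # raw score output (no sigmoid), not a probability
])
critic.compile(optimizer=keras.optimizers.RMSprop(5e-5), loss=wasserstein_loss)

# Combined model: noise -> generator -> critic, with the critic frozen.
# Keras records the trainable state at compile time, so the critic still
# learns when trained directly but stays fixed inside `combined`.
critic.trainable = False
z = keras.Input(shape=(latent_dim,))
combined = keras.Model(z, critic(generator(z)))
combined.compile(optimizer=keras.optimizers.RMSprop(5e-5), loss=wasserstein_loss)

X_train = np.random.rand(1000, 784).astype("float32")  # placeholder data
valid = -np.ones((batch_size, 1))   # labels for real samples
fake = np.ones((batch_size, 1))     # labels for generated samples

for step in range(1000):
    for _ in range(n_critic):
        # Critic: one real batch, one generated batch, then weight clipping
        # to enforce the Lipschitz constraint.
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        critic.train_on_batch(X_train[idx], valid)
        critic.train_on_batch(generator.predict(noise, verbose=0), fake)
        for layer in critic.layers:
            layer.set_weights([np.clip(w, -clip_value, clip_value)
                               for w in layer.get_weights()])
    # Generator: the critic's Wasserstein loss is evaluated on fresh fakes
    # and the error propagates back through the frozen critic.
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    combined.train_on_batch(noise, valid)
```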
Is this correct? Where would the generator's own loss function ever be used? It seems to me that the combined generator + critic network uses the critic's loss to propagate the error back to the generator, so the generator's loss function is never used directly. Of course, in Keras one would still need to give the generator a loss function so that it compiles properly, but I just don't see it being used...
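
For concreteness, reusing the names from the sketch above: I can compile the generator with essentially any placeholder loss and the loop behaves identically, because no training or evaluation method is ever called on the generator by itself:

```python
# Placeholder compile for the generator alone (the "mae" loss is arbitrary and
# hypothetical; this is exactly the loss I don't see being exercised).
generator.compile(optimizer=keras.optimizers.RMSprop(5e-5), loss="mae")

# The only losses actually evaluated during training are the ones compiled
# into the critic and the combined model, i.e. wasserstein_loss in both cases:
noise = np.random.normal(0, 1, (batch_size, latent_dim))
combined.train_on_batch(noise, valid)   # generator update through the frozen critic's loss
```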