I found a really good answer from user ajmooch on Reddit and decided to post it here in case someone else has the same misconceptions I had:
There are several things to keep in mind here.
The first thing is that the BCE objective for the Generator can more
accurately be stated as "the images output by the generator should be
assigned a high probability by the Discriminator." It's not the BCE you
might see in a binary reconstruction loss, which would be BCE(G(Z), X)
where G(Z) is a generated image and X is a real sample; it's BCE(D(G(Z)), 1),
where D(G(Z)) is the probability the discriminator assigns to
the generated image. Given a "perfect" generator whose outputs are always
photorealistic, the D(G(Z)) values should always be close to 1.
In practice there are difficulties getting this kind of
convergence (the training is somewhat inherently unstable), but that is
the goal.
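To make that concrete, here's a minimal PyTorch sketch of the generator's objective. The networks are hypothetical stand-ins (real ones would be deep conv nets), and all names and shapes here are my assumptions for illustration, not from the original post:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical stand-ins for the generator and discriminator.
    z_dim, x_dim, batch_size = 64, 784, 16
    G = nn.Linear(z_dim, x_dim)                            # noise -> "image"
    D = nn.Sequential(nn.Linear(x_dim, 1), nn.Sigmoid())   # image -> probability

    # The generator's objective is BCE(D(G(z)), 1), *not* BCE(G(z), x):
    # the target is the discriminator's output, not a pixel-wise match.
    z = torch.randn(batch_size, z_dim)   # latent noise, independent of any data
    d_fake = D(G(z))                     # probability D assigns to the fakes
    g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))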
The second is that in the standard GAN algorithm, the latent vector
(the "random noise" which the generator receives as input and has to
turn into an image) is sampled independently of the training data. If you
were to use the MSE between the outputs of the GAN and a single image,
you might get some sort of result out, but you'd effectively be saying
"given this (random) Z, produce this specific X," and you'd be
implicitly forcing the generator to learn a nonsensical embedding of
the image. If you think of the Z vector as a high-level description of
the image, it would be like showing the generator a dog three times and
asking it to produce the same dog given three different (and uncorrelated)
descriptions of that dog. Compare this with something like a VAE, which
has an explicit inference mechanism (the encoder network of the VAE
infers Z values given an image sample) and then attempts to
reconstruct a given image using those inferred Zs. The GAN does not
attempt to reconstruct an image, so in its vanilla form it doesn't
make sense to compare its outputs to a set of samples using MSE or
MAE.
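A quick sketch of that contrast, with a hypothetical encoder E added alongside the G from above (again, all shapes and module names are made-up illustration):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    z_dim, x_dim, batch_size = 64, 784, 16
    G = nn.Linear(z_dim, x_dim)          # generator / decoder
    E = nn.Linear(x_dim, z_dim)          # encoder (only a VAE-like model has this)
    x = torch.randn(batch_size, x_dim)   # a batch of "real" samples

    # Ill-posed: z is sampled independently of x, so demanding that G(z)
    # reproduce this particular x forces a nonsensical embedding.
    z = torch.randn(batch_size, z_dim)
    bad_loss = F.mse_loss(G(z), x)

    # Well-posed (VAE-style): z is inferred *from* x by the encoder, so a
    # pixel-wise reconstruction loss between G(E(x)) and x makes sense.
    recon_loss = F.mse_loss(G(E(x)), x)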
There's been some recent work on incorporating similarity
metrics into GAN training: this OpenAI paper adds an MSE objective
between G(Z) and X in the feature space of the discriminator's final FC
layer (a la Discriminative Regularization), which seems to work really
well for semi-supervised learning (based on their insanely good SVHN
results) but doesn't really improve sample quality.
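A rough sketch of that kind of feature-space matching term. Splitting D into a feature extractor and a classifying head is my assumption for illustration, and matching batch-mean features is one common form of this idea, not necessarily the paper's exact recipe:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    z_dim, x_dim, feat_dim, batch_size = 64, 784, 128, 16
    G = nn.Linear(z_dim, x_dim)
    D_feat = nn.Linear(x_dim, feat_dim)   # everything up to the final FC layer
    D_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    x = torch.randn(batch_size, x_dim)    # real batch
    z = torch.randn(batch_size, z_dim)

    # Compare real and generated batches in the discriminator's feature
    # space (an MSE on mean features) instead of comparing raw pixels.
    feat_loss = F.mse_loss(D_feat(G(z)).mean(0), D_feat(x).mean(0))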
You can also slam VAEs and GANs together (as I have done and as
several others have done before me) and use the inference mechanism of
the VAE to provide guidance for the GAN generator, such that it makes
sense to do some pixel-wise comparisons for reconstructions.
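As a rough sketch of that hybrid setup, combining an assumed encoder E with the GAN pieces from above. The architecture and the loss weighting here are arbitrary illustration, not the exact recipe from any of those papers:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical VAE/GAN hybrid: the encoder infers z from a real image,
    # the generator doubles as the decoder, and the discriminator still
    # provides the adversarial signal.
    z_dim, x_dim, batch_size = 64, 784, 16
    E = nn.Linear(x_dim, z_dim)                            # inference network
    G = nn.Linear(z_dim, x_dim)                            # generator / decoder
    D = nn.Sequential(nn.Linear(x_dim, 1), nn.Sigmoid())   # discriminator

    x = torch.randn(batch_size, x_dim)
    x_recon = G(E(x))    # reconstruction via inferred z: pixel-wise loss is now meaningful
    d_fake = D(x_recon)

    recon_loss = F.mse_loss(x_recon, x)   # pixel-wise term, valid thanks to E
    adv_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    g_loss = recon_loss + 0.1 * adv_loss  # arbitrary illustrative weighting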