A similar question was asked previously here. The answer was to use a scheduler for the KL loss. My question is slightly different: for small images, the reconstruction and KL loss magnitudes are similar, but for large images (>500 px, using binary cross-entropy for the reconstruction loss) the reconstruction loss dominates. It can be as much as two orders of magnitude larger, so the KL loss has little or no impact on training. Is it still a good idea to use a weight $\beta \in (0, 1)$, or should I be using a much larger range?
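To make the imbalance concrete, here is a minimal NumPy sketch (the function name, arguments, and the per-pixel normalization choice are my own illustration, not a standard API): a *summed* BCE reconstruction term grows with the pixel count, while the KL term depends only on the latent dimension, so one common fix is to average the reconstruction term per pixel (equivalently, to scale $\beta$ with image size) before weighting.

```python
import numpy as np

def vae_loss(x, x_hat, mu, logvar, beta=1.0, normalize=True):
    """Toy beta-VAE loss: BCE reconstruction + beta * KL (illustrative sketch).

    With normalize=True the BCE term is averaged over pixels, so its
    magnitude no longer grows with image size relative to the KL term.
    """
    eps = 1e-7
    x_hat = np.clip(x_hat, eps, 1 - eps)  # keep log() finite
    # Summed binary cross-entropy over all pixels
    bce = -(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat)).sum()
    if normalize:
        bce /= x.size  # per-pixel average: comparable across image sizes
    # KL divergence of N(mu, exp(logvar)) from N(0, 1), summed over latents
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return bce + beta * kl

rng = np.random.default_rng(0)
mu, logvar = np.zeros(8), np.zeros(8)  # KL = 0 here; isolates the BCE term
small = vae_loss(rng.random((32, 32)), rng.random((32, 32)),
                 mu, logvar, beta=0.0, normalize=False)
large = vae_loss(rng.random((512, 512)), rng.random((512, 512)),
                 mu, logvar, beta=0.0, normalize=False)
# Summed BCE scales roughly with the pixel count (here ~256x more pixels),
# which is why the KL term gets swamped on large images without rescaling.
```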
-
I'm having trouble understanding what part of your question is not already answered in the other thread. You've written that the KL loss tends to dominate your reconstruction loss, and acknowledge that a scheduler is a well-worn path to solving that problem. Can you [edit] your post to explain why you are skeptical of that solution? – Sycorax Jul 01 '21 at 18:47
-
I edited the original post. But my problem statement is actually the opposite: the reconstruction loss dominates the KL loss, since the image size is much larger. – mesolmaz Jul 02 '21 at 09:04