I am currently writing an autoencoder in Python (PyTorch); its encoder is intended to serve as a compression tool. The input dataset contains a mix of numerical data (including large integers), categorical variables (encoded as one-hot vectors), and some binary variables; e.g.,

flatten([295743, 5600, 2400, [0, 0, 1, 0, 0, 0], [0, 0, 1], 1, 0])

would be a representative sample.
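To make that layout concrete, here is a minimal sketch of the flattening I mean (this `flatten` helper is illustrative, not my actual preprocessing code):

import numpy as np

def flatten(parts):
    """Concatenate scalars and one-hot/binary lists into a single 1-D float vector."""
    out = []
    for p in parts:
        out.extend(p if isinstance(p, (list, tuple)) else [p])
    return np.asarray(out, dtype=np.float32)

sample = flatten([295743, 5600, 2400, [0, 0, 1, 0, 0, 0], [0, 0, 1], 1, 0])
# sample has 14 entries: three raw magnitudes (295743, 5600, 2400) followed by
# the 0/1 entries of the one-hot and binary fields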
So far, I've been unable to stabilize the training process: some (rare, unpredictable) runs learn, but the majority learn nothing or intermittently forget whatever they do learn. I've tried a few different loss functions, including cosine embedding loss, KL divergence, and Dice loss; none of these resolved the issue. This leads me to believe the problem lies in the pre-processing of the data (particularly the mixing of variables of different types), assuming there isn't a glaring error in the code defining the autoencoder, below.
For clarity, what I call "failing to learn" means a network whose loss curve over the training loop appears essentially random and, in particular, is not minimized at the last epoch. My primary questions are as follows:
1. Does the assumption that the mixture of categorical and numerical variables is the cause of this issue seem well grounded?
2. If not, what is the likely cause?
   a) What steps are conventionally taken to mitigate this issue, assuming it isn't a simple programming error?

The code defining the model:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils
import torch.distributions
import numpy as np
from coders import Encoder, VariationalEncoder, Decoder
from torch.utils.tensorboard import SummaryWriter


class AutoEncoder(nn.Module):
    def __init__(self, DIMS):
        """
        init.
        In:
            - DIMS, arr-like of ints, the dimensions of each layer (from input_dim --> latent_dim)
        """
        super(AutoEncoder, self).__init__()
        self.encoder = Encoder(DIMS)
        self.decoder = Decoder(DIMS)
        self.writer = SummaryWriter()

    def forward(self, x):
        """
        computes a forward pass through the autoencoder, using `x` as input.
        In:
            - x, arr-like of length `input_dim`, an input sample
        Out:
            - x_reconst, arr-like of length `input_dim`, the sample's reconstruction
        """
        x_latent = self.encoder(x)
        x_reconst = self.decoder(x_latent)
        return x_reconst

    def train(self, data, epochs):
        """
        trains the model for `epochs` on `data`.
        """
        opt = torch.optim.Adam(self.parameters())
        for epoch in range(epochs):
            for x in data:
                # x = x.to(device)
                x = x.float()
                opt.zero_grad()
                x_hat = self.forward(x)
                loss = ((x - x_hat) ** 2).sum()
                self.writer.add_scalar("Loss/train", loss, epoch)
                loss.backward()
                opt.step()
        # make sure all logs get written
        self.writer.flush()


class Encoder(nn.Module):
    def __init__(self, DIMS):
        """
        init.
        In:
            - DIMS: arr-like of ints, the dimensions of each layer (from input_dim --> latent_dim)
        """
        super(Encoder, self).__init__()
        # init arr to store layers
        self.layers = nn.ModuleList()
        # set each layer properly
        for i in range(len(DIMS) - 1):
            l = nn.Linear(DIMS[i], DIMS[i + 1])
            self.layers.append(l)

    def forward(self, x):
        """
        computes a forward pass through the encoder, using `x` as input.
        In:
            - x, arr-like of length `input_dim`, an input sample
        Out:
            - x_latent, arr-like of length `latent_dim`, the sample's latent vector
        """
        # flatten input
        x = torch.flatten(x)
        # for all but the last layer
        for i in range(len(self.layers) - 1):
            # pass x through the layer, apply relu
            x = F.relu(self.layers[i](x))
        # compute the latent vector (apply the last layer)
        x_latent = self.layers[-1](x)
        # return
        return x_latent
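For context, here is roughly how the model is instantiated and trained. The layer sizes, sample count, and random stand-in data below are illustrative (not my real configuration), and Decoder still comes from my coders module, which isn't shown:

INPUT_DIM = 14                        # length of one flattened sample, as in the example above
DIMS = [INPUT_DIM, 10, 6, 3]          # input_dim --> latent_dim

# stand-in for the real pre-processed samples: a list of 1-D float tensors
data = [torch.rand(INPUT_DIM) for _ in range(256)]

model = AutoEncoder(DIMS)
model.train(data, epochs=20)          # note: this calls the custom train(), shadowing nn.Module.train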