I am currently writing an autoencoder in Python (PyTorch); its encoder is intended to serve as a compression tool. The input dataset contains a mix of numerical data (including large integers), categorical variables (encoded as one-hot vectors), and some binary variables; e.g.,

flatten([295743, 5600, 2400, [0, 0, 1, 0, 0, 0], [0, 0, 1], 1, 0])

would be a representative sample.
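To make that layout concrete, here is a minimal sketch of the flattening I mean (this `flatten` helper is illustrative, not my actual preprocessing code):

import numpy as np

def flatten(parts):
    """Concatenate scalars and one-hot/binary lists into a single 1-D float vector."""
    out = []
    for p in parts:
        out.extend(p if isinstance(p, (list, tuple)) else [p])
    return np.asarray(out, dtype=np.float32)

sample = flatten([295743, 5600, 2400, [0, 0, 1, 0, 0, 0], [0, 0, 1], 1, 0])
# sample has 14 entries: three raw magnitudes (295743, 5600, 2400) followed by
# the 0/1 entries of the one-hot and binary fields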
So far, I've been unable to stabilize the training process: some (rare, unpredictable) runs learn, but the majority learn nothing or intermittently forget whatever they do learn. I've tried a few different loss functions, including cosine embedding loss, KL divergence, and Dice loss; none of these resolved the issue. This leads me to believe the problem lies in the pre-processing of the data (particularly the mixing of variables of different types), assuming there isn't a glaring error in the code defining the autoencoder, below.
For clarity, what I call "failing to learn" means a network whose loss curve over the training loop appears essentially random and, in particular, is not minimized at the last epoch. My primary questions are as follows:
1. Does the assumption that the mixture of categorical and numerical variables is the cause of this issue seem well grounded?
2. If not, what is the likely cause?
   a) What steps are conventionally taken to mitigate this issue, assuming it isn't a simple programming error?

The code defining the model:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils
import torch.distributions
import numpy as np
from coders import Encoder, VariationalEncoder, Decoder
from torch.utils.tensorboard import SummaryWriter


class AutoEncoder(nn.Module):
    def __init__(self, DIMS):
        """
        init.
        In:
            - DIMS, arr-like of ints, the dimensions of each layer (from input_dim --> latent_dim)
        """
        super(AutoEncoder, self).__init__()
        self.encoder = Encoder(DIMS)
        self.decoder = Decoder(DIMS)
        self.writer = SummaryWriter()

    def forward(self, x):
        """
        computes a forward pass through the autoencoder, using `x` as input.
        In:
            - x, arr-like of length `input_dim`, an input sample
        Out:
            - x_reconst, arr-like of length `input_dim`, the sample's reconstruction
        """
        x_latent = self.encoder(x)
        x_reconst = self.decoder(x_latent)
        return x_reconst

    def train(self, data, epochs):
        """
        trains the model for `epochs` on `data`.
        """
        opt = torch.optim.Adam(self.parameters())
        for epoch in range(epochs):
            for x in data:
                # x = x.to(device)
                x = x.float()
                opt.zero_grad()
                x_hat = self.forward(x)
                loss = ((x - x_hat) ** 2).sum()
                self.writer.add_scalar("Loss/train", loss, epoch)
                loss.backward()
                opt.step()
        # make sure all logs get written
        self.writer.flush()


class Encoder(nn.Module):
    def __init__(self, DIMS):
        """
        init.
        In:
            - DIMS: arr-like of ints, the dimensions of each layer (from input_dim --> latent_dim)
        """
        super(Encoder, self).__init__()
        # init arr to store layers
        self.layers = nn.ModuleList()
        # set each layer properly
        for i in range(len(DIMS) - 1):
            l = nn.Linear(DIMS[i], DIMS[i + 1])
            self.layers.append(l)

    def forward(self, x):
        """
        computes a forward pass through the encoder, using `x` as input.
        In:
            - x, arr-like of length `input_dim`, an input sample
        Out:
            - x_latent, arr-like of length `latent_dim`, the sample's latent vector
        """
        # flatten input
        x = torch.flatten(x)
        # for all but the last layer
        for i in range(len(self.layers) - 1):
            # pass x through the layer, apply relu
            x = F.relu(self.layers[i](x))
        # compute the latent vector (apply the last layer)
        x_latent = self.layers[-1](x)
        # return
        return x_latent
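For context, here is roughly how the model is instantiated and trained. The layer sizes, sample count, and random stand-in data below are illustrative (not my real configuration), and Decoder still comes from my coders module, which isn't shown:

INPUT_DIM = 14                        # length of one flattened sample, as in the example above
DIMS = [INPUT_DIM, 10, 6, 3]          # input_dim --> latent_dim

# stand-in for the real pre-processed samples: a list of 1-D float tensors
data = [torch.rand(INPUT_DIM) for _ in range(256)]

model = AutoEncoder(DIMS)
model.train(data, epochs=20)          # note: this calls the custom train(), shadowing nn.Module.train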