I have a task where I want to perform anomaly detection on 1-dimensional input vectors. The values the inputs take are bounded in the range [0, 1], but the distribution is heavily skewed towards 0, to the point that almost 95% of the values in my training set are exactly 0. In other words, only a few of the features in each input vector are non-zero (not the same features in every input), and even those tend to stay close to 0.
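To make the distribution concrete, here is a synthetic sketch of data with the same shape (the generator, sparsity level, and sizes are just placeholders for illustration, not my actual data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10_000, 100  # assumed sizes

# ~5% of entries are non-zero; the non-zero values themselves
# are drawn from a distribution concentrated near 0.
mask = rng.random((n_samples, n_features)) < 0.05
values = rng.beta(2, 20, size=(n_samples, n_features))  # skewed towards 0
X = np.where(mask, values, 0.0)

print(f"fraction of zeros: {(X == 0).mean():.3f}")        # ~0.95
print(f"mean non-zero value: {X[X > 0].mean():.3f}")      # small, close to 0
```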
For the anomaly detection, I plan to use an autoencoder (with sigmoid units) that reconstructs the input, and to use the reconstruction error as the anomaly score. However, when training the model on the raw data, I found that the reconstruction is just noise with values close to 0, so the reconstruction error stays low at all times and no real learning takes place.
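For reference, a minimal sketch of the setup I have in mind (the layer sizes, optimizer, and loss below are placeholders, not something I'm committed to):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 100  # assumed input size

# Autoencoder with sigmoid units; the output layer keeps
# reconstructions in [0, 1], matching the input range.
autoencoder = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(32, activation="sigmoid"),            # encoder
    layers.Dense(8, activation="sigmoid"),             # bottleneck
    layers.Dense(32, activation="sigmoid"),            # decoder
    layers.Dense(n_features, activation="sigmoid"),    # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")

# Trained to reproduce its own input:
# autoencoder.fit(X_train, X_train, epochs=50, batch_size=128)

# Anomaly score: per-sample reconstruction error.
# scores = np.mean((X_test - autoencoder.predict(X_test)) ** 2, axis=1)
```

Because the targets are mostly exact zeros, the MSE here is already near its minimum for the constant all-zeros output, which is presumably why the network settles there.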
My question is: what preprocessing can I perform on this kind of data to help the autoencoder learn? Normalization? Some other way of aggregating the data into larger values?