I have a binary classifier for a highly imbalanced multivariate time series.
I use an LSTM network to predict the next time step and use the prediction error to decide whether a data point is an anomaly. In addition, I have the advantage of being able to train the network on a data set that contains only negative cases.
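To make the setup concrete, here is a minimal sketch of the scoring step, assuming the LSTM's next-step predictions are already available as an array; the helper names (`anomaly_scores`, `classify`) and the threshold value are my own illustration, not part of my actual pipeline:

```python
import numpy as np

def anomaly_scores(y_true, y_pred):
    # Mean squared prediction error per time step, averaged over features;
    # inputs are (n_steps, n_features) arrays
    return np.mean((y_true - y_pred) ** 2, axis=1)

def classify(scores, threshold):
    # Flag a time step as an anomaly when its error exceeds the threshold
    return scores > threshold

# Synthetic stand-in data: the model predicts well everywhere except
# one injected anomaly at index 100
rng = np.random.default_rng(0)
y_true = rng.normal(size=(2000, 5))
y_pred = y_true + rng.normal(scale=0.1, size=(2000, 5))  # small errors
y_true[100] += 5.0  # one obvious anomaly

scores = anomaly_scores(y_true, y_pred)
flags = classify(scores, threshold=1.0)
```

With these synthetic numbers, only the injected point exceeds the threshold.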
I have a training and validation set for the network, and a test set for the final classification. The positive class makes up ~1% of the test data (~20 of 2000). The use case risks being abandoned if the classifier produces too many false positives, so it is very much a needle-in-a-haystack problem.
My PR-AUC is stuck at around 0.05 to 0.10. I currently use the PR curve to select the threshold at which the F1 score is highest. The model with the highest PR-AUC returns a result that is somewhat close to what I want (5 TP, 10 FP).
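For reference, this is roughly how I do the threshold selection with scikit-learn; the data here is a synthetic stand-in for my real prediction errors (labels and error distribution are invented for the example):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

rng = np.random.default_rng(1)
# ~1% positives, with anomalies tending to have larger prediction errors
y = np.zeros(2000, dtype=int)
y[rng.choice(2000, size=20, replace=False)] = 1
errors = rng.gamma(2.0, 1.0, size=2000) + y * 3.0

precision, recall, thresholds = precision_recall_curve(y, errors)
pr_auc = auc(recall, precision)

# precision/recall have one more entry than thresholds, so drop the
# final (recall=0) point before computing F1 per threshold
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = int(np.argmax(f1))
best_threshold = thresholds[best]
```

I then classify test points as anomalies when their error exceeds `best_threshold`.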
So what is a "good" PR-AUC score for a highly imbalanced data set, and how should I interpret mine?
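One reference point I am aware of: the PR-AUC of a random scorer is approximately the positive prevalence, which here would be ~0.01. A quick sanity check of that baseline (synthetic labels, random scores):

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(42)
# Same class balance as my test set: 20 positives out of 2000
y = np.zeros(2000, dtype=int)
y[rng.choice(2000, size=20, replace=False)] = 1

# Scores carrying no information about the labels
random_scores = rng.random(2000)

# Average precision (a PR-AUC estimate) should hover near the
# prevalence of 0.01, far below my model's 0.05-0.10
ap = average_precision_score(y, random_scores)
```

So my scores are above chance, but I am unsure how far above chance counts as "good" here.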
Also, is it valid to undersample the test data, given that I only use it to make predictions and to classify observations based on their prediction error?