
I have a binary classifier for a highly imbalanced multivariate time series.

I use an LSTM network to predict the next time step and use the prediction error to decide whether a data point is an anomaly. In addition, I can train the network on a data set that contains only negative cases.
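
A minimal sketch of the prediction-error scoring described above, in plain Python. It assumes the one-step-ahead forecasts are already available (the LSTM itself is not shown) and uses the mean squared error across variables as the anomaly score; the function names and the choice of MSE are illustrative assumptions, not necessarily what the asker does.

```python
from typing import List, Sequence


def prediction_error_scores(
    actual: Sequence[Sequence[float]],
    predicted: Sequence[Sequence[float]],
) -> List[float]:
    """Anomaly score per time step: mean squared error across all variables.

    `actual[t]` and `predicted[t]` are the observed and forecast feature
    vectors for time step t (hypothetical inputs; any one-step-ahead
    forecaster, such as an LSTM, could produce `predicted`).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    scores = []
    for obs, pred in zip(actual, predicted):
        squared = [(o - p) ** 2 for o, p in zip(obs, pred)]
        scores.append(sum(squared) / len(squared))
    return scores


def flag_anomalies(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a time step positive (1) when its error is at least the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    actual = [[1.0, 2.0], [1.1, 2.1], [5.0, 0.0]]
    predicted = [[1.0, 2.0], [1.0, 2.0], [1.2, 2.2]]
    scores = prediction_error_scores(actual, predicted)
    print(flag_anomalies(scores, threshold=1.0))  # [0, 0, 1]
```

Training only on negative cases is what makes this work in principle: the forecaster learns "normal" dynamics, so large errors are treated as evidence of the positive class.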

I have training and validation sets for the network and a test set for the final classification. The positive class makes up ~1% of the test data (~20 of 2000 points). My use case risks being abandoned if the network produces too many false positives, so the task is like finding a needle in a haystack.

My PR-AUC is stuck at around 0.05 to 0.10. I currently use the PR curve to select the threshold with the highest F1 score. The model with the highest PR-AUC gives a result that is somewhat close to what I want (5 TP, 10 FP).
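
A minimal sketch of picking the F1-maximizing threshold from the precision-recall operating points, in plain Python. It assumes 0/1 labels and that a higher error score means "more anomalous"; the function names are hypothetical. It also checks the reported operating point: assuming exactly 20 positives, 5 TP and 10 FP imply 15 FN, so precision = 1/3, recall = 1/4 and F1 = 2·5 / (2·5 + 10 + 15) ≈ 0.29.

```python
from typing import Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1 for the rule `score >= threshold`.

    Each distinct score is tried as a threshold, i.e. every operating point
    on the precision-recall curve.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Sort by descending score; predicted positives are always a prefix.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best_thr, best_f1 = pairs[0][0], -1.0
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Tied scores are all in or all out, so evaluate only after the last one.
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        f1 = f1_from_counts(tp, fp, total_pos - tp)
        if f1 > best_f1:
            best_thr, best_f1 = score, f1
    return best_thr, best_f1


if __name__ == "__main__":
    # Reported operating point, assuming exactly 20 positives: 20 - 5 = 15 FN.
    print(round(f1_from_counts(tp=5, fp=10, fn=15), 3))  # 0.286
    print(best_f1_threshold([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0]))  # (0.7, 0.8)
```

Note that F1 weights precision and recall equally; if false positives are what gets the project abandoned, a threshold chosen for a minimum precision may match the use case better than the F1 maximum.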

So what is a "good" PR-AUC score for a highly imbalanced data set, and how should I interpret it?
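
One common reference point: a classifier that ranks the test points at random has an expected precision equal to the positive-class prevalence $\pi$ at every threshold, so its PR-AUC is close to $\pi$. With $P$ positives and $N$ negatives, and the counts given in the question:

$$
\pi = \frac{P}{P + N} = \frac{20}{2000} = 0.01,
\qquad
\frac{0.05}{\pi} = 5,
\quad
\frac{0.10}{\pi} = 10
$$

So a PR-AUC of 0.05 to 0.10 is roughly 5 to 10 times the chance level; whether that is good enough still depends on how costly false positives are in the application.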

Can I undersample the test data, given that I only use it to make predictions and to classify the observations based on the prediction error?

Teapot
  • You have 2 questions here, and it's best to focus on one per question. But regarding what a 'good' value is, [we likely cannot give you a meaningful answer](https://stats.stackexchange.com/questions/414349/is-my-model-any-good-based-on-the-diagnostic-metric-r2-auc-accuracy-rmse). – mkt May 08 '20 at 05:57
  • Balancing is a non-solution to a non-problem: [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) We don't know what a "good" value for your KPI in your specific domain is: [How to know that your machine learning problem is hopeless?](https://stats.stackexchange.com/q/222179/1352) – Stephan Kolassa May 08 '20 at 06:40
  • The PR-AUC baseline depends on our "rare class" prevalence in the sample, so we have to be careful about reporting improvements if we under-sample. – usεr11852 May 08 '20 at 07:34
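
The last comment's point can be checked numerically. The sketch below (plain Python, synthetic Gaussian scores, hypothetical function names) evaluates one fixed set of scores twice: on all 2000 points, and after keeping all positives but only about 10% of the negatives. It uses average precision, a step-wise PR-AUC estimate, which may differ slightly from the PR-AUC implementation the asker uses. Because dropping negatives can only move each positive up the ranking, average precision never decreases, and the chance-level baseline rises from 0.01 to roughly 20/218 ≈ 0.09. The scores, and therefore the per-point classifications, are unchanged; only the metric moves.

```python
import random
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise PR-AUC: mean precision at the rank of each positive.

    Ties keep input order (Python's sort is stable), a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            tp += 1
            precision_sum += tp / rank
    return precision_sum / tp if tp else 0.0


def undersample_negatives(
    scores: Sequence[float],
    labels: Sequence[int],
    keep_fraction: float,
    seed: int = 0,
) -> Tuple[List[float], List[int]]:
    """Keep every positive and a random `keep_fraction` of the negatives."""
    rng = random.Random(seed)
    kept = [
        (s, y)
        for s, y in zip(scores, labels)
        if y == 1 or rng.random() < keep_fraction
    ]
    return [s for s, _ in kept], [y for _, y in kept]


def prevalence(labels: Sequence[int]) -> float:
    """Fraction of positives, i.e. the chance-level PR-AUC."""
    return sum(labels) / len(labels)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data mirroring the question's class balance (20 of 2000):
    # positives have mean error score 1.0, negatives 0.0, both with sd 1.0.
    labels = [1] * 20 + [0] * 1980
    scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]
    sub_scores, sub_labels = undersample_negatives(scores, labels, keep_fraction=0.1)

    for name, s, y in [("full", scores, labels), ("undersampled", sub_scores, sub_labels)]:
        print(f"{name:>12}: prevalence={prevalence(y):.3f}  AP={average_precision(s, y):.3f}")
```

A higher PR-AUC on an undersampled test set therefore does not by itself indicate a better model; comparisons are only meaningful at the same prevalence.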
