
I built an artificial neural network with a binary dependent variable called "Suspicious", so there are only two possible outcomes. I have 297,771 rows labeled "0" (not suspicious / known good) and only 1,100 rows labeled "1" (suspicious / bad). On the test set the confusion matrix looks like this:

cm
array([[59552,     0],
       [  148,    75]])

This gives me a test accuracy of 99.75240%, which seems way too high. Is there a rule of thumb for how many bad rows, or "1"s, I should have in the data before I run it through the model, like 1/3 or 1/2?
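
For reference, that accuracy comes directly from the confusion matrix above (correct predictions on the diagonal, divided by the total number of test rows):

$$ \frac{59552 + 75}{59552 + 0 + 148 + 75} = \frac{59627}{59775} \approx 0.9975240 $$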

sectechguy
  • Related: https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models – Sycorax Aug 17 '18 at 18:37

1 Answer


If I were a naive prediction model, then based on the overall distribution of the data I would simply guess "$0$" for every single outcome. This would give me an accuracy of:

$$ \frac{297771}{297771+1100} = 0.9963195 = 99.63195\% \text{ accuracy} $$

Based on this, your prediction model is only marginally better than a naive estimator that predicts $0$ for every output. Food for thought.
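
To make this concrete, here is a rough Python/NumPy sketch (using the confusion matrix you posted; the particular metrics shown are just illustrative) that computes the same naive baseline on your test set alongside imbalance-aware metrics such as recall and balanced accuracy:

import numpy as np

# Confusion matrix from the question: rows = actual class, columns = predicted class
#                 pred 0   pred 1
cm = np.array([[59552,      0],    # actual 0 (not suspicious)
               [  148,     75]])   # actual 1 (suspicious)

tn, fp = cm[0]
fn, tp = cm[1]

accuracy       = (tp + tn) / cm.sum()        # ~0.9975, the figure reported in the question
naive_accuracy = cm[0].sum() / cm.sum()      # ~0.9963, always predicting "0" on this test set
recall         = tp / (tp + fn)              # ~0.34, only about a third of suspicious rows are caught
precision      = tp / (tp + fp)              # 1.0, no false alarms -- but see recall
specificity    = tn / (tn + fp)              # 1.0
balanced_acc   = (recall + specificity) / 2  # ~0.67, accounts for the class imbalance

print(f"accuracy={accuracy:.4f}  naive={naive_accuracy:.4f}  "
      f"recall={recall:.4f}  precision={precision:.4f}  balanced={balanced_acc:.4f}")

Metrics like recall and balanced accuracy (or precision-recall curves) give a much better sense of how well the minority "suspicious" class is actually being detected than raw accuracy does.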

ERT