
I built a neural network for binary classification of medical images. Given that I evaluate the test accuracy (x%), is there a statistical method to calculate a threshold accuracy (y%) so that I can show the classifier's accuracy is acceptable because x% > y%?

For binary classification, a quick Google search says that anything above 50% accuracy is acceptable (i.e., better than chance). But I'm dealing with medical images and need higher accuracy from my model. So, based on the characteristics of the images in my two classes, can I calculate a threshold value?

UPDATE

I'll state the complete problem here.

I need to classify several datasets (say n datasets) using m classifiers, where n > m.

Each dataset has two folders of medical images: one folder contains images that can be used, and the other contains images that cannot. So my classifier is a binary classifier that identifies whether a given image is usable or not.

Likewise, I have n datasets with similar but not identical data. Example: 3-lead ECG data (3 classes: Lead I is somewhat similar to Lead II, but not identical).

I evaluated my classifier on other datasets (which were not used to train it) to see how well it classifies them. For some datasets the accuracy was higher than 90%, and for some it was lower than 80%. So I would like a method to calculate a threshold value for the classifier, so that when the classification accuracy on dataset x is below that threshold, I can conclude that a separate classifier needs to be trained for that dataset alone. This is what I want.

ECG image (with noise): [image]

ECG image (without noise): [image]

  • I don't follow this. How can a classifier built for medical images be used for ECG data? Aren't these entirely different types of input data? What are the images? MRIs would be spatial, whereas ECG would be time series. They would have very different dimensions, scales, & the same number would mean very different things for the two. – gung - Reinstate Monica Aug 03 '18 at 13:49
  • @gung I have ECG images, not numerical values. – Samitha Nanayakkara Aug 03 '18 at 13:50
  • So the 1st set of images & the ECG data are the same type? Why would you use ECGs as *images*? Why not use the sequential values? – gung - Reinstate Monica Aug 03 '18 at 13:53
  • @gung I'm doing part of a project, and this is the dataset I received. They need to classify the ECGs as images, not as numerical values, so I created a convNet for that. I only have images as inputs. The convNet was trained on ECG images, and all the other datasets contain images as well. – Samitha Nanayakkara Aug 03 '18 at 13:56
  • I'm guessing the thing for you to do is go back & ask them to rethink their project. This seems like it started off on the wrong foot, & trying to figure out how to save it now is not worth people's time & effort. – gung - Reinstate Monica Aug 03 '18 at 13:58
  • @gung I asked the same, but they selected that method for several reasons they consider valid for their project, so I can't argue with them. Isn't there anything I can do? – Samitha Nanayakkara Aug 03 '18 at 14:00
  • I would be *extremely* interested in *valid* reasons for taking ECG data, plotting it, and then classifying the images, instead of classifying the ECG data themselves. (But your edit does not change [my point](https://stats.stackexchange.com/a/360557/1352).) – Stephan Kolassa Aug 03 '18 at 16:41
  • @StephanKolassa They did not have enough data. What they said is that they only had 500 ECGs per class (500 with noise and 500 without). – Samitha Nanayakkara Aug 03 '18 at 16:46
  • I don't really understand. How is not having enough data a reason to turn the little data we have into images and classify those? – Stephan Kolassa Aug 03 '18 at 16:47
  • @StephanKolassa Since they used a convolutional neural network, it would need thousands of examples to train (like 15,000 ECG files). The unavailability of such a huge amount of data led to that decision, they say. – Samitha Nanayakkara Aug 03 '18 at 16:50
  • @Sam94 I know this is outside of your domain of influence on the project, but you should really know that their reasoning makes NO sense. Working with the images of the data instead of the raw data themselves only makes the job of classifying harder, since the conversion from raw data to images of the data is lossy, and introduces irrelevant context. If there is any way for you to convince your stakeholders to instead provide you with the raw data, it will make your life much easier. – Matthew Drury Aug 03 '18 at 21:50

3 Answers


Classification accuracy is an improper accuracy scoring rule which assumes a utility function that should be out of your control. Application of a cutoff for accuracy uses another utility function that should be out of your control. Details are here.
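As an illustration of the "improper scoring rule" point (a sketch of my own, not part of the original answer): suppose every case truly has a 70% chance of being positive. An honest forecast of 0.7 and an overconfident forecast of 1.0 give the same expected accuracy at a 0.5 cutoff, so accuracy does not reward reporting the true probability, whereas the (proper) Brier score does.

```python
# Hypothetical illustration: thresholded accuracy cannot distinguish an honest
# probability forecast from an overconfident one, while the Brier score can.
# Assume the true probability of the positive class is 0.7 for every case.
p_true = 0.7

def expected_brier(p_forecast):
    # E[(forecast - outcome)^2] when the outcome is 1 with probability p_true
    return p_true * (p_forecast - 1) ** 2 + (1 - p_true) * (p_forecast - 0) ** 2

def expected_accuracy(p_forecast, cutoff=0.5):
    # Any forecast above the cutoff predicts "positive", correct with probability p_true
    return p_true if p_forecast >= cutoff else 1 - p_true

for p_forecast in (0.7, 1.0):  # honest vs. overconfident forecaster
    print(p_forecast,
          round(expected_accuracy(p_forecast), 3),  # 0.7 in both cases
          round(expected_brier(p_forecast), 3))     # 0.21 vs. 0.30: Brier prefers honesty
```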

Frank Harrell
  • Then how can I evaluate and see whether my model is acceptable? Any idea? – Samitha Nanayakkara Aug 03 '18 at 13:26
  • Good article, Frank - though I don't understand your "should be out of your control" point. Can you explain? – Cam.Davidson.Pilon Aug 03 '18 at 14:05
  • I mean that the job of the statistical analyst is to create prediction models that can provide optimum inputs for the decision makers. The analyst should not encode the decision criteria, i.e., loss/cost/utility function into this process. The utility function is provided by the decision makers who are better informed about the harm or lost opportunities from making incorrect decisions. – Frank Harrell Aug 04 '18 at 13:50

Setting aside whether the project you've been handed makes sense: if you just want to know whether your accuracy is $<90\%$, you could take the classifications from your test set (the out-of-sample estimate) and run a binomial test on them against a fixed proportion of $0.9$. Then, if the accuracy is significantly lower, you could work on a separate classifier just for those types of images. This is all pretty straightforward.
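For concreteness, here is a minimal sketch of such a binomial test (assuming Python with SciPy ≥ 1.7; the counts are hypothetical):

```python
# Test whether the out-of-sample accuracy on one dataset is below a fixed threshold of 0.9.
from scipy.stats import binomtest

n_correct = 412  # hypothetical: correctly classified test images from this dataset
n_total = 500    # hypothetical: total test images from this dataset

# H0: true accuracy >= 0.9 vs. H1: true accuracy < 0.9 (one-sided test)
result = binomtest(n_correct, n_total, p=0.9, alternative="less")
print(f"observed accuracy = {n_correct / n_total:.3f}, p-value = {result.pvalue:.4f}")
# A small p-value suggests the accuracy on this dataset really is below 90%,
# which, by the logic in the question, would call for a separate classifier.
```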

You are making a number of assumptions, however, and it is unlikely that they are warranted. (It's possible they are in your case, but that would be the exception.) In general, accuracy is not a good measure of classification performance, as @StephanKolassa has pointed out.

In addition, the idea that you will build a different classifier assumes the relationship between your data and the output differs between the datasets. It is entirely possible that the signal is just weaker or, equivalently, that there is more noise in that dataset.

What you really should want is a model that captures the predictive information that is actually there, and no more. The best model may well have lower accuracy, and you should prefer it anyway (at least if you could have perfect knowledge of the true relationship, independent of your data and your model). In lieu of that, people check whether the model's predicted probabilities are right, and that is assessed via metrics like the Brier score instead of accuracy.
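As a concrete sketch (assuming Python with scikit-learn; the labels and probabilities below are made up), the Brier score is computed from the predicted probabilities rather than from thresholded labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                          # hypothetical true labels
y_prob = np.array([0.92, 0.40, 0.65, 0.81, 0.10, 0.47, 0.55, 0.30])  # hypothetical predicted P(class 1)

accuracy = accuracy_score(y_true, (y_prob >= 0.5).astype(int))  # needs a cutoff (here 0.5)
brier = brier_score_loss(y_true, y_prob)                        # mean (p - y)^2; lower is better
print(f"accuracy = {accuracy:.3f}, Brier score = {brier:.3f}")
```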

gung - Reinstate Monica

We can't say how easily your data are classifiable. This depends heavily on how similar the targets and non-targets are, which we don't know. See: How to know that your machine learning problem is hopeless?

Also see Why is accuracy not the best measure for assessing classification models?

Stephan Kolassa