Data balancing in image classification

Asked Aug 09 '19 at 15:09


I need to segment defects in images. Each image contains only tomatoes, along with any defects on them. The counts of tomatoes and defects in the dataset are as follows:

tomato = 20900

Defects:

tip = 2129
spots holes = 804
cuts cracks = 267
shrivelled = 193
glare = 3485
back tip = 137
stalk = 119
green area = 610

As one can see, the data is highly unbalanced. A single image of tomatoes may contain some defects or none at all. How should I train for this kind of multi-class identification and classification? I've tried many of the standard models provided in the TensorFlow Object Detection API. They detect tomatoes and glare well, as their counts are higher. Any suggestions?
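To put numbers on the imbalance, here is a minimal Python sketch that computes each class's share of all counted items and how many tomato items there are per item of that class. The counts are copied from the list above; the variable names (`counts`, `summary`) are only illustrative.

```python
# Counts copied from the list in the question.
counts = {
    "tomato": 20900,
    "tip": 2129,
    "spots holes": 804,
    "cuts cracks": 267,
    "shrivelled": 193,
    "glare": 3485,
    "back tip": 137,
    "stalk": 119,
    "green area": 610,
}

total = sum(counts.values())

# For each class: its share of all counted items, and the number of
# tomato items per item of that class.
summary = {
    name: {"share": n / total, "tomato_ratio": counts["tomato"] / n}
    for name, n in counts.items()
}

# Print classes from most to least frequent.
for name, row in sorted(summary.items(), key=lambda kv: kv[1]["share"], reverse=True):
    print(f"{name:12s} share={row['share']:.4f} tomato_ratio={row['tomato_ratio']:.1f}")
```

The rarest class, stalk, has 119 items against 20900 for tomato, which is roughly 176 tomato items per stalk item.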

Vedanshu

Unbalanced classes are almost certainly not a problem, and oversampling will not solve a non-problem: [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) A good start would be to build separate probabilistic models for each separate type of defect. These can serve as benchmarks for a more complex "omnibus" model that outputs probabilities for all defects simultaneously (and accounts for the fact that the defects likely occur together sometimes). – Stephan Kolassa Aug 09 '19 at 17:03
Is this a classification or a segmentation problem? – 0asa Aug 09 '19 at 21:11
@0asa I have to segment first and then classify. – Vedanshu Aug 15 '19 at 03:11
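
The first comment suggests building a separate probabilistic model for each defect type. As a rough sketch of the data preparation for that setup, the Python below builds one binary target vector per defect. It assumes each image comes with a set of defect labels, which is an image-level simplification of the segmentation task. The `annotations` dict, its file names, and the `binary_targets` helper are hypothetical and not taken from the question's dataset.

```python
# Defect names as listed in the question.
DEFECTS = ["tip", "spots holes", "cuts cracks", "shrivelled",
           "glare", "back tip", "stalk", "green area"]

# Hypothetical image-level annotations: image id -> set of defects present.
annotations = {
    "img_001.jpg": {"tip", "glare"},
    "img_002.jpg": set(),            # tomatoes only, no defects
    "img_003.jpg": {"glare", "cuts cracks"},
}

def binary_targets(annotations, defects):
    """Return sorted image ids and, per defect, a 0/1 list aligned with them."""
    image_ids = sorted(annotations)
    targets = {
        defect: [int(defect in annotations[image_id]) for image_id in image_ids]
        for defect in defects
    }
    return image_ids, targets

image_ids, targets = binary_targets(annotations, DEFECTS)

for defect in DEFECTS:
    print(defect, targets[defect])
```

Each vector in `targets` could then serve as the label for its own binary model, and several defects may be positive for the same image.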
