Classifier for grayscale images rendered from font files

Question

I want to train a classifier that helps me sort a large directory of fonts. I know that I could analyze the font names and the contents of the TTF and OTF files, but for educational reasons I want to do it with machine learning.

For every font I rendered a sample image that helps me decide whether I want to use the file in other projects, so there are two classes, 'yes' and 'no'. I also manually labeled 4389 of these images (384x384 pixels), chosen at random.

Samples for label 'yes'

Samples for label 'no'

Currently I am testing a network similar to VGG16 and get an accuracy of ~90%:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 rescaling_2 (Rescaling)     (None, 384, 384, 1)       0         
 conv2d_2 (Conv2D)           (None, 384, 384, 64)      640       
 conv2d_3 (Conv2D)           (None, 384, 384, 64)      102464    
 max_pooling2d_2 (MaxPooling  (None, 192, 192, 64)     0         
 2D)                                                             
 conv2d_4 (Conv2D)           (None, 192, 192, 128)     73856     
 conv2d_5 (Conv2D)           (None, 192, 192, 128)     147584    
 max_pooling2d_3 (MaxPooling  (None, 96, 96, 128)      0         
 2D)                                                             
 conv2d_6 (Conv2D)           (None, 96, 96, 256)       295168    
 conv2d_7 (Conv2D)           (None, 96, 96, 256)       590080    
 max_pooling2d_4 (MaxPooling  (None, 48, 48, 256)      0         
 2D)                                                             
 conv2d_8 (Conv2D)           (None, 48, 48, 512)       1180160   
 max_pooling2d_5 (MaxPooling  (None, 24, 24, 512)      0         
 2D)                                                             
 conv2d_9 (Conv2D)           (None, 24, 24, 512)       2359808   
 max_pooling2d_6 (MaxPooling  (None, 12, 12, 512)      0         
 2D)                                                             
 flatten_2 (Flatten)         (None, 73728)             0         
 dense_12 (Dense)            (None, 1024)              75498496  
 dense_13 (Dense)            (None, 1024)              1049600   
 dense_14 (Dense)            (None, 2)                 2050      
=================================================================
Total params: 81,299,906
Trainable params: 81,299,906
Non-trainable params: 0
_________________________________________________________________
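As a cross-check of the summary above, the parameter counts can be recomputed by hand: a Conv2D layer with a k x k kernel has k*k*c_in*c_out weights plus c_out biases, and a Dense layer has n_in*n_out weights plus n_out biases. This sketch assumes 3x3 kernels everywhere except conv2d_3, whose count of 102,464 only matches a 5x5 kernel (5*5*64*64 + 64). The layer list is transcribed from the summary, not from the original model code.

```python
def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights (kernel x kernel x c_in x c_out) plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weight matrix (n_in x n_out) plus one bias per unit."""
    return n_in * n_out + n_out


# Layer sizes transcribed from the model summary; kernel sizes are inferred from the counts.
layers = {
    "conv2d_2": conv2d_params(3, 1, 64),
    "conv2d_3": conv2d_params(5, 64, 64),  # 102,464 implies a 5x5 kernel
    "conv2d_4": conv2d_params(3, 64, 128),
    "conv2d_5": conv2d_params(3, 128, 128),
    "conv2d_6": conv2d_params(3, 128, 256),
    "conv2d_7": conv2d_params(3, 256, 256),
    "conv2d_8": conv2d_params(3, 256, 512),
    "conv2d_9": conv2d_params(3, 512, 512),
    "dense_12": dense_params(12 * 12 * 512, 1024),  # input is the 73,728-wide flatten
    "dense_13": dense_params(1024, 1024),
    "dense_14": dense_params(1024, 2),
}

total = sum(layers.values())
print(f"total params: {total:,}")
print(f"dense_12 share: {layers['dense_12'] / total:.1%}")
```

The total comes out at 81,299,906, matching the summary, and dense_12 alone accounts for roughly 93% of all weights.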

The network is compiled with:

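import tensorflow as tf

# `model` is the Sequential network summarized above.
model.compile(
  # Very small learning rate (1e-6); beta_1, beta_2 and epsilon are the Keras defaults.
  optimizer=tf.keras.optimizers.Adam(
    learning_rate=0.000001, beta_1=0.9, beta_2=0.999, epsilon=1e-07
  ),
  # Integer labels (0/1); from_logits=False by default, so dense_14 must output probabilities.
  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
  metrics=['accuracy']
)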
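For reference, SparseCategoricalCrossentropy takes integer class labels (here 0 and 1) and, with the default from_logits=False, expects each prediction to already be a probability distribution over the classes. The per-sample loss is the negative log of the probability assigned to the true class, averaged over the batch. The pure-Python sketch below illustrates that computation; the clipping constant 1e-7 is assumed to mirror the Keras default epsilon, and mapping 'no'/'yes' to 0/1 is a hypothetical choice.

```python
import math

EPSILON = 1e-7  # assumed to match the epsilon Keras uses to clip probabilities


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(p[true class]) over the batch; labels are integer class ids."""
    losses = []
    for label, p in zip(labels, probs):
        p_true = min(max(p[label], EPSILON), 1 - EPSILON)  # clip to avoid log(0)
        losses.append(-math.log(p_true))
    return sum(losses) / len(losses)


# Hypothetical batch: label 0 = 'no', label 1 = 'yes'.
labels = [0, 1]
probs = [softmax([2.0, 0.0]), softmax([0.5, 1.5])]
print(sparse_categorical_crossentropy(labels, probs))
```

If dense_14 has no softmax activation, the network outputs raw logits, and the loss would then typically be built with `from_logits=True` instead.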

The labeled images are split 80/20 into training and test sets.
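A reproducible 80/20 split can be made with a fixed random seed; stratifying by label also keeps the 'yes'/'no' ratio the same in both parts, which matters if the classes are imbalanced. The plain-Python sketch below assumes the labeled images are available as (filename, label) pairs; the function name, the seed and the example labels are hypothetical, not taken from the question.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (filename, label) pairs into train/test lists per label, reproducibly."""
    by_label = defaultdict(list)
    for filename, label in samples:
        by_label[label].append((filename, label))

    rng = random.Random(seed)  # fixed seed -> same split on every run
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical example: 4389 labeled renders with an arbitrary label assignment.
samples = [(f"font_{i:04d}.png", "yes" if i % 3 == 0 else "no") for i in range(4389)]
train, test = stratified_split(samples)
print(len(train), len(test))
```

When loading the images with `tf.keras.utils.image_dataset_from_directory` instead, passing `validation_split=0.2` and the same `seed` for both the training and validation subsets gives a reproducible, though not stratified, split.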

metric

I think training has finished by step 20. When I use the network to label 1000 new random samples, 38 predictions are wrong, which is an accuracy of about 96%. I do not understand why this differs so much from the metric (90%).
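To judge whether the gap between 96% and 90% could be sampling noise, a binomial confidence interval can be put around each estimate. The sketch below uses the Wilson score interval. The 1000-sample figures come from the question; the test-set size of about 878 images (20% of 4389), the 790 correct predictions standing in for "90%", and treating every prediction as an independent trial are all assumptions.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# 1000 new samples with 38 wrong predictions (from the question).
new_lo, new_hi = wilson_interval(1000 - 38, 1000)
# Test split: assumed ~878 images at ~90% accuracy (790 correct is an approximation).
test_lo, test_hi = wilson_interval(790, 878)
print(f"new samples: {new_lo:.3f}-{new_hi:.3f}")
print(f"test split:  {test_lo:.3f}-{test_hi:.3f}")
```

With these inputs the intervals do not overlap (roughly 0.88 to 0.92 versus 0.95 to 0.97), so sampling noise alone seems an unlikely explanation; a difference in how the two numbers were measured is more plausible.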

Question:

What can I change to reach a higher accuracy? 90% already helps a lot when labeling new samples, but the process is still time-consuming.

Lots of suggestions here: https://stats.stackexchange.com/questions/365778/what-should-i-do-when-my-neural-network-doesnt-generalize-well But probably the best thing you could do is to plot more glyphs in your sample images, or collect more labeled examples. — Sycorax, Dec 21 '21 at 15:20
On my first try, I plotted all the glyphs I want the font to have, but those images were very hard to label manually. But I agree: I could label the images with 16 glyphs and render new ones for training. Thank you for the link. — testo, Dec 21 '21 at 15:39
