
I am training a CNN model which takes 128x128x3 color images as input and is trained to predict the coordinates of 4 landmarks on each image (i.e., there are 2 * 4 = 8 values to predict if we count the x and y coordinates separately). All of the coordinate values are scaled to [-1, 1].
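(For concreteness, a minimal sketch of such a scaling, assuming a simple linear map from pixel coordinates; `scale_landmarks` is an illustrative helper, not code from my pipeline:)

import numpy as np

def scale_landmarks(points_px, image_size=128):
    """Map pixel coordinates in [0, image_size - 1] linearly onto [-1, 1]."""
    points_px = np.asarray(points_px, dtype=np.float32)  # shape (4, 2): (x, y) per landmark
    return 2.0 * points_px / (image_size - 1) - 1.0

# e.g. the image centre (63.5, 63.5) maps to (0.0, 0.0)

Following is my model class: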

from tensorflow.keras import layers, models


class LandmarkLocalizerCNN:

    def __init__(self, input_shape, landmark_point_count):
        self.input_shape = input_shape
        self.landmark_point_count = landmark_point_count

    def first_level_cnn(self):
        model = models.Sequential()

        # Block 1: 128x128 -> 64x64
        model.add(layers.Conv2D(8, (5, 5), padding='same', activation='relu',
                                input_shape=self.input_shape))
        model.add(layers.Conv2D(8, (5, 5), padding='same', activation='relu'))
        model.add(layers.Conv2D(16, (5, 5), padding='same', activation='relu'))
        model.add(layers.Conv2D(16, (5, 5), padding='same', activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))

        # Block 2: 64x64 -> 32x32
        model.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu'))
        model.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu'))
        model.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu'))
        model.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))

        # Block 3: 32x32 -> 16x16
        model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
        model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
        model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
        model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))

        # Block 4: 16x16 -> 8x8
        model.add(layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
        model.add(layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))

        # Regression head: linear output, 2 values (x, y) per landmark
        model.add(layers.Flatten())
        model.add(layers.Dense(64, activation='relu'))
        model.add(layers.Dense(32, activation='relu'))
        model.add(layers.Dropout(0.2))
        model.add(layers.Dense(16, activation='relu'))
        model.add(layers.Dense(16, activation='relu'))
        model.add(layers.Dropout(0.2))
        model.add(layers.Dense(16, activation='relu'))
        model.add(layers.Dense(16, activation='relu'))
        model.add(layers.Dense(2 * self.landmark_point_count))

        return model

And here is the created model:

first_level_cnn_model = LandmarkLocalizerCNN(input_shape = (128, 128, 3), landmark_point_count = 4).first_level_cnn()
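
The model is compiled with MSE loss and the accuracy metric (this is the compile call I mention in the comments below; the optimizer and the training-data names are placeholders):

import tensorflow as tf

opt = tf.keras.optimizers.Adam()  # placeholder; the actual optimizer is only referenced as `opt`

first_level_cnn_model.compile(optimizer=opt,
                              loss=tf.keras.losses.MeanSquaredError(),
                              metrics=['accuracy'])

# X_train has shape (N, 128, 128, 3), y_train has shape (N, 8) with values in [-1, 1]
# history = first_level_cnn_model.fit(X_train, y_train,
#                                     validation_data=(X_val, y_val),
#                                     epochs=..., batch_size=...)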

After training for 375 steps I got the following metrics (plots of training/validation accuracy and loss):

WHY?

Here are some of my observations:

  • If I remove the dropouts, the training accuracy quickly climbs higher (around 0.9), even with fewer layers.
  • With the dropout layers the training accuracy doesn't climb as high, but in either case the validation accuracy hardly crosses 0.6.
  • With the current architecture the graph shows fluctuations in both the training and validation plots, with some strange behavior: for training, accuracy and loss seem to be negatively correlated, but for validation they seem to be positively correlated. Eventually both saturate and show no further improvement.

So what is the main problem my model is suffering from? I tried increasing and decreasing the number of layers and adding or removing dropouts, but these only change the training accuracy. The validation accuracy hardly crosses 0.6 whatever I do, so how can I make my model perform better on unseen data?

hafiz031
  • Accuracy is usually used to denote how often the correct class is predicted. But coordinates seem a bit different, since being off by 1 pixel (e.g. 123 vs 124) is different than predicting "cat" when the answer is "dog". So, how are you employing accuracy? And what loss function are you using to predict the landmark coordinates? – Sycorax Sep 28 '20 at 21:17
  • @Sycorax I used MSE as the loss function, i.e., `first_level_cnn_model.compile(optimizer = opt, loss = tf.keras.losses.MeanSquaredError(), metrics = ['accuracy'])`. I know accuracy is a classification metric and not suitable for regression, and I am also not sure how `keras` calculates accuracy for regression. But I used the `accuracy` metric because I am not sure which metric would be ideal for regression, and because keras does not prevent me from using it. – hafiz031 Sep 28 '20 at 21:30
  • To compute accuracy, keras just reports the proportion of predictions that exactly match the labels. In addition to the non-applicability of accuracy to this task, since your labels are floating-point numbers in $[-1,1]$, I surmise that there are some issues here around floating-point number representation, so achieving a low accuracy is not particularly meaningful. Accuracy isn't really useful for classification anyway, and there's no law that you have to use accuracy to evaluate your model. See: https://stats.stackexchange.com/questions/312780/ – Sycorax Sep 28 '20 at 21:34
  • I don't have a great explanation for why the loss suddenly starts increasing; my best guess is that this is related to dropout: a large number of units are dropped, the loss is high, so the gradient step is bad. If the next step has the same problem, the model parameters may have moved to a flat region and become stuck. My suggestion is to not use dropout, decrease the learning rate by factors of 10, and study what happens. Try variations on the network until you can overfit, and then use [tag:regularization] to combat overfitting. – Sycorax Sep 28 '20 at 21:34
  • I don't know anything about landmark predictions, so maybe this guess is totally off-base, but I'd be worried that the model would tend to be biased towards landmarks nearish to the center of the image, since this kind of an "oblivious" model could achieve a lowish loss without regard to the input. So, maybe the model has "collapsed" after taking several steps in a bad direction? – Sycorax Sep 28 '20 at 21:47
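
A minimal sketch of the suggestion in the comments (drop the Dropout layers, sweep the learning rate down by factors of 10, and track a regression metric such as mean absolute error instead of accuracy); the optimizer and metric choices here are standard Keras options, not something prescribed above:

import tensorflow as tf

# Assumes first_level_cnn() has been modified to omit the Dropout layers,
# as suggested in the comments above.
for lr in [1e-3, 1e-4, 1e-5]:  # decrease the learning rate by factors of 10
    model = LandmarkLocalizerCNN(input_shape=(128, 128, 3),
                                 landmark_point_count=4).first_level_cnn()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss=tf.keras.losses.MeanSquaredError(),
                  metrics=[tf.keras.metrics.MeanAbsoluteError()])  # regression metric, not accuracy
    # model.fit(X_train, y_train, validation_data=(X_val, y_val), ...)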

0 Answers