
I want to train a model that takes an image as input and predicts 8 float values. Here's the model architecture:

from tensorflow.keras.applications import EfficientNetB4
from tensorflow.keras import models, layers
from tensorflow.keras.initializers import Constant


def prepare_model_eff(input_shape):
    # EfficientNetB4 backbone without its classification head
    conv_base = EfficientNetB4(include_top=False, input_shape=input_shape)
    conv_base.trainable = True  # That's done deliberately!
    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.GlobalMaxPooling2D())
    model.add(layers.Dropout(rate=0.2))
    # Linear regression head: 8 outputs, no activation
    model.add(layers.Dense(8, bias_initializer=Constant(0.0)))
    return model

Loss function: MSE

Metric: RMSE
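The compile step isn't shown in the question; here is a minimal sketch of how these choices might be wired up, assuming the Adam optimizer with default settings (the question doesn't name one) and EfficientNetB4's native 380×380 input, which is consistent with the 12×12 feature map in the summary below:

import tensorflow as tf

model = prepare_model_eff(input_shape=(380, 380, 3))
model.compile(
    optimizer=tf.keras.optimizers.Adam(),  # assumed; the optimizer isn't named in the question
    loss="mse",
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)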

To test the architecture, I try to predict constant values: 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0.

There are ~1500 images in the train set and ~300 images in the validation set, so there are ~1800 images, all with the same constant outputs.

I expect the model to grasp the idea of predicting the same values and reach an RMSE of 0.0000000, but surprisingly, after 80 epochs the model's RMSE is still about 0.3.
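With constant targets, a model that always outputs the target vector achieves exactly zero MSE/RMSE, so 0.0 is attainable in principle. A quick sketch confirming this (note: the question lists nine constants while the head has 8 units, so the sketch assumes the first eight values for consistency with Dense(8)):

import numpy as np
import tensorflow as tf

# Assumed target: first eight of the nine listed constants, to match Dense(8)
target = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], dtype=np.float32)

# A predictor that always emits the target vector scores exactly zero RMSE
y_true = np.tile(target, (300, 1))  # e.g. the ~300 validation samples
y_pred = np.tile(target, (300, 1))

metric = tf.keras.metrics.RootMeanSquaredError()
metric.update_state(y_true, y_pred)
print(metric.result().numpy())  # 0.0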

Since the test task is super easy, I suppose there could be something wrong with the architecture. Perhaps I'm shooting myself in the foot? Here's the model summary for your reference:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
efficientnetb4 (Functional)  (None, 12, 12, 1792)      17673823  
_________________________________________________________________
global_max_pooling2d (Global (None, 1792)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1792)              0         
_________________________________________________________________
dense (Dense)                (None, 8)                 14344     
=================================================================
Total params: 17,688,167
Trainable params: 17,562,960
Non-trainable params: 125,207
    Does this answer your question? [What should I do when my neural network doesn't learn?](https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn) – mhdadk Dec 05 '21 at 13:22
  • @mhdadk well, that's a great answer indeed, but I'd like to get help with my particular case. My neural network learns, but I don't understand why it doesn't reach the absolute zero loss. I've created a test scenario where dataset quality is irrelevant, because the output is always the same. So it's either an architectural flaw or poorly managed hyperparameters. Perhaps an experienced expert in DML can examine the provided code and say "Architecture seems to be fine for your task, seek elsewhere" or "Hey, you're definitely missing a BatchNorm layer after the GlobalMaxPooling layer!". Any hint is welcome! – SagRU Dec 05 '21 at 17:05
  • Why should it achieve 0 training loss? – Sycorax Dec 08 '21 at 18:35

0 Answers