I want to train a model that takes an image as input and predicts 8 float numbers. Here's the model architecture:
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB4
from tensorflow.keras.initializers import Constant


def prepare_model_eff(input_shape):
    # EfficientNetB4 backbone without its classification head
    conv_base = EfficientNetB4(include_top=False, input_shape=input_shape)
    conv_base.trainable = True  # That's done deliberately!
    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.GlobalMaxPooling2D())
    model.add(layers.Dropout(rate=0.2))
    # Linear regression head with 8 outputs, biases initialized to zero
    model.add(layers.Dense(8, bias_initializer=Constant(0.0)))
    return model
Loss function: MSE
Metric: RMSE
To test the architecture, I try to predict constant values: 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0.
There are ~1500 images in the train set and ~300 images in the validation set, so all 1800 images have the same constant outputs.
I expect the model to learn to predict the same values every time and reach an RMSE of essentially 0, but surprisingly, after 80 epochs the model's RMSE is still 0.3.
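To put the 0.3 in perspective, here is a minimal sketch of how MSE and RMSE relate for these constant targets. It assumes the first 8 of the constants listed above are the targets (the model has 8 outputs), and the helper names `mse` and `rmse` are just for illustration, not the Keras implementations.

```python
import math

# Assumption: the first 8 of the listed constants, matching the Dense(8) output.
target = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]


def mse(y_true, y_pred):
    """Mean squared error averaged over all output components."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, i.e. the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Trivial baseline: a model that outputs all zeros for every image.
zeros = [0.0] * len(target)
baseline_rmse = rmse(target, zeros)

# A model that has learned the constants exactly.
perfect_rmse = rmse(target, target)

print(f"all-zeros baseline RMSE: {baseline_rmse:.3f}")
print(f"perfect prediction RMSE: {perfect_rmse:.3f}")
```

Under that assumption, an all-zeros predictor already scores an RMSE of 0.5, so 0.3 after 80 epochs is not far ahead of a trivial baseline.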
Since the test task is super easy, I suppose there could be something wrong with the architecture. Perhaps I'm shooting myself in the foot? Here's the model summary for your reference:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
efficientnetb4 (Functional)  (None, 12, 12, 1792)      17673823
_________________________________________________________________
global_max_pooling2d (Global (None, 1792)              0
_________________________________________________________________
dropout (Dropout)            (None, 1792)              0
_________________________________________________________________
dense (Dense)                (None, 8)                 14344
=================================================================
Total params: 17,688,167
Trainable params: 17,562,960
Non-trainable params: 125,207
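As a quick consistency check on the summary, the parameter counts can be reproduced by hand. This is only a sketch using the numbers printed above; the variable names are mine.

```python
# Numbers copied from the model summary above.
features = 1792               # channels left after global max pooling
outputs = 8                   # units in the final Dense layer
backbone_params = 17_673_823  # efficientnetb4 row
trainable = 17_562_960
non_trainable = 125_207

# A Dense layer has one weight per (input, output) pair plus one bias per output.
dense_params = features * outputs + outputs

total = backbone_params + dense_params

print("dense params:", dense_params)
print("total params:", total)
print("trainable + non-trainable:", trainable + non_trainable)
```

The Dense layer count and the totals agree with the summary, so the head is wired to the pooled 1792-dimensional features as intended.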