
I'm struggling with my model (below): despite some hyperparameter tuning, I always end up with a sudden rise of the loss function followed by an "infinite" plateau.

My hypotheses were:

- Learning rate / local minima issue? I tried several (1e-3, 1e-4).
- Optimizer issue? I tried SGD, for example.
- Too complex a model? I removed some layers or neurons.
- Loss/metric issue? I tried MAE.

Among these hypotheses, and perhaps others, which one seems the most likely? I looked for different loss-curve patterns but didn't find this one.

[Image: training loss curve showing the sudden rise followed by a plateau]

from tensorflow.keras.layers import Input, LSTM, Dropout, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

timeseries = Input(shape=(1536, 2), name='timeseries')  # 1536 time steps, 2 features
features = Input(shape=(22,), name='features')           # 22 static features

# Time-series branch
y = LSTM(256)(timeseries)  # 256
y = Dropout(0.20)(y)
y = Flatten()(y)  # no-op here: the LSTM (without return_sequences) already outputs a 2D tensor

# Static-features branch
x = Dense(28, activation='relu')(features)  # 14
x = Dense(14, activation='relu')(x)  # 14
x = Dense(14, activation='relu')(x)  # 7

# Merge the two branches
# z = concatenate([x, y])
z = Concatenate(axis=1)([x, y])
z = Dense(64, activation='relu')(z)  # 64
z = Dense(64, activation='relu')(z)  # 64
x = Dense(32, activation='relu')(x)  # 32 (note: assigned to x, so these two layers never reach the outputs below)
x = Dense(16, activation='relu')(x)  # 16
z = Dropout(0.40)(z)

# Three regression heads
outputP = Dense(1, activation='softplus')(z)
outputR = Dense(1, activation='softplus')(z)
outputC = Dense(1, activation='softplus')(z)

modelLSTM256NN = Model(inputs=[timeseries, features], outputs=[outputP, outputR, outputC])
opt = Adam(learning_rate=1e-3)  # 1e-3 ('lr' is the deprecated argument name)

modelLSTM256NN.compile(optimizer=opt, loss='mse')
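
For reference, a training call consistent with these input and output shapes would look roughly like the sketch below (random placeholder arrays, not the actual data):

import numpy as np

n = 100  # placeholder number of samples
ts_data = np.random.rand(n, 1536, 2)   # matches the 'timeseries' input
feat_data = np.random.rand(n, 22)      # matches the 'features' input
yP = np.random.rand(n, 1)
yR = np.random.rand(n, 1)
yC = np.random.rand(n, 1)

modelLSTM256NN.fit(
    [ts_data, feat_data],  # one array per input, in the order given to Model(...)
    [yP, yR, yC],          # one target per output head
    epochs=10,
    batch_size=32,
    validation_split=0.2,
)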

Thanks!!

Gwénolé
  • Do you apply scaling to your inputs and/or outputs? If so, how? What happens if you simplify the model, such as omitting the dense "timeseries" or "features" portion entirely? Does the problem appear if you only work with one of your output targets? Does the problem appear if you don't use dropout? Does the problem appear if you use a ReLU variant such as ELU or leaky ReLU? Does the problem appear if you use an even smaller learning rate such as 1e-5 or 1e-6? Does the problem appear if you eliminate all but 1 of your dense layers in each part of the model? Also, try all of these in combination – Sycorax May 09 '21 at 00:46
  • Excellent! So I removed the dense branch of the model, no great effect, but indeed using an even smaller learning rate (in my case 1e-5) helped a lot! I haven't seen such a small learning rate used often! Thanks! – Gwénolé May 09 '21 at 14:34
  • Great! Sounds like you could write up this finding as an answer. – Sycorax May 09 '21 at 14:35
  • So removing the dense branch of the model (before the concatenate) and using a much smaller learning rate (in my case 1e-5) helped! – Gwénolé May 10 '21 at 08:20

1 Answer


So, removing the dense branch of the model (before the concatenate) and using a much smaller learning rate (in my case 1e-5) helped here! A rough sketch of the resulting setup is below. Thanks!
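
Concretely, that means feeding the 22 raw features straight into the concatenation instead of passing them through Dense layers first, and compiling with the smaller learning rate. A minimal sketch, with layer sizes kept from the question for illustration:

from tensorflow.keras.layers import Input, LSTM, Dropout, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

timeseries = Input(shape=(1536, 2), name='timeseries')
features = Input(shape=(22,), name='features')

# Time-series branch unchanged
y = LSTM(256)(timeseries)
y = Dropout(0.20)(y)

# No dense branch on the features: concatenate them directly with the LSTM output
z = Concatenate(axis=1)([features, y])
z = Dense(64, activation='relu')(z)
z = Dense(64, activation='relu')(z)
z = Dropout(0.40)(z)

outputP = Dense(1, activation='softplus')(z)
outputR = Dense(1, activation='softplus')(z)
outputC = Dense(1, activation='softplus')(z)

model = Model(inputs=[timeseries, features], outputs=[outputP, outputR, outputC])
model.compile(optimizer=Adam(learning_rate=1e-5), loss='mse')  # much smaller learning rate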

Gwénolé