
I'm struggling with my model (below): despite some hyperparameter tuning, I always end up with a sudden rise of the loss function followed by an "infinite" plateau.

My hypotheses were:

- Learning rate / local minima issue? I tried several (1e-3, 1e-4).
- Optimizer issue? I tried SGD, for example.
- Too complex a model? I removed some layers or neurons.
- Loss/metric issue? I tried MAE.

Among these hypotheses, and perhaps others, which one seems the most likely? I looked for different loss-curve patterns but didn't find this one.

[Image: training loss curve showing the sudden rise followed by a plateau]

from tensorflow.keras.layers import Input, LSTM, Dropout, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

timeseries = Input(shape=(1536, 2), name='timeseries')  # 1536 time steps, 2 features
features = Input(shape=(22,), name='features')           # 22 static features

# Time-series branch
y = LSTM(256)(timeseries)  # 256
y = Dropout(0.20)(y)
y = Flatten()(y)  # no-op here: the LSTM (without return_sequences) already outputs a 2D tensor

# Static-features branch
x = Dense(28, activation='relu')(features)  # 14
x = Dense(14, activation='relu')(x)  # 14
x = Dense(14, activation='relu')(x)  # 7

# Merge the two branches
# z = concatenate([x, y])
z = Concatenate(axis=1)([x, y])
z = Dense(64, activation='relu')(z)  # 64
z = Dense(64, activation='relu')(z)  # 64
x = Dense(32, activation='relu')(x)  # 32 (note: assigned to x, so these two layers never reach the outputs below)
x = Dense(16, activation='relu')(x)  # 16
z = Dropout(0.40)(z)

# Three regression heads
outputP = Dense(1, activation='softplus')(z)
outputR = Dense(1, activation='softplus')(z)
outputC = Dense(1, activation='softplus')(z)

modelLSTM256NN = Model(inputs=[timeseries, features], outputs=[outputP, outputR, outputC])
opt = Adam(learning_rate=1e-3)  # 1e-3 ('lr' is the deprecated argument name)

modelLSTM256NN.compile(optimizer=opt, loss='mse')
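
For reference, a training call consistent with these input and output shapes would look roughly like the sketch below (random placeholder arrays, not the actual data):

import numpy as np

n = 100  # placeholder number of samples
ts_data = np.random.rand(n, 1536, 2)   # matches the 'timeseries' input
feat_data = np.random.rand(n, 22)      # matches the 'features' input
yP = np.random.rand(n, 1)
yR = np.random.rand(n, 1)
yC = np.random.rand(n, 1)

modelLSTM256NN.fit(
    [ts_data, feat_data],  # one array per input, in the order given to Model(...)
    [yP, yR, yC],          # one target per output head
    epochs=10,
    batch_size=32,
    validation_split=0.2,
)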

Thanks!!

Gwénolé
  • Do you apply scaling to your inputs and/or outputs? If so, how? What happens if you simplify the model, such as omitting the dense "timeseries" or "features" portion entirely? Does the problem appear if you only work with one of your output targets? Does the problem appear if you don't use dropout? Does the problem appear if you use a ReLU variant such as ELU or leaky ReLU? Does the problem appear if you use an even smaller learning rate such as 1e-5 or 1e-6? Does the problem appear if you eliminate all but 1 of your dense layers in each part of the model? Also, try all of these in combination – Sycorax May 09 '21 at 00:46
  • Excellent! So I removed the dense branch of the model, no great effect, but indeed using an even smaller learning rate (in my case 1e-5) helped a lot! I haven't seen such a small learning rate used often! Thanks! – Gwénolé May 09 '21 at 14:34
  • Great! Sounds like you could write up this finding as an answer. – Sycorax May 09 '21 at 14:35
  • So removing the dense branch of the model (before the concatenate) and using a much smaller learning rate (in my case 1e-5) helped! – Gwénolé May 10 '21 at 08:20

1 Answer


So, removing the dense branch of the model (before the concatenate) and using a much smaller learning rate (in my case 1e-5) helped here! A rough sketch of the resulting setup is below. Thanks!
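
Concretely, that means feeding the 22 raw features straight into the concatenation instead of passing them through Dense layers first, and compiling with the smaller learning rate. A minimal sketch, with layer sizes kept from the question for illustration:

from tensorflow.keras.layers import Input, LSTM, Dropout, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

timeseries = Input(shape=(1536, 2), name='timeseries')
features = Input(shape=(22,), name='features')

# Time-series branch unchanged
y = LSTM(256)(timeseries)
y = Dropout(0.20)(y)

# No dense branch on the features: concatenate them directly with the LSTM output
z = Concatenate(axis=1)([features, y])
z = Dense(64, activation='relu')(z)
z = Dense(64, activation='relu')(z)
z = Dropout(0.40)(z)

outputP = Dense(1, activation='softplus')(z)
outputR = Dense(1, activation='softplus')(z)
outputC = Dense(1, activation='softplus')(z)

model = Model(inputs=[timeseries, features], outputs=[outputP, outputR, outputC])
model.compile(optimizer=Adam(learning_rate=1e-5), loss='mse')  # much smaller learning rate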

Gwénolé