
I'm trying to fit a neural net to a pretty simple 1-variable regression problem. My output is a probability and the input is a continuous feature. The association is clearly negative. The code below produces the scatter plot that follows:

import numpy as np
import seaborn as sns

X = np.array(impute_df['runner1_speed'])
y = np.array(impute_df['dest_1__'])
sns.scatterplot(x=X, y=y);

[Scatter plot of runner1_speed against dest_1__ showing a clearly negative association]

But with the simple net below, the fitted values are positively associated with the predictor.

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(1,))
hidden = layers.Dense(16, activation="relu")(inputs)
outputs = layers.Dense(1, activation="sigmoid")(hidden)
impute_mod = keras.Model(inputs=inputs, outputs=outputs, name="impute_mod")
impute_mod.compile(
    loss=keras.losses.MeanSquaredError(),
    optimizer=keras.optimizers.Adam(learning_rate=0.001))
history = impute_mod.fit(X, y, batch_size=4, epochs=20, verbose=0)
sns.scatterplot(x=X, y=impute_mod.predict(X)[:,0])

[Scatter plot of the model's fitted values against X showing a positive association]

I must be doing something stupid but I can't figure out what. Any ideas? Thanks so much!

dafrdman
  • I doubt that this model has finished training. What happens if you train for 200 or 2000 epochs, possibly reducing the learning rate when it plateaus? More suggestions here: https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn/352037#352037 Scaling the data to have mean 0 and variance 1 might also help the model train faster, as described in the link. – Sycorax Jul 14 '21 at 15:51
  • I don't think that's it... the validation loss plateaus after about 10 epochs even with the learning rate at 0.0001 – dafrdman Jul 14 '21 at 15:57
  • A loss plateau is not a guarantee that the model has finished training. As you've demonstrated, the trend is opposite the prediction, so my inference is that the model has simply plateaued at a high value of the loss. A problem with using MSE + sigmoid output is that the sigmoid can saturate and learning will be extremely slow. Try removing the sigmoid activation and just using an identity activation in the output. – Sycorax Jul 14 '21 at 16:01
  • Gotcha, thanks. With 2000 epochs the association was negative! I appreciate it – dafrdman Jul 14 '21 at 16:43

1 Answer


My suspicion is that this network hasn't finished training. Quite plainly, the model fit is poor, so we should think through all parts of the modeling process to isolate factors which might contribute to this poor fit.

  • The training or validation loss may have plateaued, but a plateau is not necessarily evidence that the model has finished training. Indeed, the fitted trend is opposite the trend in the data, so my inference is that the model has simply plateaued at a high value of the loss.

  • One reason the loss could plateau at a high value is that the sigmoid function has become saturated, so continued training will make progress only slowly. Gradient-based updates are proportional to the gradients, and for a saturated sigmoid unit those gradients are vanishingly small (see the short numerical sketch after this list).
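To make the saturation point concrete, here is a minimal numerical sketch (plain numpy, separate from the model above). The derivative of the sigmoid is sigmoid(z) * (1 - sigmoid(z)), which collapses toward zero once the pre-activation z moves into either tail:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:4.1f}   sigmoid = {sigmoid(z):.6f}   gradient = {sigmoid_grad(z):.2e}")

# z =  0.0   sigmoid = 0.500000   gradient = 2.50e-01
# z = 10.0   sigmoid = 0.999955   gradient = 4.54e-05

At z = 10 the gradient is four orders of magnitude smaller than at z = 0, so weight updates flowing through a saturated output unit are correspondingly tiny.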

My surmise is that continuing training for many more epochs may solve the problem. The OP tried this and reported that after 2000 epochs the fitted values showed the negative association implied by the scatter plot.
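As a rough sketch of what "train longer" might look like in the OP's code, one could extend the fit call and cut the learning rate when the loss stalls. The callback and its settings here are my assumptions, not tuned values:

from tensorflow import keras

# Cut the learning rate by 10x when the training loss stops improving.
# There is no validation split here, so monitor the training loss.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="loss", factor=0.1, patience=20, min_lr=1e-6)

# Same model and data as in the question, trained for far more epochs.
history = impute_mod.fit(
    X, y, batch_size=4, epochs=2000, verbose=0, callbacks=[reduce_lr])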

An alternative model would use an identity activation in the output layer. This changes the model and its interpretation, but because we remove the sigmoid unit and its saturating non-linearity, this model might be easier to estimate.
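A minimal sketch of that alternative, mirroring the architecture in the question (linear_mod is just a name chosen here for illustration):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(1,))
hidden = layers.Dense(16, activation="relu")(inputs)
# Omitting the activation argument gives the identity (linear) activation,
# so there is no saturating non-linearity in the output layer.
outputs = layers.Dense(1)(hidden)

linear_mod = keras.Model(inputs=inputs, outputs=outputs, name="linear_mod")
linear_mod.compile(
    loss=keras.losses.MeanSquaredError(),
    optimizer=keras.optimizers.Adam(learning_rate=0.001))

One trade-off: without the sigmoid, predictions are no longer constrained to [0, 1], so if the output must be a probability you would need to clip or otherwise post-process them.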

Sycorax