
Posting here is my last resort because I can't find anything like it online. I trained a model to classify embeddings into categories (a simple three-layer Dense neural network).

I'm using a pandas DataFrame as the input to the network, and it performs quite well on the test set. The problem: if I run prediction on the whole test set and read, say, the prediction for one row, and then select only that row and pass it through the network on its own, the prediction comes out with different values. What could be happening? This is the code, in case my explanation was confusing:

import numpy as np

i = 522
y_pred = model.predict(X_test)              # predict on the whole test set
y_pred_2 = model.predict(X_test.iloc[[i]])  # predict on row i alone

print(f'{np.argmax(y_pred[i])} {np.argmax(y_pred_2)}')

output: 8 5

It's as if my model behaves differently when it processes the whole test set in a single run than when it processes a single row at a time.

The output shapes of y_pred and y_pred_2 are (603, 10) and (1, 10) respectively, where 10 is the number of classes I have.

Some example values for both predictions, with an arbitrary i:

y_pred[i]: array([1.3353945e-02, 2.8374636e-09, 1.4435661e-08, 3.4135045e-18,
   7.7986561e-02, 3.7737598e-03, 2.0284578e-10, 2.7154891e-03,
   9.0203673e-01, 1.3346069e-04], dtype=float32)

y_pred_2: array([[1.1702824e-16, 1.6781385e-37, 2.5281618e-33, 0.0000000e+00,
        2.3075200e-09, 1.0000000e+00, 9.9125501e-35, 6.2606384e-22,
        5.8689110e-14, 2.3486194e-24]], dtype=float32)
  • It's not clear why you expect the same results for these 2 different inputs. Can you elaborate on that? – Sycorax Nov 05 '21 at 20:15
  • Hi @Sycorax, the inputs are the same, the only difference is that in the first case I'm processing the whole table and reading only one of the predictions, while in the second case I'm processing only one row of the table – Ramiro Suriano Nov 05 '21 at 20:49
  • What is the `model`? Does it include dropout, batch norm, or similar layers? – Sycorax Nov 05 '21 at 21:50
  • Model has three `Dense` layers and one `Dropout` layer after the first one. Do you think it could be that? – Ramiro Suriano Nov 05 '21 at 22:05
  • Update: it was the dropout layers haha, I thought those only worked while training and didn't affect the result while predicting stuff. – Ramiro Suriano Nov 05 '21 at 22:47

1 Answer


In comments, the OP confirmed that the culprit was the dropout layer. Dropout randomly zeroes a fraction of the network's activations, which makes the forward pass non-deterministic: the same input can produce different outputs on different calls. Dropout should only be active during training; when it's time to make predictions, turn it off so the model becomes deterministic.
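A minimal NumPy sketch of the effect (not the OP's actual Keras model; the layer sizes and weights are arbitrary, for illustration only). With dropout active, two forward passes on the same input disagree; with dropout off, they match:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_forward(x, w, b, dropout_rate=0.0, training=False):
    """One Dense layer (ReLU) followed by inverted dropout.

    With training=True, each activation is zeroed with probability
    dropout_rate, so repeated calls on the same input give different
    outputs; with training=False, dropout is a no-op and the forward
    pass is deterministic.
    """
    h = np.maximum(x @ w + b, 0.0)  # Dense + ReLU
    if training and dropout_rate > 0.0:
        mask = rng.random(h.shape) >= dropout_rate
        h = h * mask / (1.0 - dropout_rate)  # inverted-dropout scaling
    return h

# Hypothetical weights, just for illustration
w = rng.normal(size=(8, 32))
b = np.zeros(32)
x = rng.normal(size=(1, 8))

# Inference mode: dropout is off, repeated calls agree
out1 = dense_forward(x, w, b, dropout_rate=0.5, training=False)
out2 = dense_forward(x, w, b, dropout_rate=0.5, training=False)

# Training mode: dropout is active, so repeated calls on the very
# same input disagree -- the symptom the OP observed
out3 = dense_forward(x, w, b, dropout_rate=0.5, training=True)
out4 = dense_forward(x, w, b, dropout_rate=0.5, training=True)
```

For what it's worth, Keras's `model.predict` runs with `training=False` by default, so a standard `Dropout` layer is skipped automatically at prediction time; calling the model directly with `model(x, training=True)` re-enables it.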

Sycorax