I am trying to perform multi-label classification using a neural network with 3 hidden ReLU layers and a softmax output layer. I'm also using mini-batch SGD as the optimization algorithm and negative log likelihood as the loss function, with L2 regularization (a minimal sketch of this setup is included after the list below). After performing feature scaling, this is what I'm noticing:
- Validation and test error plateau very early during training
- Virtually no hyperparameter other than the mini-batch size has any effect on performance, even when I choose seemingly absurd values (e.g., learning rate, number of hidden units, L2 regularization coefficient)
- I'm getting better performance on the non-scaled version of the data set than on the scaled one.
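For reference, here is a minimal sketch of the kind of setup I described, written in PyTorch. This is not my actual code; the dimensions, data, and labels are placeholders, and my real pipeline differs in the details.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dimensions -- not my real data set.
n_features, n_hidden, n_classes = 100, 64, 10

# Three hidden ReLU layers followed by a softmax output
# (LogSoftmax here, so it pairs with NLLLoss below).
model = nn.Sequential(
    nn.Linear(n_features, n_hidden), nn.ReLU(),
    nn.Linear(n_hidden, n_hidden), nn.ReLU(),
    nn.Linear(n_hidden, n_hidden), nn.ReLU(),
    nn.Linear(n_hidden, n_classes), nn.LogSoftmax(dim=1),
)

# Negative log likelihood loss; weight_decay supplies the L2 penalty.
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Placeholder data, then feature scaling (standardize each column).
X = torch.randn(1000, n_features)
y = torch.randint(0, n_classes, (1000,))
X = (X - X.mean(dim=0)) / (X.std(dim=0) + 1e-8)

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Mini-batch SGD training loop.
for epoch in range(20):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```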
The second point bothers me the most, as I cannot explain it at all. Surely the learning rate or the number of units in the hidden layers should affect performance at least slightly, but that's not what I'm seeing. If anyone has any pointers for explaining or even rectifying the situation, I would be really grateful.
Thanks in advance.