
I saw a thread here (Cost function of neural network is non-convex?). After reading it, I am really confused.

I am wondering: if the cost function is not convex and we do backpropagation, can the network still be "well-backpropagated"?

Because it is not convex, we cannot optimize the model.

Is my hypothesis wrong? I want to know.

JAEMTO

1 Answer


"Because it is not convex, we cannot optimize a model."

It is not true that you can't optimize a non-convex function; it just means there is no guarantee of a unique global minimum and that there may be local minima. This isn't actually much of a problem, because often we don't want to find the global minimum anyway (especially for an unregularised neural network), as it would almost certainly overfit the data horribly. It is even possible that these local minima are beneficial.

For a multi-layer perceptron type network, you will get multiple local minima just by permuting the hidden layer neurons, or by negating both the input-to-hidden and hidden-to-output weights of a hidden unit (for a network with one hidden layer and an odd activation function such as tanh). These modifications do not change the function computed by the network but give different weight matrices, demonstrating that these local minima are functionally equivalent and therefore not problematic.
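Here is a minimal NumPy sketch of these symmetries, with made-up weights: permuting the hidden units, or flipping the sign of one hidden unit's incoming and outgoing weights, changes the weight matrices but not the function the network computes.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # input-to-hidden weights (4 hidden units, 3 inputs)
w2 = rng.standard_normal(4)       # hidden-to-output weights

def mlp(x, W1, w2):
    # One hidden layer with tanh activation, scalar output.
    return w2 @ np.tanh(W1 @ x)

x = rng.standard_normal(3)

# Symmetry 1: permute the hidden units (rows of W1 and entries of w2 together).
perm = [2, 0, 3, 1]
print(np.allclose(mlp(x, W1, w2), mlp(x, W1[perm], w2[perm])))  # True

# Symmetry 2: negate one hidden unit's weights; tanh(-z) = -tanh(z),
# so the two sign flips cancel in the output.
W1_flip, w2_flip = W1.copy(), w2.copy()
W1_flip[0] *= -1
w2_flip[0] *= -1
print(np.allclose(mlp(x, W1, w2), mlp(x, W1_flip, w2_flip)))    # True
```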

In practice, there are also likely to be local minima that are not functionally equivalent. The solution to this is normally to train the MLP several times, starting from different random initialisations of the weights, and retain the network that gives the best performance (e.g. using a validation set).
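A minimal sketch of that recipe, assuming scikit-learn (the dataset and hyper-parameters here are purely illustrative): train the same MLP from several random initialisations and keep the model that scores best on a held-out validation set.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_model, best_score = None, float("-inf")
for seed in range(10):  # ten different random initialisations of the weights
    model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                          random_state=seed).fit(X_train, y_train)
    score = model.score(X_val, y_val)  # validation accuracy
    if score > best_score:
        best_model, best_score = model, score

print(f"best validation accuracy: {best_score:.3f}")
```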

Note that convexity is often given as an advantage of kernel learning methods, such as the support vector machine. However, these methods tend to have kernel and regularisation hyper-parameters that are often optimised by cross-validation. The cross-validation loss that is being optimised is unlikely to be a convex function of the hyper-parameters, so kernel learning methods are not truly convex either.
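A minimal sketch of this, again assuming scikit-learn: each inner SVM fit is a convex problem, but the cross-validation score being searched over the hyper-parameters C and gamma is not a convex function of those hyper-parameters. The grid values are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Search over the regularisation parameter C and the RBF kernel width gamma
# by 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

print(search.best_params_, search.best_score_)
```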

Dikran Marsupial
  • Okay, if it is fine that the loss function is not convex, then why is it important for the cost function to be convex? Or why do we study the convexity of loss functions? Does a convex cost function have some advantages, and is that all? – JAEMTO Jan 10 '22 at 06:29
  • Yes, mostly the guarantee of a single global optimum, but also likely better convergence and the possibility of more efficient algorithms. However, there is a downside as well: the models may be less flexible, or more computationally expensive to achieve equal flexibility. – Dikran Marsupial Jan 10 '22 at 09:43