
I am trying to build my own convex loss function (an adaptation of the Huber function: a composite of MSE and a smooth MAE term), and a professor told me that because convexity only affects the final step of the Adam algorithm, the convexity of my function makes almost no difference and is irrelevant.

Is this true? I feel it surely must still help in finding the global minimum more quickly, because of momentum?
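
Edit: to add some details, the loss I have in mind is something like the sketch below (this uses the pseudo-Huber form for the smooth MAE term; alpha and delta are placeholder hyperparameters, not fixed choices):

    import numpy as np

    def smooth_mae(r, delta=1.0):
        # Pseudo-Huber: behaves like 0.5*r**2 near zero and like delta*|r|
        # for large residuals, and is smooth (and convex) everywhere.
        return delta ** 2 * (np.sqrt(1.0 + (r / delta) ** 2) - 1.0)

    def composite_loss(y_true, y_pred, alpha=0.5, delta=1.0):
        # Nonnegative combination of MSE and the smooth MAE term; since both
        # pieces are convex in the residual, the combination is convex too.
        r = y_pred - y_true
        return np.mean(alpha * 0.5 * r ** 2 + (1.0 - alpha) * smooth_mae(r, delta))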

CCZ23
  • Maybe better to tell us about the loss function and why you want to use it instead of an out-of-the-box loss? – Tim Feb 17 '22 at 07:02
  • Hi, I added details. – CCZ23 Feb 17 '22 at 07:44
  • Why not just Huber loss then? – Tim Feb 17 '22 at 07:45
  • What about it? The question remains the same: does using a convex function (Huber) make a difference compared to non-convex functions when using Adam? – CCZ23 Feb 17 '22 at 15:16
  • Optimizing a neural network is a non-convex problem, even if the loss is convex in terms of the predictions and targets. https://stats.stackexchange.com/questions/281240/why-is-the-cost-function-of-neural-networks-non-convex – Sycorax Feb 19 '22 at 23:23

1 Answer


In general (of course there may be exceptions), convex functions are easier to optimize than nonconvex functions. But this question is kind of like asking whether it's better to multiply two 3-digit numbers together or two 6-digit numbers together when using an abacus. One is easier than the other, but 3 digits is more limiting than 6, and sometimes you need all 6. People choose to solve harder optimization problems because they think it will perform better in some ways; for Huber loss, this might be robustness to outliers.

Also, maybe you're aware of this already, but even if the loss is convex with respect to your predictions, that doesn't necessarily mean it's convex with respect to the parameters you're optimizing. For example, a neural network with a hidden layer and MSE as the objective has a loss that is nonconvex in the weights.
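
Here is a small numerical sketch of that point (an illustrative toy, not any standard library routine): a one-hidden-unit tanh network where negating all the weights leaves the predictions unchanged, so two parameter vectors achieve the same loss while their midpoint does worse, which a convex function cannot do.

    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny regression problem and a one-hidden-unit network: f(x) = w2 * tanh(w1 * x)
    x = rng.normal(size=20)
    y = np.sin(x)

    def loss(params):
        w1, w2 = params
        return np.mean((w2 * np.tanh(w1 * x) - y) ** 2)

    # Because tanh is odd, negating both weights leaves the output unchanged,
    # so p1 and p2 have identical loss ...
    p1 = np.array([1.0, 1.0])
    p2 = -p1
    # ... but their midpoint is the all-zero network, which predicts 0 everywhere.
    mid = 0.5 * (p1 + p2)

    # Convexity would require loss(mid) <= 0.5 * (loss(p1) + loss(p2));
    # here loss(mid) is strictly larger, so the loss is nonconvex.
    print(loss(p1), loss(p2), loss(mid))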

Momentum in optimizers can be beneficial in both convex and nonconvex cases.
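
To see why convexity doesn't enter the update itself, here is a minimal sketch of a single Adam step in the standard Kingma & Ba formulation (lr, b1, b2, eps are the usual hyperparameters). The rule only consumes gradients; the shape of the loss affects which gradients show up, not the algebra of the step:

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # t is the 1-based step count; m and v start at zero.
        m = b1 * m + (1 - b1) * grad          # first moment (momentum)
        v = b2 * v + (1 - b2) * grad ** 2     # second moment (adaptive scaling)
        m_hat = m / (1 - b1 ** t)             # bias corrections
        v_hat = v / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v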

shimao
  • So this is my point: you say convex functions are easier to optimise, but then due to the parameters and weights, the NN makes the loss non-convex. So why do we care about convex loss functions in NNs if they lose their convexity anyway? – CCZ23 Feb 21 '22 at 04:59