
Why are convex loss functions so important in neural networks? Because neural networks are trained end-to-end with non-linear activations, a loss function that is convex in the network's output becomes a non-convex function of the weights.
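To make this concrete, here is a minimal numeric sketch (a toy example of my own, not from any reference): squared error is convex in the network's output, yet for a tiny two-hidden-unit tanh network it is non-convex in the weights. Swapping the two hidden units gives a second parameter vector with identical loss, but the midpoint between the two has strictly higher loss, which a convex function could never do.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

def predict(theta, x):
    # 2-hidden-unit network: v1*tanh(w1*x) + v2*tanh(w2*x)
    w1, w2, v1, v2 = theta
    return v1 * np.tanh(w1 * x) + v2 * np.tanh(w2 * x)

theta_a = np.array([1.0, -2.0, 0.5, 1.5])
y = predict(theta_a, x)          # targets generated by theta_a, so loss(theta_a) = 0

def loss(theta):
    return np.mean((predict(theta, x) - y) ** 2)

theta_b = np.array([-2.0, 1.0, 1.5, 0.5])   # same function, hidden units swapped
theta_m = 0.5 * (theta_a + theta_b)         # midpoint of two equal-loss minimizers

# Convexity would require loss(theta_m) <= (loss(theta_a) + loss(theta_b)) / 2,
# but the midpoint collapses both hidden units onto one and the loss jumps up.
print(loss(theta_a), loss(theta_b), loss(theta_m))
```

This permutation symmetry alone guarantees multiple global minima, so the loss surface over the weights cannot be convex.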

So my issue is that the SGD optimizer is usually analyzed assuming a convex loss, but a neural network makes the problem non-convex, so there are multiple local optima. Why can we not just use non-convex functions to begin with? And if the problem is not convex anyway, why is SGD said to work only for convex functions?
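A small sketch of the distinction (my own toy example): SGD does not *require* convexity to run; convexity is only what makes guarantees of reaching the global minimum possible. On the non-convex double well f(w) = (w² − 1)², SGD with a noisy gradient still settles into one of the two minima at w = ±1, it just cannot promise which basin it lands in.

```python
import numpy as np

def grad(w):
    # derivative of the non-convex double well f(w) = (w**2 - 1)**2
    return 4.0 * w * (w ** 2 - 1.0)

rng = np.random.default_rng(0)
w, lr = 0.3, 0.01
for _ in range(2000):
    g = grad(w) + 0.1 * rng.standard_normal()   # noisy ("stochastic") gradient
    w -= lr * g

print(w)   # settles near +1 or -1, i.e. a minimum of a non-convex function
```

Nothing in the update rule ever uses convexity; only the convergence *proofs* do.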

I feel that it has to do with the Hessian matrix. For example, am I right in saying that the more weights there are, the less likely it is that the Hessian is positive semidefinite at a critical point, meaning most critical points will be saddle points, which SGD can escape? Whereas if the problem were convex, the Hessian would be positive semidefinite everywhere, and near-zero eigenvalues (flat directions) would mean it takes longer to converge?
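For what it's worth, here is the smallest example I could construct of a saddle in a network-like loss (again my own toy, not from any reference): a two-layer linear "network" y = a·b·x fit to a·b = 1, with loss f(a, b) = (a·b − 1)². The origin is a critical point, but the Hessian there is indefinite (eigenvalues −2 and +2), so it is a saddle rather than a local minimum.

```python
import numpy as np

def f(a, b):
    # loss of a depth-2 linear "network" with target product a*b = 1
    return (a * b - 1.0) ** 2

def hessian(a, b, h=1e-4):
    # 2x2 Hessian via central finite differences
    faa = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h**2
    fbb = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h**2
    fab = (f(a + h, b + h) - f(a + h, b - h)
           - f(a - h, b + h) + f(a - h, b - h)) / (4 * h**2)
    return np.array([[faa, fab], [fab, fbb]])

H = hessian(0.0, 0.0)            # (0, 0) is a critical point: both partials vanish
eigs = np.linalg.eigvalsh(H)     # eigenvalues in ascending order
print(eigs)                      # one negative, one positive: an indefinite Hessian
```

A positive semidefinite Hessian at every point is exactly the second-order characterization of convexity, so finding even one negative eigenvalue like this certifies the loss is non-convex.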

CCZ23
  • It's hard to understand what problem you're trying to solve or you want to learn. Can you [edit] to clarify what you're doing and how optimization of convex functions fits into solving it? What does it mean for a function to be pointless? – Sycorax Feb 18 '22 at 03:02
  • Does it make sense now @Sycorax? – CCZ23 Feb 18 '22 at 03:12
  • I think you're asking why a NN is non-convex even if it has a convex loss function, but that's meeting you more than half-way. If the duplicate thread isn't what you're asking about, then you'll need to [edit] to clarify -- the question reads as if you were reading a specific article and you want to ask about a specific paragraph, but you haven't shared that information in the question. – Sycorax Feb 18 '22 at 03:26
  • The post is somewhat helpful, I still want to know why SGD requires convex functions though when they are not even convex in a NN. This post implies that convex functions are useful because the Hessian is less likely to be positive semi-definite, is this right? I edited my post for this. – CCZ23 Feb 18 '22 at 03:33
  • "Require" for what? – Sycorax Feb 18 '22 at 03:33
  • I thought SGD does not work with non-convex losses, right? Or if not, the results are very very poor. If so, why? – CCZ23 Feb 18 '22 at 03:35
  • I've added another duplicate thread which appears to be a better match for what you're after, in light of the edits & comments. – Sycorax Feb 18 '22 at 03:39

0 Answers