From the literature, I read that for a neural network the cost function J(W,b) is non-convex, so gradient descent is susceptible to local optima; however, in practice gradient descent usually works fairly well.
On the other hand, the support vector machine (SVM), another popular classifier, involves optimizing a convex function.
How is it that training an SVM is a convex optimization problem, while training a feed-forward neural network is a non-convex one?
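For concreteness, here is a sketch of the two objectives I have in mind (standard textbook formulations; the symbols m, C, and lambda, and the exact choice of loss, are my assumptions and may differ between sources). The SVM primal problem is

$$\min_{w,\,b}\;\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\max\!\left(0,\;1 - y^{(i)}\bigl(w^\top x^{(i)} + b\bigr)\right),$$

while a regularized squared-error cost for a neural network looks like

$$J(W,b) = \frac{1}{2m}\sum_{i=1}^{m}\bigl\|h_{W,b}(x^{(i)}) - y^{(i)}\bigr\|^2 + \frac{\lambda}{2}\sum_{l}\bigl\|W^{(l)}\bigr\|_F^2,$$

where $h_{W,b}(x)$ is the network's output, i.e. a composition of affine maps and nonlinear activations.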
Edit: Cost function of neural network is non-convex?
This link provides some information regarding the intuition behind the optimization in neural networks, but it does not make this point clear. Also, my question is about the reason for the difference between the optimization approaches in ANNs and SVMs: both involve a cost function consisting of a sum of squared errors and a regularization term, so why is one a convex optimization problem (SVM) while the other requires optimizing a non-convex function?