Saddle points are often described as obstacles that prevent an optimization algorithm from reaching its intended target (i.e. the global minimum).
Gradient-based optimization methods repeatedly evaluate the first derivatives (the gradient) of the function they are trying to optimize as they progress towards their target. Newton-based methods repeatedly evaluate the second derivatives (the Hessian) in addition to the first derivatives.
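To be explicit, by these two families I mean (as I understand them) the standard updates

$$x_{k+1} = x_k - \eta \, \nabla f(x_k) \qquad \text{(gradient descent)}$$

$$x_{k+1} = x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k) \qquad \text{(Newton's method)}$$

where $\eta$ is a step size, $\nabla f$ is the gradient and $\nabla^2 f$ is the Hessian.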
Evaluating the second derivatives makes each step more expensive, but is also said to come with stronger convergence guarantees. In fact, I have heard that computing (and inverting) the Hessian becomes so cumbersome in high dimensions that optimization based on second-order information is impractical there, which is why gradient-based algorithms are considered the "go-to" choice in machine learning applications.
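To make the comparison concrete, here is the kind of toy experiment I have in mind. This is just a minimal sketch of my own: the saddle function $f(x, y) = x^2 - y^2$, the starting point, and the step size are arbitrary choices, not taken from any reference.

```python
import numpy as np

# Toy function with a saddle point at the origin: f(x, y) = x^2 - y^2
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def hessian(p):
    # Constant Hessian for this quadratic: diag(2, -2), indefinite at the saddle.
    # The argument p is unused here only because the Hessian does not depend on it.
    return np.array([[2.0, 0.0], [0.0, -2.0]])

p_gd = np.array([1.0, 1e-3])   # start slightly off the x-axis
p_nt = p_gd.copy()
eta = 0.1                      # arbitrary step size for gradient descent

for _ in range(50):
    p_gd = p_gd - eta * grad(p_gd)                            # gradient descent step
    p_nt = p_nt - np.linalg.solve(hessian(p_nt), grad(p_nt))  # (pure) Newton step

print("gradient descent ends at:", p_gd)
print("Newton's method ends at: ", p_nt)
```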
My question: Is Newton's method more likely to get caught in saddle points than gradient descent, or are both equally likely to get stuck in saddle points?
Thanks!
Note: if they are both equally likely to get stuck in saddle points, then neither has an advantage on that front. But since gradient descent only requires computing first derivatives, it would be the more attractive choice in this regard.