
I have a question about the second derivative test in most "modern" machine learning algorithms. I learned it in calculus but have never seen it used in real applications. Optimization in most machine learning algorithms seems to aim at finding the parameters that make the derivative equal to $0$, without any further test to check whether the stationary point is a maximum, a minimum, or a saddle point. Why? Is it because many objective functions are not twice differentiable?
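
For concreteness, here is a toy sketch of the test I mean (my own illustration, not from any source): $f(x, y) = x^2 - y^2$ has a stationary point at the origin, and the eigenvalues of the Hessian there classify it.

```python
import numpy as np

# f(x, y) = x^2 - y^2: the gradient (2x, -2y) vanishes at the origin.
# The second derivative test classifies the point via the Hessian there.
H = np.array([[2.0,  0.0],
              [0.0, -2.0]])

eigvals = np.linalg.eigvalsh(H)
if np.all(eigvals > 0):
    print("local minimum")
elif np.all(eigvals < 0):
    print("local maximum")
elif (eigvals > 0).any() and (eigvals < 0).any():
    print("saddle point")  # this branch fires: the eigenvalues are -2 and 2
else:
    print("inconclusive (some eigenvalue is zero)")
```

In high dimensions this eigendecomposition of the Hessian is itself expensive, which I suspect is part of the answer, but I would like to understand the reasoning.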

I know saddle points are a problem in neural network optimization, but even in simple cases, say ridge regression or logistic regression, we do not perform the second derivative test either. Why? Is it because we know the objective function is convex?
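
To make the convexity point concrete, here is a toy check I put together (made-up data, nothing from a real application): for ridge regression the Hessian is constant and positive definite, and for logistic regression it is positive semi-definite at every point, so any stationary point is automatically a global minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
lam = 0.1

# Ridge: ||y - X b||^2 + lam ||b||^2 has the constant Hessian
# 2 (X^T X + lam I), positive definite whenever lam > 0.
H_ridge = 2.0 * (X.T @ X + lam * np.eye(X.shape[1]))
print(np.linalg.eigvalsh(H_ridge))  # every eigenvalue is at least 2 * lam

# Logistic regression: the negative log-likelihood has Hessian X^T W X
# with W = diag(p * (1 - p)), positive semi-definite at every beta.
beta = rng.normal(size=3)
p = 1.0 / (1.0 + np.exp(-X @ beta))
W = np.diag(p * (1.0 - p))
H_logit = X.T @ W @ X
print(np.linalg.eigvalsh(H_logit))  # all eigenvalues are >= 0
```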

asked by Haitao Du
  • In MLP optimization, SGD-like techniques are used. You move down the gradient of the loss, so you know you won't get stuck in a local maximum, but you can get stuck in saddle points (less likely with the momentum and adaptive techniques so common these days) and in local minima. – Firebug May 03 '18 at 14:03
  • Logistic regression is a convex optimization problem (http://mathgotchas.blogspot.com/2011/10/why-is-error-function-minimized-in.html). Ridge regression is also convex, and it can even be solved directly as a system of equations. – khol May 03 '18 at 14:05
  • @Firebug thanks for the comment. Do you mean that in an MLP we often cannot even converge to a zero gradient, so there is no need to perform a further test? – Haitao Du May 03 '18 at 14:05
  • @khol I know it can be solved with a direct or an iterative algorithm (logistic regression needs an iterative one), but why no further test? Because it is convex? BTW, thanks for the link. – Haitao Du May 03 '18 at 14:06
  • Related question ["Is it important to have Hessian positive definite for trust region method optimization?"](https://stats.stackexchange.com/questions/344221/is-it-important-to-have-hessian-positive-definite#344221). – Richard Hardy May 03 '18 at 14:37
  • It's not computationally feasible to compute the second derivative again and again. – sww May 03 '18 at 15:01

0 Answers