
I am currently working through an old textbook, Practical Optimization by Gill, Murray, and Wright (c. 1982). The authors make some derivations which seem correct but which I am unable to duplicate. In the equations below, $g$ and $p$ are both vectors: $g$ is the gradient at the current iterate of an optimization procedure, and $p$ is the search direction we wish to solve for.

The authors inform the reader that the minimizer (over $p$) of

$\Large\frac{g'p}{||p||}$ is $p=-g$.

Alternatively, we might form a different norm for $p$ by considering a symmetric positive definite matrix $C$, in which case the minimizer of

$\Large\frac{g'p}{(p'Cp)^{1/2}}$ is $p=-C^{-1}g$.

I am having trouble deriving these statements.

My question is: does it take more than taking a derivative and setting it equal to zero to prove these statements? If so, how do I derive these formulas?
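
For what it's worth, a quick numerical check agrees with the first statement. This is just a sketch in NumPy; the random $g$ and the sampled directions below are my own choices, not anything from the book.

```python
# Hypothetical sanity check: sample random directions p and compare
# g'p / ||p|| against its value at the claimed minimizer p = -g.
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal(5)

def ratio(p):
    return g @ p / np.linalg.norm(p)

# At p = -g the ratio equals -||g||.
print(ratio(-g), -np.linalg.norm(g))          # the two numbers agree

# Random directions never do better than -||g||.
samples = [ratio(rng.standard_normal(5)) for _ in range(10000)]
print(min(samples) >= -np.linalg.norm(g) - 1e-12)   # True
```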

This question is similar to

Gradient descent with constraints

but not identical. That other question deals with solutions that are unit length, not with step directions that are unit length.

Setjmp

2 Answers


This is a simple application of Cauchy-Schwarz.

$(g^Tp)^2 \leq \|g\|^2 \|p\|^2.$

So, $-\|g\| \leq \frac{g^T p}{\|p\|} \leq \|g\| $ and the lower bound is attained when $ p = -g $.

Similarly, $ (g^T p)^2 = \langle C^{-1/2}g , C^{1/2} p\rangle^2 \leq (g^T C^{-1} g)\,(p^T C p) $, so $ -\sqrt{g^T C^{-1} g} \leq \frac{g^T p}{(p^T C p)^{1/2}} \leq \sqrt{g^T C^{-1} g} $, and the lower bound is attained when $C^{1/2}p$ is a negative multiple of $C^{-1/2}g$, i.e. at $ p = -C^{-1} g $.
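
If it helps, here is a small numerical check of the weighted bound. It is only a sketch: the symmetric positive definite $C$ below is an arbitrary random example, not anything from the question.

```python
# Sketch: check (g'p)^2 <= (g' C^{-1} g)(p' C p) for random p, and that
# g'p / sqrt(p' C p) reaches -sqrt(g' C^{-1} g) at p = -C^{-1} g.
import numpy as np

rng = np.random.default_rng(1)
n = 5
g = rng.standard_normal(n)
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)          # an arbitrary symmetric positive definite matrix

Cinv_g = np.linalg.solve(C, g)       # C^{-1} g
bound = np.sqrt(g @ Cinv_g)          # sqrt(g' C^{-1} g)

def weighted_ratio(p):
    return g @ p / np.sqrt(p @ C @ p)

print(weighted_ratio(-Cinv_g), -bound)                       # agree
print(all(weighted_ratio(rng.standard_normal(n)) >= -bound - 1e-12
          for _ in range(10000)))                            # True
```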

Arin Chaudhuri

I prefer to look at the problem (which ends up being equivalent) of solving $\min \{ \langle g, h \rangle | \|h\| \leq 1 \}$. The norm is the standard $2$-norm. The reason I prefer this formulation is that it is closer in form to the original problem from which the search direction problem is derived. Also, it has a trust region flavor, which I personally find more appealing than a search direction/step length approach.

For the first problem, the Cauchy–Bunyakovsky–Schwarz inequality gives $|\langle g, h \rangle| \leq \|g\| \|h\|$, hence it is easy to see that the minimum is $\min \{ \langle g, h \rangle \,|\, \|h\| \leq 1 \} = - \|g\|$, and the minimizer (which is unique with the $2$-norm) is $h = -\frac{1}{\|g\|} g$. You could also derive this by noting that (if $g\neq 0$) the constraint $\|h\| \leq 1$ must be active, and using Lagrange multipliers to obtain the same result.

For the second, let $A^TA = C$ be the Cholesky decomposition of $C$, so that $\|Ah\|^2 = h^T C h$. The problem is now
$$\min \{ \langle g, h \rangle \,|\, \|Ah\| \leq 1 \} = \min \{ \langle g, A^{-1}\delta \rangle \,|\, \|\delta\| \leq 1 \} = \min \{ \langle (A^{-1})^T g, \delta \rangle \,|\, \|\delta\| \leq 1 \},$$
where $h = A^{-1}\delta$. The first problem then shows that the minimum is $-\|(A^{-1})^T g\|$ and the minimizer is $\delta = -\frac{1}{\|(A^{-1})^T g\|} (A^{-1})^Tg$, or, in terms of the original variable, $h = A^{-1}\delta = -\frac{1}{\|(A^{-1})^T g\|} A^{-1}(A^{-1})^Tg$. Since $A^TA = C$, we have $\|(A^{-1})^T g\| = \sqrt{\langle (A^{-1})^T g, (A^{-1})^T g \rangle} = \sqrt{\langle g, C^{-1} g \rangle}$, and therefore
$$h = -\frac{1}{\sqrt{\langle g, C^{-1} g \rangle}}\, C^{-1} g.$$
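
Here is a minimal NumPy sketch of this change of variables, assuming an arbitrary random positive definite $C$: it builds $h$ through the Cholesky factor and compares it with the closed form $-\frac{1}{\sqrt{\langle g, C^{-1} g\rangle}} C^{-1} g$.

```python
# Sketch of the change of variables above (C is just a random SPD example).
import numpy as np

rng = np.random.default_rng(2)
n = 5
g = rng.standard_normal(n)
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)

L = np.linalg.cholesky(C)        # NumPy returns lower-triangular L with C = L L^T,
A = L.T                          # so take A = L^T, giving A^T A = C as in the answer

w = np.linalg.solve(A.T, g)      # w = (A^{-1})^T g
delta = -w / np.linalg.norm(w)   # minimizer of <(A^{-1})^T g, delta> over ||delta|| <= 1
h = np.linalg.solve(A, delta)    # back-substitute: h = A^{-1} delta

# Closed form from the answer: h = -C^{-1} g / sqrt(g' C^{-1} g)
Cinv_g = np.linalg.solve(C, g)
h_closed = -Cinv_g / np.sqrt(g @ Cinv_g)

print(np.allclose(h, h_closed))  # True
```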

copper.hat