I've seen quite a lot of work on approximating the Hessian, such as the Hessian-vector product, but I'm not entirely sure how knowing the Hessian helps us choose the gradient step to take.
Newton's method uses the inverse Hessian. Starting from the second-order Taylor expansion,
$$ f(\mathbf{x + \Delta x}) \approx f(\mathbf{x}) + \mathbf{g}^T \mathbf{\Delta x} + \frac{1}{2}\mathbf{\Delta x^T H \Delta x} $$
so if we want to solve for the step that makes the gradient of this approximation zero, we differentiate with respect to $\mathbf{\Delta x}$:
$$ \frac{d f(\mathbf{x} + \mathbf{\Delta x})}{d \mathbf{\Delta x}} = \mathbf{g + H \Delta x} $$
$$ 0 = \mathbf{g} + \mathbf{H} \mathbf{\Delta x} $$
then $$ \Delta \mathbf{x} = - \mathbf{H}^{-1} \mathbf{g} $$
where $\mathbf{g}$ is the gradient of $f$ and $\mathbf{H}$ is the Hessian.
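For concreteness, here is a minimal sketch of that Newton step on a toy quadratic (assuming NumPy; the objective and all names are just illustrative, not from any particular paper):

```python
import numpy as np

# Toy objective: f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b   # g(x) = A x - b

def hess(x):
    return A           # H(x) = A (constant, since f is quadratic)

x = np.zeros(2)
g = grad(x)
H = hess(x)

# Newton step: solve H dx = -g (solving the linear system rather than forming H^{-1})
dx = np.linalg.solve(H, -g)
x_new = x + dx         # for an exact quadratic this lands on the minimizer in one step
```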
But isn't the main difficulty the amount of computation required to invert the Hessian?
Specifically, for the Hessian-vector product they use the following trick (a first-order Taylor expansion of the gradient): $$ {\bf g}({\bf x}+{\bf \Delta x}) \approx {\bf g}({\bf x}) + \mathbf{H}({\bf x}){\bf \Delta x}$$
then for small $r$ $$ {\bf g}({\bf x}+r{\bf v}) \approx {\bf g}({\bf x}) + r \mathbf{H}({\bf x}){\bf v}$$
and this lets them compute $\mathbf{Hv}$ $$\mathbf{H}({\bf x}){\bf v}\approx\frac{{\bf g}({\bf x}+r{\bf v}) - {\bf g}({\bf x})}{r}$$
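As a concrete illustration of that finite-difference trick (a minimal sketch assuming NumPy; the objective $f$ and all names here are just an example I picked, not from the papers):

```python
import numpy as np

def grad(x):
    # Gradient of f(x) = (1/4) * (x^T x)^2, i.e. g(x) = (x^T x) * x
    return (x @ x) * x

def hvp(x, v, r=1e-5):
    # Finite-difference Hessian-vector product:
    # H(x) v ~= (g(x + r v) - g(x)) / r, for small r
    return (grad(x + r * v) - grad(x)) / r

x = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

approx = hvp(x, v)
exact = 2.0 * (x @ v) * x + (x @ x) * v  # analytic H(x) v for this particular f
print(approx, exact)  # the two should agree to several decimal places
```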
But again, if what's important is the inverse Hessian, then what use is $\mathbf{Hv}$, assuming $\mathbf{H}$ is too computationally expensive to invert?