I've seen quite a lot of work on approximating the Hessian, such as the Hessian-vector product, but I'm not entirely sure how knowing the Hessian helps us choose the gradient step to take.
Newton's method uses the inverse Hessian. Starting from the second-order Taylor expansion,
$$ f(\mathbf{x + \Delta x}) \approx f(\mathbf{x}) + \mathbf{g}^T \mathbf{\Delta x} + \frac{1}{2}\mathbf{\Delta x^T H \Delta x} $$
so if we want to solve for the step that makes the gradient of this approximation zero, we differentiate with respect to $\mathbf{\Delta x}$:
$$ \frac{d f(\mathbf{x} + \mathbf{\Delta x})}{d \mathbf{\Delta x}} = \mathbf{g + H \Delta x} $$
$$ 0 = \mathbf{g} + \mathbf{H} \mathbf{\Delta x} $$
then $$ \Delta \mathbf{x} = - \mathbf{H}^{-1} \mathbf{g} $$
where $\mathbf{g}$ is the gradient of $f$ and $\mathbf{H}$ is the Hessian.
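For concreteness, here is a minimal sketch of that Newton step on a toy quadratic (assuming NumPy; the objective and all names are just illustrative, not from any particular paper):

```python
import numpy as np

# Toy objective: f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b   # g(x) = A x - b

def hess(x):
    return A           # H(x) = A (constant, since f is quadratic)

x = np.zeros(2)
g = grad(x)
H = hess(x)

# Newton step: solve H dx = -g (solving the linear system rather than forming H^{-1})
dx = np.linalg.solve(H, -g)
x_new = x + dx         # for an exact quadratic this lands on the minimizer in one step
```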
But isn't the main difficulty the amount of computation required to invert the Hessian?
Specifically, for the Hessian-vector product they use the following trick (a first-order Taylor expansion of the gradient): $$ {\bf g}({\bf x}+{\bf \Delta x}) \approx {\bf g}({\bf x}) + \mathbf{H}({\bf x}){\bf \Delta x}$$
then for small $r$ $$ {\bf g}({\bf x}+r{\bf v}) \approx {\bf g}({\bf x}) + r \mathbf{H}({\bf x}){\bf v}$$
and this lets them compute $\mathbf{Hv}$ $$\mathbf{H}({\bf x}){\bf v}\approx\frac{{\bf g}({\bf x}+r{\bf v}) - {\bf g}({\bf x})}{r}$$
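As a concrete illustration of that finite-difference trick (a minimal sketch assuming NumPy; the objective $f$ and all names here are just an example I picked, not from the papers):

```python
import numpy as np

def grad(x):
    # Gradient of f(x) = (1/4) * (x^T x)^2, i.e. g(x) = (x^T x) * x
    return (x @ x) * x

def hvp(x, v, r=1e-5):
    # Finite-difference Hessian-vector product:
    # H(x) v ~= (g(x + r v) - g(x)) / r, for small r
    return (grad(x + r * v) - grad(x)) / r

x = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

approx = hvp(x, v)
exact = 2.0 * (x @ v) * x + (x @ x) * v  # analytic H(x) v for this particular f
print(approx, exact)  # the two should agree to several decimal places
```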
But again, if what's important is the inverse Hessian, then what use is $\mathbf{Hv}$, assuming $\mathbf{H}$ is too computationally expensive to invert?