Questions tagged [hessian]

For on-topic questions involving the Hessian matrix, the square matrix of second-order partial derivatives that generalizes the second derivative. Please also include a statistical-methods tag. Purely mathematical questions about the Hessian are better asked on math.SE at https://math.stackexchange.com/.

Wikipedia has an article with further references.

92 questions
219 votes · 9 answers

Why is Newton's method not widely used in machine learning?

This is something that has been bugging me for a while, and I couldn't find any satisfactory answers online, so here goes: After reviewing a set of lectures on convex optimization, Newton's method seems to be a far superior algorithm to gradient…
Fei Yang · 2,181
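
The trade-off behind this question can be seen in a few lines of NumPy: on a quadratic with Hessian A, one Newton step solves a d × d linear system while a gradient step only needs the gradient itself. A minimal sketch, with an assumed toy problem:

```python
import numpy as np

# Sketch: one step of gradient descent vs. Newton's method on the
# quadratic f(x) = 0.5 x'Ax - b'x, with gradient Ax - b and constant
# Hessian A. The problem data here is synthetic.
rng = np.random.default_rng(0)
d = 5
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)              # symmetric positive definite Hessian
b = rng.standard_normal(d)
x = rng.standard_normal(d)

grad = A @ x - b
x_gd = x - 0.01 * grad                   # gradient step: O(d) extra work
x_newton = x - np.linalg.solve(A, grad)  # Newton step: O(d^3) solve

# On a quadratic the Newton step lands exactly on the minimizer A^{-1}b,
# which is why it looks superior; forming and factoring the Hessian is
# what becomes prohibitive when d runs into the millions.
```
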
47 votes · 1 answer

Explanation of min_child_weight in xgboost algorithm

The definition of the min_child_weight parameter in xgboost is given as the: minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than…
User123456789 · 613
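
For intuition, min_child_weight can be exercised directly. A minimal sketch with synthetic data (the parameter names are real xgboost options; the data and values are illustrative):

```python
import numpy as np
import xgboost as xgb

# Sketch: with squared-error loss, each instance's hessian is 1, so
# min_child_weight acts like a minimum number of instances per leaf;
# for other losses it is a minimum *summed hessian* per leaf.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(200)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "reg:squarederror",
    "max_depth": 4,
    "min_child_weight": 10,  # reject splits whose child's summed hessian < 10
}
booster = xgb.train(params, dtrain, num_boost_round=20)
```
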
28 votes · 6 answers

Why not use the third derivative for numerical optimization?

If Hessians are so good for optimization (see e.g. Newton's method), why stop there? Why not use the third, fourth, fifth, and sixth derivatives as well?
echo · 823
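
One concrete answer involves methods that do use the third derivative, such as Halley's method in one dimension; the usual objection is that in d dimensions the third derivative is a d × d × d tensor. A 1-d sketch with an assumed example function:

```python
import math

# Sketch: minimizing f means solving f'(x) = 0. Newton uses f''; Halley's
# method also uses f''' to correct for how the curvature changes along
# the step. Example function: f(x) = exp(x) - 2x, minimized at x = ln 2.
f1 = lambda x: math.exp(x) - 2.0   # f'
f2 = lambda x: math.exp(x)         # f''
f3 = lambda x: math.exp(x)         # f'''

x_newton = x_halley = 3.0
for _ in range(5):
    x_newton -= f1(x_newton) / f2(x_newton)
    g, gp, gpp = f1(x_halley), f2(x_halley), f3(x_halley)
    x_halley -= 2 * g * gp / (2 * gp**2 - g * gpp)

print(x_newton, x_halley, math.log(2))  # Halley is closer after 5 steps
```
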
11 votes · 2 answers

Name for outer product of gradient approximation of Hessian

Is there a name for approximating the Hessian as the outer product of the gradient with itself? If one is approximating the Hessian of the log-loss, then the outer product of the gradient with itself is the Fisher information matrix. What about in…
Neil G · 13,633
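
The approximation the question refers to is easy to state in code. A minimal sketch, with stand-in per-example gradients:

```python
import numpy as np

# Sketch of the outer-product-of-gradients approximation
#   H ≈ (1/n) Σ_i g_i g_i'
# built from per-example gradients g_i (random stand-ins here).
rng = np.random.default_rng(1)
n, d = 100, 3
per_example_grads = rng.standard_normal((n, d))

H_approx = per_example_grads.T @ per_example_grads / n
# By construction this matrix is symmetric positive semidefinite,
# unlike the exact Hessian, which is one reason it is popular.
print(np.linalg.eigvalsh(H_approx))
```
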
7 votes · 1 answer

Gradient descent and local maximum

I read that gradient descent always converges to a local minimum, while for other methods such as Newton's method this is not guaranteed (if the Hessian is not positive definite); but if the starting point in GD is unfortunately a local maximum (and then the…
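
The edge case in the question is easy to demonstrate: every stationary point, including a local maximum, is a fixed point of gradient descent. A sketch with an assumed 1-d function:

```python
# Sketch: f(x) = x**4 - 2*x**2 has a local maximum at x = 0 (f'(0) = 0,
# f''(0) < 0) and minima at x = ±1. Started exactly at 0, GD never moves.
def grad(x):
    return 4 * x**3 - 4 * x

x = 0.0
for _ in range(100):
    x -= 0.1 * grad(x)
print(x)  # still 0.0; any tiny perturbation would escape toward ±1
```
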
7 votes · 1 answer

Why is the Hessian of the log likelihood function in the logit model not negative *semi*definite?

The Hessian of the log likelihood function is $$\frac{\partial^2 \ln L(\beta \mid x)}{\partial \beta \partial \beta'} = -\sum_{i=1}^n…
Fredrik P · 436
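
The sign structure behind the question can be checked numerically. A sketch with synthetic data:

```python
import numpy as np

# Sketch: the logit log-likelihood Hessian is
#   H(β) = -Σ_i p_i (1 - p_i) x_i x_i',   p_i = 1 / (1 + exp(-x_i'β)),
# so v'Hv = -Σ_i p_i (1 - p_i) (x_i'v)^2 ≤ 0, with equality only when
# Xv = 0: H is negative definite whenever X has full column rank.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
beta = rng.standard_normal(3)

p = 1.0 / (1.0 + np.exp(-X @ beta))
H = -(X.T * (p * (1 - p))) @ X           # = -Σ_i w_i x_i x_i'
print(np.linalg.eigvalsh(H))             # all strictly negative here
```
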
6 votes · 1 answer

Gradient and hessian of the MAPE

I want to use MAPE (Mean Absolute Percentage Error) as my loss function.

```python
def mape(y, y_pred):
    grad = <<<>>>
    hess = <<<>>>
    return grad, hess
```

Can someone help me understand the hessian and gradient for MAPE as a loss function? We need to…
Arc · 235
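
One common answer: the true second derivative of MAPE is zero almost everywhere, so custom-objective implementations substitute a small positive surrogate hessian. A hedged sketch in the xgboost/lightgbm custom-objective style; the surrogate constant is a choice, not a canonical value:

```python
import numpy as np

# Sketch: per-point MAPE loss |y - p| / |y| as a boosting objective,
# assuming y and y_pred are float arrays.
def mape_objective(y, y_pred):
    denom = np.maximum(np.abs(y), 1e-8)   # guard against y == 0
    grad = np.sign(y_pred - y) / denom    # d/dp of |y - p| / |y|
    hess = np.full_like(y, 1e-6)          # surrogate; true hessian is 0 a.e.
    return grad, hess
```
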
6 votes · 1 answer

Interpretation of eigenvectors of Hessian inverse

I'm reading a paper in which they use the eigenvectors of the inverse Hessian of a continuous probability distribution to characterize dimensions along which the distribution is most and least constrained. I'm having some trouble with the intuition…
Vivek Subramanian · 2,613
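
The Gaussian case gives the cleanest intuition here: at the mode, the inverse Hessian of the negative log density is exactly the covariance matrix. A sketch with an assumed 2-d example:

```python
import numpy as np

# Sketch: eigenvectors of the inverse Hessian with LARGE eigenvalues are
# high-variance, weakly constrained directions; SMALL eigenvalues mark
# tightly constrained ones.
H = np.array([[10.0, 0.0],
              [0.0, 0.1]])        # Hessian of -log p at the mode
cov = np.linalg.inv(H)            # = covariance for a Gaussian
vals, vecs = np.linalg.eigh(cov)
print(vals)                       # [0.1, 10.0]
print(vecs[:, -1])                # least constrained direction (2nd axis)
```
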
5 votes · 2 answers

What is a consequence of an ill-conditioned Hessian matrix?

In this publication I found an explanation of the Hessian matrix, along with what it means for it to be ill-conditioned. In the paper, there is this link given between the error surface and the eigenvalues of the Hessian matrix: The curvature of…
kamilazdybal · 672
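
The consequence is easiest to see on a quadratic bowl: the step size must respect the largest eigenvalue, so progress along the smallest one crawls. A minimal sketch:

```python
import numpy as np

# Sketch: gradient descent on f(x) = 0.5 x'Hx with condition number
# κ = λ_max / λ_min = 100. A step size of roughly 1/λ_max is needed
# for stability, which throttles the low-curvature coordinate.
H = np.diag([100.0, 1.0])
x = np.array([1.0, 1.0])
step = 1.0 / 100.0
for _ in range(100):
    x = x - step * (H @ x)
print(x)  # first coordinate is gone; second has only decayed to ~0.37
```
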
5 votes · 0 answers

How the Hessian matrix is used in optimization if you can't invert it

I've seen quite a lot of work on approximating the Hessian, such as the Hessian-vector product, but I'm not entirely sure how knowing the Hessian helps us evaluate the gradient step to take. Newton's method utilizes the inverse Hessian such…
tryingtolearn · 499
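
The standard resolution is Newton-CG: conjugate gradient solves H d = -g using only Hessian-vector products, so the Hessian is never formed or inverted. A sketch where the product is faked with a small explicit matrix; in practice it would come from autodiff or a finite difference of two gradient evaluations:

```python
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])        # stand-in; never inverted below
g = np.array([1.0, 2.0])          # current gradient

def hvp(v):
    return H @ v                  # the only access to H we allow ourselves

# Conjugate gradient for H d = -g.
d = np.zeros(2)
r = -g - hvp(d)                   # initial residual
p = r.copy()
for _ in range(2):                # exact in n steps for an n x n SPD matrix
    Hp = hvp(p)
    alpha = (r @ r) / (p @ Hp)
    d = d + alpha * p
    r_new = r - alpha * Hp
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new

print(d, np.linalg.solve(H, -g))  # the two agree
```
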
5 votes · 3 answers

How does the second derivative inform an update step in Gradient Descent?

I was reading the deep learning book by Bengio, Goodfellow, and Courville, and there was one section where they explain the second derivative that I don't understand (section 4.3.1): The second derivative tells us how the first derivative will change…
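
The passage being asked about reduces to a 1-d calculation. A sketch of the point, with an assumed quadratic:

```python
# Sketch: for f(x) = 0.5 * a * x**2 the curvature a = f''(x) determines
# whether a fixed step size over- or undershoots; the curvature-aware
# choice (a Newton step, eps = 1/f''(x)) lands exactly at the minimum.
a = 4.0                        # f''(x)
x = 1.0
f_prime = a * x
x_small = x - 0.1 * f_prime    # fixed step, ignores curvature: undershoots
x_newton = x - f_prime / a     # = 0.0, the minimizer
print(x_small, x_newton)
```
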
5 votes · 1 answer

Why does the determinant of the Hessian grow with n?

Context: I'm trying to understand BIC on a deeper level. I'm using BIC for model/structure selection for Bayesian networks. I'm confused because BIC is an approximation to the likelihood of a model, and the likelihood should never decrease when the…
Lizzie Silver · 1,009
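
One way to see the growth: the log-likelihood Hessian is a sum of n per-observation terms, so it scales like n times an average matrix, and in d dimensions det(n · H̄) = n^d · det(H̄). A numerical sketch with a stand-in matrix:

```python
import numpy as np

# Sketch: det(n * H_bar) = n**d * det(H_bar) for a d x d matrix H_bar.
d = 3
H_bar = np.diag([1.0, 2.0, 0.5])         # stand-in per-observation Hessian
for n in (10, 100, 1000):
    print(n, np.linalg.det(n * H_bar))   # grows like n**3 here
```
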
4 votes · 1 answer

Parameter uncertainty in least squares optimization: rescaling Hessian

Given a least squares optimization problem of the form: $$ C(\lambda) = \sum_i ||y_i - f(x_i, \lambda)||^2$$ I have found in multiple questions/answers (e.g. here) that an estimate for the covariance of the parameters can be computed from the…
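
The recipe usually cited goes as follows; a hedged sketch, where the factor of 2 comes from H being the Hessian of the sum of squares rather than of half of it, and the function name is illustrative, not a library API:

```python
import numpy as np

# Sketch: with C(λ) = Σ_i r_i(λ)^2, Gaussian noise, n data points and
# p parameters, Cov(λ̂) ≈ 2 σ̂² H^{-1}, where σ̂² = C(λ̂) / (n - p).
def covariance_from_hessian(H, rss, n, p):
    sigma2_hat = rss / (n - p)           # residual variance estimate
    return 2.0 * sigma2_hat * np.linalg.inv(H)
```
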
4 votes · 1 answer

Variance of maximum likelihood estimator in R

In different sources there is an algorithm for how to calculate the variance of the MLE in R. To keep it short: construct the negative log-likelihood function, minimize it via nlm or optim with hessian=TRUE, invert the Hessian, and read out the diagonal…
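
An analogous sketch in Python (using scipy in place of R's nlm/optim), assuming a normal model with synthetic data:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Sketch of the recipe: minimize the negative log-likelihood, take the
# inverse Hessian at the optimum, and read standard errors off its
# diagonal. Note the SEs are for the (mu, log_sigma) parametrization.
rng = np.random.default_rng(3)
data = rng.normal(loc=5.0, scale=2.0, size=200)

def negloglik(theta):
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

res = minimize(negloglik, x0=np.array([0.0, 0.0]), method="BFGS")
# BFGS already returns an approximate INVERSE Hessian as res.hess_inv;
# with a finite-difference Hessian one would invert it explicitly.
std_errors = np.sqrt(np.diag(res.hess_inv))
print(res.x, std_errors)
```
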
4 votes · 1 answer

Why calculate the standard error of an MLE (and confidence intervals) from Hessian matrices?

I might not have fully understood these concepts, and I am confused about how the standard error is calculated. Here are my understandings and confusions; let me know where I went wrong. EDIT: I was talking about the hessian matrix output from R…