A vector pointing in the direction in which a function increases fastest; its components are the partial derivatives of that function. For questions about gradients in ecology, please use the [ecology] tag instead.
Questions tagged [gradient]
186 questions
44
votes
3 answers
Gradient Boosting for Linear Regression - why does it not work?
While learning about Gradient Boosting, I haven't come across any constraints on the properties of the "weak classifier" that the method uses to build an ensemble model. However, I could not imagine an application of a GB that uses linear…

Matek
- 749
- 1
- 6
- 14
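For squared-error boosting the collapse can be seen directly: once an ordinary least-squares learner has been fit, its residuals are orthogonal to the design matrix, so every subsequent linear stage fits (numerically) zero. A minimal numpy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # design w/ intercept
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=100)

# Stage 1: the linear "weak learner" is an ordinary least-squares fit.
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta1

# Stage 2: a second linear learner fit to the residuals recovers ~nothing,
# because OLS residuals are orthogonal to the column space of X.
beta2, *_ = np.linalg.lstsq(X, resid, rcond=None)
```

So a sum of boosted linear least-squares stages is just the single OLS fit, which is the usual answer to this question.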
18
votes
1 answer
Is gradient boosting appropriate for data with low event rates like 1%?
I am trying gradient boosting on a dataset with an event rate of about 1% using Enterprise Miner, but it is failing to produce any output. My question is: since it is a decision-tree-based approach, is it even right to use gradient boosting with such low…

user2542275
- 717
- 2
- 6
- 17
18
votes
2 answers
How to use xgboost.cv with hyperparameter optimization?
I want to optimize the hyperparameters of XGBoost using cross-validation. However, it is not clear how to obtain the model from xgb.cv.
For instance, I call objective(params) from fmin. Then the model is fitted on dtrain and validated on dvalid. What if I…

Klausos
- 499
- 1
- 6
- 11
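A common resolution is that xgb.cv only *scores* a parameter setting (it returns an evaluation history, not a fitted booster); the final model is then refit on the full training set with the winning parameters. A library-free sketch of that pattern, with ridge regression standing in for XGBoost (all names and data hypothetical):

```python
import numpy as np

def kfold_score(X, y, lam, k=5):
    """Mean validation MSE of ridge regression over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xt, yt = X[train], y[train]
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(X.shape[1]), Xt.T @ yt)
        errs.append(np.mean((X[fold] @ beta - y[fold]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

# CV is used only to score each candidate; the final model is refit on all data.
candidates = [0.01, 1.0, 100.0]
best = min(candidates, key=lambda lam: kfold_score(X, y, lam))
final_beta = np.linalg.solve(X.T @ X + best * np.eye(5), X.T @ y)
```

With xgboost the same shape applies: score params via xgb.cv inside the objective passed to fmin, then call xgb.train once with the best params.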
12
votes
3 answers
Gradient descent on non-convex functions
What situations do we know of where gradient descent can be shown to converge (either to a critical point or to a local/global minima) for non-convex functions?
For SGD on non-convex functions, one kind of proof has been reviewed here,…

gradstudent
- 271
- 2
- 9
11
votes
3 answers
What is a vanishing gradient?
I have seen the term "vanishing gradient" many times in the deep learning literature. What is it? The gradient with respect to which variable? The input variables or the hidden units?
Does it mean the gradient vector is all zeros? Or that the optimization is stuck in a local…

Haitao Du
- 32,885
- 17
- 118
- 213
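In brief: the gradient with respect to early-layer quantities is a product of per-layer local derivatives, and for sigmoid units each factor is at most 0.25, so the product shrinks geometrically with depth. A small numpy illustration of that running product (weights fixed at 1 for simplicity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through a chain of 30 sigmoid "layers" (weights fixed at 1,
# so each local derivative is sigmoid'(z) = s * (1 - s) <= 0.25).
acts = [0.5]
for _ in range(30):
    acts.append(sigmoid(acts[-1]))

# Backward pass: the gradient w.r.t. each layer's input is a running product
# of the local derivatives, so its magnitude shrinks geometrically.
grad, grads = 1.0, []
for a in reversed(acts[1:]):
    grad *= a * (1.0 - a)
    grads.append(abs(grad))
```

Here `grads[0]` (near the output) is about 0.22, while `grads[-1]` (near the input) is astronomically small, which is why the early layers effectively stop learning.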
11
votes
2 answers
Name for outer product of gradient approximation of Hessian
Is there a name for approximating the Hessian as the outer product of the gradient with itself?
If one is approximating the Hessian of the log-loss, then the outer product of the gradient with itself is the Fisher information matrix. What about in…

Neil G
- 13,633
- 3
- 41
- 84
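One established name is the outer-product-of-gradients (OPG) estimator, known in econometrics as the BHHH (Berndt–Hall–Hall–Hausman) approximation; when the gradients come from a misspecified log-likelihood it is often called the empirical Fisher. A sketch comparing it with the exact Hessian for the logistic log-loss (synthetic data, my own construction):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
beta = np.array([0.5, -1.0, 2.0])
p = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = (rng.uniform(size=500) < p).astype(float)

# Per-example gradient of the log-loss at beta: g_i = (p_i - y_i) * x_i
G = (p - y)[:, None] * X

# OPG / BHHH approximation: the sum of outer products g_i g_i^T
H_opg = G.T @ G

# Exact Hessian of the logistic log-loss: sum_i p_i (1 - p_i) x_i x_i^T
w = p * (1.0 - p)
H_exact = X.T @ (w[:, None] * X)
```

At the true parameter the two agree in expectation (the information matrix equality), which is what makes OPG a usable Hessian surrogate.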
10
votes
1 answer
How to compute the gradient and hessian of logarithmic loss? (question is based on a numpy example script from xgboost's github repository)
I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.
I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the…

Greg
- 335
- 1
- 4
- 9
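For reference, when y_hat is the raw margin and p = sigmoid(y_hat), the per-example gradient and Hessian of the log-loss are p − y and p(1 − p), the same closed forms the xgboost demo script uses. A numpy sketch that verifies the gradient against a central finite difference (sample values made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logloss_grad_hess(y_hat, y_true):
    """Gradient and Hessian of the log-loss w.r.t. the raw margin y_hat,
    where p = sigmoid(y_hat):  grad = p - y,  hess = p * (1 - p)."""
    p = sigmoid(y_hat)
    return p - y_true, p * (1.0 - p)

y_hat = np.array([-1.0, 0.3, 2.0])    # raw margins (illustrative values)
y_true = np.array([0.0, 1.0, 1.0])

def loss(z):
    p = sigmoid(z)
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Verify the closed form against a central finite difference.
eps = 1e-5
num_grad = (loss(y_hat + eps) - loss(y_hat - eps)) / (2 * eps)
grad, hess = logloss_grad_hess(y_hat, y_true)
```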
9
votes
1 answer
Can I combine many gradient boosting trees using the bagging technique?
Based on Gradient Boosting Tree vs Random Forest, GBDT and RF use different strategies to tackle bias and variance.
My question is: can I resample the dataset (with replacement) to train multiple GBDTs and combine their predictions as the final…

MC LIN
- 91
- 1
- 3
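Mechanically nothing prevents it: bootstrap-resample the training set, fit an independent boosted model on each resample, and average the predictions (whether the extra variance reduction justifies the cost is the statistical question). A from-scratch sketch with stump-based squared-error boosting on toy data, all of it illustrative:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold regression stump for residuals r (1-D input)."""
    best = None
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, pl, pr = best
    return lambda q: np.where(q <= t, pl, pr)

def fit_gbm(x, y, n_stumps=50, lr=0.1):
    """Gradient boosting for squared error: repeatedly fit stumps to residuals."""
    f0, stumps, pred = y.mean(), [], np.full_like(y, y.mean())
    for _ in range(n_stumps):
        s = fit_stump(x, y - pred)
        pred = pred + lr * s(x)
        stumps.append(s)
    return lambda q: f0 + lr * sum(s(q) for s in stumps)

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 150)
y = np.sin(x) + rng.normal(scale=0.2, size=150)

# Bagging on top of boosting: each model sees its own bootstrap resample.
models = [fit_gbm(x[idx], y[idx])
          for idx in (rng.integers(0, 150, 150) for _ in range(5))]
bagged = lambda q: np.mean([m(q) for m in models], axis=0)
```

This is essentially what "stochastic gradient boosting" with subsampling already approximates inside a single model, which is one reason the combination is uncommon in practice.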
9
votes
1 answer
Regression with zero inflated continuous response variable using gradient boosting trees and random forest
I have a data set with a lot of 0 values for the continuous response variable (about 50%). I want to understand how well gradient boosting/random forests deal with this problem. My colleague suggested doing a two-part model with classification as…

user1569341
- 253
- 3
- 5
9
votes
2 answers
Deriving the gradient of a single-layer neural network w.r.t. its inputs, what is the operator in the chain rule?
Problem is:
Derive the gradient with respect to the input layer for a single
hidden layer neural network using sigmoid for input -> hidden, softmax
for hidden -> output, with a cross entropy loss.
I can get through most of the derivation…

amatsukawa
- 191
- 1
- 2
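The "operator" in question: each chain-rule step is a multiplication by a Jacobian transpose, and the elementwise sigmoid contributes a Hadamard (elementwise) product rather than a matrix product. A numpy sketch checked against finite differences (shapes and data are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> hidden
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # hidden -> output
x = rng.normal(size=3)
y = np.array([1.0, 0.0])                               # one-hot target

def loss(v):
    h = sigmoid(W1 @ v + b1)
    p = softmax(W2 @ h + b2)
    return -np.sum(y * np.log(p))

# Backprop: every step is a Jacobian-transpose product; the elementwise
# sigmoid enters as a Hadamard product, not a matrix product.
h = sigmoid(W1 @ x + b1)
p = softmax(W2 @ h + b2)
dz2 = p - y                  # softmax + cross-entropy combined
dh  = W2.T @ dz2             # Jacobian of z2 w.r.t. h is W2
dz1 = dh * h * (1.0 - h)     # Hadamard with sigmoid'(z1)
dx  = W1.T @ dz1             # Jacobian of z1 w.r.t. x is W1

# Finite-difference check of dL/dx
eps = 1e-6
num = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
```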
8
votes
1 answer
Bagging of xgboost
The extreme gradient boosting algorithm seems to be widely applied these days. I often have the feeling that boosted models tend to overfit. I know that there are parameters in the algorithm to prevent this. Sticking to the documentation here, the…

Richi W
- 3,216
- 3
- 30
- 53
8
votes
3 answers
Numeric Gradient Checking: How close is close enough?
I made a convolutional neural network and I wanted to check that my gradients are being calculated correctly using numeric gradient checking.
The question is, how close is close enough?
My checking function just spits out the calculated derivative,…

Frobot
- 1,751
- 1
- 13
- 21
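A widely used heuristic is to look at the relative error |a − n| / max(|a|, |n|) rather than the raw difference: with centered differences, roughly 1e-7 or below is a pass, while values above about 1e-4 usually indicate a bug (non-smooth points like ReLU kinks excepted). A sketch of the check on a function with a known gradient:

```python
import numpy as np

def relative_error(analytic, numeric, eps=1e-12):
    """Scale-free comparison: max over components of |a - n| / max(|a|, |n|)."""
    num = np.abs(analytic - numeric)
    den = np.maximum(np.abs(analytic), np.abs(numeric)) + eps
    return float(np.max(num / den))

# Example: f(w) = sum(w^3) has gradient 3 w^2 (points chosen away from 0,
# where a relative error is ill-defined).
w = np.linspace(0.5, 2.0, 4)
analytic = 3.0 * w ** 2
h = 1e-5
numeric = np.array([(np.sum((w + h * e) ** 3) - np.sum((w - h * e) ** 3)) / (2 * h)
                    for e in np.eye(len(w))])
err = relative_error(analytic, numeric)
```

The centered difference has O(h²) truncation error, so a correct analytic gradient lands many orders of magnitude below the 1e-7 line.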
7
votes
2 answers
In GD optimisation, if the gradient of the error function is w.r.t. the weights, isn't the target value dropped since it's a lone constant?
Suppose we have the absolute difference as an error function:
$\mathit{loss}(w) = |m_x(w) - t|$
where $m_x$ is simply some model with input $x$ and weight setting $w$, and $t$ is the target value.
In gradient-descent optimisation, the initial idea…

mesllo
- 579
- 2
- 16
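The target does not drop out: the derivative of an *additive* constant is zero, but t sits inside the loss's nonlinearity, so it shapes the gradient through the sign here (or the magnitude for, say, squared error, where the gradient is 2(m − t)m′). A tiny check with a hypothetical linear model m_x(w) = w·x:

```python
import numpy as np

x = 2.0                       # fixed input of the hypothetical model m_x(w) = w * x
def loss(w, t):
    return abs(w * x - t)

# Analytic gradient: d/dw |m - t| = sign(m - t) * x.  The constant t does not
# vanish: it determines the sign of the gradient.
w, eps = 1.0, 1e-6
results = {}
for t in (0.5, 5.0):
    num = (loss(w + eps, t) - loss(w - eps, t)) / (2 * eps)
    ana = np.sign(w * x - t) * x
    results[t] = (num, ana)
```

Two different targets give gradients of opposite sign at the same w, which would be impossible if t had dropped out.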
7
votes
1 answer
Is stochastic gradient descent biased?
In the paper Mutual Information Neural Estimation, the authors derive the following gradient for the network
$$
\nabla_\theta\mathcal V(\theta)=\mathbb E\left[\nabla_\theta T_\theta\right]-{\mathbb E\left[e^{T_\theta}\nabla_\theta…

Maybe
- 775
- 7
- 15
7
votes
1 answer
Gradient descent and local maximum
I read that gradient descent always converges to a local minimum, while for other methods, such as Newton's method, this is not guaranteed (if the Hessian is not positive definite); but if the starting point in GD is unfortunately a local maximum (and then the…

volperossa
- 625
- 5
- 9
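At a local maximum the gradient is exactly zero, so plain gradient descent never moves; but such a point is an *unstable* stationary point, and any perturbation (noise, finite precision, random initialization) sends the iterates downhill. A sketch on f(x) = (x² − 1)², which has a local maximum at 0 and minima at ±1:

```python
def grad(x):
    # f(x) = (x^2 - 1)^2: local maximum at x = 0, global minima at x = +-1
    return 4.0 * x * (x * x - 1.0)

def gd(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

stuck = gd(0.0)      # gradient is exactly zero at the local max: GD never moves
escaped = gd(1e-6)   # a tiny perturbation slides off the unstable point to x = 1
```

This is why the pathological case is measure-zero in practice: only an exact stationary start stays put.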