A vector pointing in the direction in which a function increases fastest; its components are the partial derivatives of that function. For questions about gradients in ecology, please use the [ecology] tag instead.
Questions tagged [gradient]
186 questions
44
votes
3 answers
Gradient Boosting for Linear Regression - why does it not work?
While learning about Gradient Boosting, I haven't come across any constraints on the properties of the "weak classifier" that the method uses to build an ensemble model. However, I could not imagine an application of a GB that uses linear…

Matek
- 749
- 1
- 6
- 14
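For squared-error boosting the collapse can be seen directly: once an ordinary least-squares learner has been fit, its residuals are orthogonal to the design matrix, so every subsequent linear stage fits (numerically) zero. A minimal numpy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # design w/ intercept
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=100)

# Stage 1: the linear "weak learner" is an ordinary least-squares fit.
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta1

# Stage 2: a second linear learner fit to the residuals recovers ~nothing,
# because OLS residuals are orthogonal to the column space of X.
beta2, *_ = np.linalg.lstsq(X, resid, rcond=None)
```

So a sum of boosted linear least-squares stages is just the single OLS fit, which is the usual answer to this question.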
18
votes
1 answer
Is gradient boosting appropriate for data with low event rates like 1%?
I am trying gradient boosting on a dataset with an event rate of about 1% using Enterprise Miner, but it is failing to produce any output. My question is: since it is a decision-tree-based approach, is it even right to use gradient boosting with such low…

user2542275
- 717
- 2
- 6
- 17
18
votes
2 answers
How to use xgboost.cv with hyperparameter optimization?
I want to optimize the hyperparameters of XGBoost using cross-validation. However, it is not clear how to obtain the model from xgb.cv.
For instance, I call objective(params) from fmin. Then the model is fitted on dtrain and validated on dvalid. What if I…

Klausos
- 499
- 1
- 6
- 11
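A common resolution is that xgb.cv only *scores* a parameter setting (it returns an evaluation history, not a fitted booster); the final model is then refit on the full training set with the winning parameters. A library-free sketch of that pattern, with ridge regression standing in for XGBoost (all names and data hypothetical):

```python
import numpy as np

def kfold_score(X, y, lam, k=5):
    """Mean validation MSE of ridge regression over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xt, yt = X[train], y[train]
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(X.shape[1]), Xt.T @ yt)
        errs.append(np.mean((X[fold] @ beta - y[fold]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

# CV is used only to score each candidate; the final model is refit on all data.
candidates = [0.01, 1.0, 100.0]
best = min(candidates, key=lambda lam: kfold_score(X, y, lam))
final_beta = np.linalg.solve(X.T @ X + best * np.eye(5), X.T @ y)
```

With xgboost the same shape applies: score params via xgb.cv inside the objective passed to fmin, then call xgb.train once with the best params.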
12
votes
3 answers
Gradient descent on non-convex functions
What situations do we know of where gradient descent can be shown to converge (either to a critical point or to a local/global minima) for non-convex functions?
For SGD on non-convex functions, one kind of proof has been reviewed here,…

gradstudent
- 271
- 2
- 9
11
votes
3 answers
What is a vanishing gradient?
I have seen the term "vanishing gradient" many times in the deep learning literature. What is it? The gradient with respect to which variable? The input variables or the hidden units?
Does it mean the gradient vector is all zeros? Or that the optimization is stuck in a local…

Haitao Du
- 32,885
- 17
- 118
- 213
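In brief: the gradient with respect to early-layer quantities is a product of per-layer local derivatives, and for sigmoid units each factor is at most 0.25, so the product shrinks geometrically with depth. A small numpy illustration of that running product (weights fixed at 1 for simplicity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through a chain of 30 sigmoid "layers" (weights fixed at 1,
# so each local derivative is sigmoid'(z) = s * (1 - s) <= 0.25).
acts = [0.5]
for _ in range(30):
    acts.append(sigmoid(acts[-1]))

# Backward pass: the gradient w.r.t. each layer's input is a running product
# of the local derivatives, so its magnitude shrinks geometrically.
grad, grads = 1.0, []
for a in reversed(acts[1:]):
    grad *= a * (1.0 - a)
    grads.append(abs(grad))
```

Here `grads[0]` (near the output) is about 0.22, while `grads[-1]` (near the input) is astronomically small, which is why the early layers effectively stop learning.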
11
votes
2 answers
Name for outer product of gradient approximation of Hessian
Is there a name for approximating the Hessian as the outer product of the gradient with itself?
If one is approximating the Hessian of the log-loss, then the outer product of the gradient with itself is the Fisher information matrix. What about in…

Neil G
- 13,633
- 3
- 41
- 84
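One established name is the outer-product-of-gradients (OPG) estimator, known in econometrics as the BHHH (Berndt–Hall–Hall–Hausman) approximation; when the gradients come from a misspecified log-likelihood it is often called the empirical Fisher. A sketch comparing it with the exact Hessian for the logistic log-loss (synthetic data, my own construction):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
beta = np.array([0.5, -1.0, 2.0])
p = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = (rng.uniform(size=500) < p).astype(float)

# Per-example gradient of the log-loss at beta: g_i = (p_i - y_i) * x_i
G = (p - y)[:, None] * X

# OPG / BHHH approximation: the sum of outer products g_i g_i^T
H_opg = G.T @ G

# Exact Hessian of the logistic log-loss: sum_i p_i (1 - p_i) x_i x_i^T
w = p * (1.0 - p)
H_exact = X.T @ (w[:, None] * X)
```

At the true parameter the two agree in expectation (the information matrix equality), which is what makes OPG a usable Hessian surrogate.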
10
votes
1 answer
How to compute the gradient and hessian of logarithmic loss? (question is based on a numpy example script from xgboost's github repository)
I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.
I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the…

Greg
- 335
- 1
- 4
- 9
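For reference, when y_hat is the raw margin and p = sigmoid(y_hat), the per-example gradient and Hessian of the log-loss are p − y and p(1 − p), the same closed forms the xgboost demo script uses. A numpy sketch that verifies the gradient against a central finite difference (sample values made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logloss_grad_hess(y_hat, y_true):
    """Gradient and Hessian of the log-loss w.r.t. the raw margin y_hat,
    where p = sigmoid(y_hat):  grad = p - y,  hess = p * (1 - p)."""
    p = sigmoid(y_hat)
    return p - y_true, p * (1.0 - p)

y_hat = np.array([-1.0, 0.3, 2.0])    # raw margins (illustrative values)
y_true = np.array([0.0, 1.0, 1.0])

def loss(z):
    p = sigmoid(z)
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Verify the closed form against a central finite difference.
eps = 1e-5
num_grad = (loss(y_hat + eps) - loss(y_hat - eps)) / (2 * eps)
grad, hess = logloss_grad_hess(y_hat, y_true)
```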
9
votes
1 answer
Can I combine many gradient boosting trees using the bagging technique?
Based on Gradient Boosting Tree vs Random Forest, GBDT and RF use different strategies to tackle bias and variance.
My question is: can I resample the dataset (with replacement) to train multiple GBDTs and combine their predictions as the final…

MC LIN
- 91
- 1
- 3
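Mechanically nothing prevents it: bootstrap-resample the training set, fit an independent boosted model on each resample, and average the predictions (whether the extra variance reduction justifies the cost is the statistical question). A from-scratch sketch with stump-based squared-error boosting on toy data, all of it illustrative:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold regression stump for residuals r (1-D input)."""
    best = None
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, pl, pr = best
    return lambda q: np.where(q <= t, pl, pr)

def fit_gbm(x, y, n_stumps=50, lr=0.1):
    """Gradient boosting for squared error: repeatedly fit stumps to residuals."""
    f0, stumps, pred = y.mean(), [], np.full_like(y, y.mean())
    for _ in range(n_stumps):
        s = fit_stump(x, y - pred)
        pred = pred + lr * s(x)
        stumps.append(s)
    return lambda q: f0 + lr * sum(s(q) for s in stumps)

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 150)
y = np.sin(x) + rng.normal(scale=0.2, size=150)

# Bagging on top of boosting: each model sees its own bootstrap resample.
models = [fit_gbm(x[idx], y[idx])
          for idx in (rng.integers(0, 150, 150) for _ in range(5))]
bagged = lambda q: np.mean([m(q) for m in models], axis=0)
```

This is essentially what "stochastic gradient boosting" with subsampling already approximates inside a single model, which is one reason the combination is uncommon in practice.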
9
votes
1 answer
Regression with zero inflated continuous response variable using gradient boosting trees and random forest
I have a data set with a lot of 0 values for the continuous response variable (about 50%). I want to understand how well gradient boosting/random forests deal with this problem. My colleague suggested doing a two-part model with classification as…

user1569341
- 253
- 3
- 5
9
votes
2 answers
Deriving the gradient of a single-layer neural network w.r.t. its inputs, what is the operator in the chain rule?
Problem is:
Derive the gradient with respect to the input layer for a single
hidden layer neural network using sigmoid for input -> hidden, softmax
for hidden -> output, with a cross entropy loss.
I can get through most of the derivation…

amatsukawa
- 191
- 1
- 2
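The "operator" in question: each chain-rule step is a multiplication by a Jacobian transpose, and the elementwise sigmoid contributes a Hadamard (elementwise) product rather than a matrix product. A numpy sketch checked against finite differences (shapes and data are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> hidden
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # hidden -> output
x = rng.normal(size=3)
y = np.array([1.0, 0.0])                               # one-hot target

def loss(v):
    h = sigmoid(W1 @ v + b1)
    p = softmax(W2 @ h + b2)
    return -np.sum(y * np.log(p))

# Backprop: every step is a Jacobian-transpose product; the elementwise
# sigmoid enters as a Hadamard product, not a matrix product.
h = sigmoid(W1 @ x + b1)
p = softmax(W2 @ h + b2)
dz2 = p - y                  # softmax + cross-entropy combined
dh  = W2.T @ dz2             # Jacobian of z2 w.r.t. h is W2
dz1 = dh * h * (1.0 - h)     # Hadamard with sigmoid'(z1)
dx  = W1.T @ dz1             # Jacobian of z1 w.r.t. x is W1

# Finite-difference check of dL/dx
eps = 1e-6
num = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
```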
8
votes
1 answer
Bagging of xgboost
The extreme gradient boosting algorithm seems to be widely applied these days. I often have the feeling that boosted models tend to overfit. I know that there are parameters in the algorithm to prevent this. Sticking to the documentation here, the…

Richi W
- 3,216
- 3
- 30
- 53
8
votes
3 answers
Numeric Gradient Checking: How close is close enough?
I made a convolutional neural network and I wanted to check that my gradients are being calculated correctly using numeric gradient checking.
The question is, how close is close enough?
My checking function just spits out the calculated derivative,…

Frobot
- 1,751
- 1
- 13
- 21
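A widely used heuristic is to look at the relative error |a − n| / max(|a|, |n|) rather than the raw difference: with centered differences, roughly 1e-7 or below is a pass, while values above about 1e-4 usually indicate a bug (non-smooth points like ReLU kinks excepted). A sketch of the check on a function with a known gradient:

```python
import numpy as np

def relative_error(analytic, numeric, eps=1e-12):
    """Scale-free comparison: max over components of |a - n| / max(|a|, |n|)."""
    num = np.abs(analytic - numeric)
    den = np.maximum(np.abs(analytic), np.abs(numeric)) + eps
    return float(np.max(num / den))

# Example: f(w) = sum(w^3) has gradient 3 w^2 (points chosen away from 0,
# where a relative error is ill-defined).
w = np.linspace(0.5, 2.0, 4)
analytic = 3.0 * w ** 2
h = 1e-5
numeric = np.array([(np.sum((w + h * e) ** 3) - np.sum((w - h * e) ** 3)) / (2 * h)
                    for e in np.eye(len(w))])
err = relative_error(analytic, numeric)
```

The centered difference has O(h²) truncation error, so a correct analytic gradient lands many orders of magnitude below the 1e-7 line.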
7
votes
2 answers
In GD optimisation, if the gradient of the error function is w.r.t. the weights, isn't the target value dropped since it's a lone constant?
Suppose we have the absolute difference as an error function:
$\mathit{loss}(w) = |m_x(w) - t|$
where $m_x$ is simply some model with input $x$ and weight setting $w$, and $t$ is the target value.
In gradient-descent optimisation, the initial idea…

mesllo
- 579
- 2
- 16
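The target does not drop out: the derivative of an *additive* constant is zero, but t sits inside the loss's nonlinearity, so it shapes the gradient through the sign here (or the magnitude for, say, squared error, where the gradient is 2(m − t)m′). A tiny check with a hypothetical linear model m_x(w) = w·x:

```python
import numpy as np

x = 2.0                       # fixed input of the hypothetical model m_x(w) = w * x
def loss(w, t):
    return abs(w * x - t)

# Analytic gradient: d/dw |m - t| = sign(m - t) * x.  The constant t does not
# vanish: it determines the sign of the gradient.
w, eps = 1.0, 1e-6
results = {}
for t in (0.5, 5.0):
    num = (loss(w + eps, t) - loss(w - eps, t)) / (2 * eps)
    ana = np.sign(w * x - t) * x
    results[t] = (num, ana)
```

Two different targets give gradients of opposite sign at the same w, which would be impossible if t had dropped out.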
7
votes
1 answer
Is stochastic gradient descent biased?
In the paper Mutual Information Neural Estimation, the authors derive the following gradient for the network
$$
\nabla_\theta\mathcal V(\theta)=\mathbb E\left[\nabla_\theta T_\theta\right]-{\mathbb E\left[e^{T_\theta}\nabla_\theta…

Maybe
- 775
- 7
- 15
7
votes
1 answer
Gradient descent and local maximum
I read that gradient descent always converges to a local minimum, while for other methods, such as Newton's method, this is not guaranteed (if the Hessian is not positive definite); but if the starting point in GD is unfortunately a local maximum (and then the…

volperossa
- 625
- 5
- 9
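At a local maximum the gradient is exactly zero, so plain gradient descent never moves; but such a point is an *unstable* stationary point, and any perturbation (noise, finite precision, random initialization) sends the iterates downhill. A sketch on f(x) = (x² − 1)², which has a local maximum at 0 and minima at ±1:

```python
def grad(x):
    # f(x) = (x^2 - 1)^2: local maximum at x = 0, global minima at x = +-1
    return 4.0 * x * (x * x - 1.0)

def gd(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

stuck = gd(0.0)      # gradient is exactly zero at the local max: GD never moves
escaped = gd(1e-6)   # a tiny perturbation slides off the unstable point to x = 1
```

This is why the pathological case is measure-zero in practice: only an exact stationary start stays put.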