Questions tagged [gradient]

Vector pointing in the direction in which a function increases fastest; its components are the partial derivatives of that function. For questions about gradients in ecology, please use the [ecology] tag instead.

186 questions
44
votes
3 answers

Gradient Boosting for Linear Regression - why does it not work?

While learning about Gradient Boosting, I haven't heard of any constraints regarding the properties of the "weak classifier" that the method uses to build an ensemble model. However, I could not imagine an application of GB that uses linear…
18
votes
1 answer

Is gradient boosting appropriate for data with low event rates like 1%?

I am trying gradient boosting on a dataset with an event rate of about 1% using Enterprise Miner, but it is failing to produce any output. My question is: since it is a decision-tree-based approach, is it even right to use gradient boosting with such low…
user2542275
  • 717
  • 2
  • 6
  • 17
18
votes
2 answers

How to use xgb.cv with hyperparameter optimization?

I want to optimize the hyperparameters of XGBoost using cross-validation. However, it is not clear how to obtain the model from xgb.cv. For instance, I call objective(params) from fmin. Then the model is fitted on dtrain and validated on dvalid. What if I…
Klausos
  • 499
  • 1
  • 6
  • 11
12
votes
3 answers

Gradient descent on non-convex functions

What situations do we know of where gradient descent can be shown to converge (either to a critical point or to a local/global minimum) for non-convex functions? For SGD on non-convex functions, one kind of proof has been reviewed here,…
11
votes
3 answers

What is a vanishing gradient?

I have seen the term "vanishing gradient" many times in the deep learning literature. What is it? The gradient with respect to what variable? The input variables or the hidden units? Does it mean the gradient vector is all zero? Or is the optimization stuck in a local…
Haitao Du
  • 32,885
  • 17
  • 118
  • 213
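A minimal sketch of the usual intuition behind the question above: each sigmoid layer contributes a derivative factor of at most 0.25 during backpropagation, so the gradient reaching early layers shrinks geometrically with depth. The evaluation point and unit weight here are illustrative choices, not anything from the question itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_magnitude(depth, z=0.0, weight=1.0):
    """Product of one sigmoid-derivative factor per layer.

    sigma'(z) = sigma(z) * (1 - sigma(z)) <= 0.25, so the chained
    product shrinks geometrically as depth grows.
    """
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(z)
        grad *= weight * s * (1.0 - s)
    return grad

for depth in (1, 5, 20):
    print(depth, gradient_magnitude(depth))
# at z = 0 each factor is exactly 0.25, so depth 20 gives 0.25**20
```

The gradient is not exactly zero; it is merely so small that the early layers stop learning, which is what "vanishing" refers to.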
11
votes
2 answers

Name for outer product of gradient approximation of Hessian

Is there a name for approximating the Hessian as the outer product of the gradient with itself? If one is approximating the Hessian of the log-loss, then the outer product of the gradient with itself is the Fisher information matrix. What about in…
Neil G
  • 13,633
  • 3
  • 41
  • 84
10
votes
1 answer

How to compute the gradient and Hessian of logarithmic loss? (question is based on a numpy example script from xgboost's GitHub repository)

I would like to understand how the gradient and Hessian of the logloss function are computed in an xgboost sample script. I've simplified the function to take numpy arrays, and generated y_hat and y_true, which are a sample of the values used in the…
Greg
  • 335
  • 1
  • 4
  • 9
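For reference, the standard closed form behind the question above: with raw scores `y_hat` and probabilities `p = sigmoid(y_hat)`, the log loss has gradient `p - y` and Hessian diagonal `p * (1 - p)` with respect to the raw score. This is a pure-Python sketch of that textbook result, not the xgboost script itself (the function name and inputs are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logloss_grad_hess(y_hat, y_true):
    """Per-example derivatives of log loss w.r.t. the raw score.

    With p = sigmoid(y_hat) and loss = -[y*log(p) + (1-y)*log(1-p)],
    the chain rule simplifies to grad = p - y, hess = p * (1 - p).
    """
    grad, hess = [], []
    for raw, y in zip(y_hat, y_true):
        p = sigmoid(raw)
        grad.append(p - y)
        hess.append(p * (1.0 - p))
    return grad, hess

g, h = logloss_grad_hess([0.0, 2.0, -1.0], [1, 0, 1])
print(g)  # grad at raw score 0.0 with label 1 is 0.5 - 1 = -0.5
print(h)  # hess at raw score 0.0 is 0.5 * 0.5 = 0.25
```

The neat cancellation (the `1/p` from the log against the `p(1-p)` from the sigmoid) is why both expressions come out so short.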
9
votes
1 answer

Can I combine many gradient boosting trees using bagging technique

Based on Gradient Boosting Tree vs Random Forest, GBDT and RF use different strategies to tackle bias and variance. My question is: can I resample the dataset (with replacement) to train multiple GBDTs and combine their predictions as the final…
MC LIN
  • 91
  • 1
  • 3
9
votes
1 answer

Regression with zero inflated continuous response variable using gradient boosting trees and random forest

I have a data set with a lot of 0 values for the continuous response variable (about 50%). I want to understand how well gradient boosting/random forest deals with this problem. My colleague suggested a two-part model, with classification as…
user1569341
  • 253
  • 3
  • 5
9
votes
2 answers

Deriving the gradient of a single-layer neural network w.r.t. its inputs, what is the operator in the chain rule?

The problem is: derive the gradient with respect to the input layer for a single hidden layer neural network using sigmoid for input -> hidden, softmax for hidden -> output, with a cross-entropy loss. I can get through most of the derivation…
amatsukawa
  • 191
  • 1
  • 2
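A hedged sketch of the derivation the question above asks about, in notation I am introducing here (hidden pre-activation $z = W_1 x + b_1$, hidden activation $h = \sigma(z)$, output $\hat{y} = \operatorname{softmax}(W_2 h + b_2)$, cross-entropy loss $J$): backpropagation gives

$$\delta_2 = \hat{y} - y, \qquad \frac{\partial J}{\partial x} = W_1^\top \Bigl( \bigl(W_2^\top \delta_2\bigr) \odot \sigma'(z) \Bigr),$$

so the operator that appears in the chain-rule step is the elementwise (Hadamard) product $\odot$, because the sigmoid acts componentwise and its Jacobian is diagonal.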
8
votes
1 answer

Bagging of xgboost

The extreme gradient boosting (XGBoost) algorithm seems to be widely applied these days. I often have the feeling that boosted models tend to overfit. I know that there are parameters in the algorithm to prevent this. Sticking to the documentation here, the…
Richi W
  • 3,216
  • 3
  • 30
  • 53
8
votes
3 answers

Numeric Gradient Checking: How close is close enough?

I made a convolutional neural network and I wanted to check that my gradients are being calculated correctly using numeric gradient checking. The question is: how close is close enough? My checking function just spits out the calculated derivative,…
Frobot
  • 1,751
  • 1
  • 13
  • 21
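A common answer to "how close is close enough" compares the relative error, $|a - n| / \max(|a|, |n|)$, rather than the raw difference, since the acceptable absolute gap depends on the gradient's scale. The sketch below is a generic illustration with a made-up test function, not the asker's checking code; the rule-of-thumb threshold of about $10^{-7}$ assumes double precision and central differences.

```python
def relative_error(analytic, numeric, eps=1e-12):
    """Scale-free comparison: |a - n| / max(|a|, |n|)."""
    return abs(analytic - numeric) / max(abs(analytic), abs(numeric), eps)

def central_difference(f, x, h=1e-5):
    """O(h**2) numeric derivative of a scalar function."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: f(x) = x**3 has analytic derivative 3 * x**2.
f = lambda x: x ** 3
x = 2.0
numeric = central_difference(f, x)
analytic = 3.0 * x ** 2
print(relative_error(analytic, numeric))  # well below 1e-7: a pass
```

With one-sided differences the achievable error is only about $10^{-4}$, so the threshold has to be loosened accordingly; a relative error near $10^{-2}$ usually indicates a genuine bug either way.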
7
votes
2 answers

In GD optimisation, if the gradient of the error function is taken w.r.t. the weights, isn't the target value dropped, since it's a lone constant?

Suppose we have the absolute difference as an error function: $\mathit{loss}(w) = |m_x(w) - t|$ where $m_x$ is simply some model with input $x$ and weight setting $w$, and $t$ is the target value. In gradient-descent optimisation, the initial idea…
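A one-line chain-rule check, in the question's own notation, shows why the target does not drop out:

$$\frac{\partial}{\partial w}\,\bigl|m_x(w) - t\bigr| = \operatorname{sign}\bigl(m_x(w) - t\bigr)\,\frac{\partial m_x(w)}{\partial w}.$$

Although $\partial t / \partial w = 0$, $t$ survives inside the argument of the outer function. The same happens with squared loss: $\frac{\partial}{\partial w}\,(m_x(w) - t)^2 = 2\,(m_x(w) - t)\,\frac{\partial m_x(w)}{\partial w}$, so the constant term only vanishes in a *sum* of terms, not inside a composition.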
7
votes
1 answer

Is stochastic gradient descent biased?

In the paper Mutual Information Neural Estimation, the authors derive the following gradient for the network $$ \nabla_\theta\mathcal V(\theta)=\mathbb E\left[\nabla_\theta T_\theta\right]-{\mathbb E\left[e^{T_\theta}\nabla_\theta…
7
votes
1 answer

Gradient descent and local maxima

I read that gradient descent always converges to a local minimum, while for other methods, such as Newton's method, this is not guaranteed (if the Hessian is not positive definite); but if the starting point in GD is unfortunately a local maximum (and then the…
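A tiny numeric illustration of the scenario the question above raises, using a test function of my own choosing: at an exact local maximum the gradient is zero, so plain gradient descent never moves; any perturbation away from it lets the iterates escape toward a minimum.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain fixed-step gradient descent on a 1-D function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = x**4 - 2*x**2 has a local maximum at x = 0
# and minima at x = -1 and x = 1; f'(x) = 4*x**3 - 4*x.
grad = lambda x: 4.0 * x ** 3 - 4.0 * x

print(gradient_descent(grad, x0=0.0))   # stays at 0.0: the gradient there is exactly zero
print(gradient_descent(grad, x0=0.01))  # escapes toward the minimum at 1.0
```

In practice this is why random initialization (or any floating-point noise) makes landing exactly on a maximum a measure-zero event, though saddle points can still slow convergence considerably.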