I am reading Chris Burges's paper on LambdaRank and LambdaMART for learning to rank. In these methods we only need to compute the lambdas, which play the role of gradients, and use them to update the model parameters; we never need an explicit cost function. LambdaMART carries out this optimization with gradient boosting machines (GBM).
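To make my reading of the paper concrete, here is a toy sketch (my own simplification, not code from the paper) of how the per-document lambdas accumulate pairwise terms weighted by the NDCG change from swapping two documents. The `sigma` parameter, the unnormalized `delta_dcg` helper, and all names are illustrative assumptions on my part.

```python
import numpy as np

def delta_dcg(relevance, ranks, i, j):
    """|change in DCG| if documents i and j swapped rank positions.
    (NDCG would divide by the ideal DCG; omitted here for brevity.)"""
    gain = lambda r: 2.0 ** r - 1.0
    discount = lambda pos: 1.0 / np.log2(pos + 2.0)   # positions are 0-based
    return abs((gain(relevance[i]) - gain(relevance[j])) *
               (discount(ranks[i]) - discount(ranks[j])))

def lambdas_for_query(scores, relevance, sigma=1.0):
    """Accumulate lambda_i over all pairs where one document is more relevant than the other."""
    ranks = np.argsort(np.argsort(-scores))           # rank position of each doc under current scores
    lam = np.zeros_like(scores, dtype=float)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevance[i] > relevance[j]:            # i should be ranked above j
                rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
                l_ij = -sigma * rho * delta_dcg(relevance, ranks, i, j)
                lam[i] += l_ij                         # contribution from pairs where i is more relevant
                lam[j] -= l_ij                         # opposite-sign contribution for the less relevant doc
    return lam

# Example: three documents for one query, with current model scores and relevance labels.
print(lambdas_for_query(np.array([0.2, 1.5, -0.3]), np.array([2, 0, 1])))
```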
My question is: can we apply GBM to any machine learning problem, as long as we have gradients? The post Gradient in Gradient Boosting explains this for regression: the prediction target for each new tree is the negative gradient of the loss function. For a regression problem with squared-error cost $C = \frac{1}{2}(y − \hat{y})^2$, the sequential regression trees fit the residual $z = y − \hat{y} = -\frac{\partial C}{\partial \hat{y}}$.
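Here is a minimal sketch of what I mean by the regression case, fitting each tree to the negative gradient of the squared-error cost; `n_rounds`, `lr`, and `max_depth` are my own illustrative choices, not anything from the paper or the post.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_squared_error(X, y, n_rounds=50, lr=0.1, max_depth=3):
    """Boost regression trees on the negative gradient of C = 1/2 * (y - yhat)^2."""
    pred = np.full(len(y), y.mean(), dtype=float)     # F_0: constant initial model
    trees = []
    for _ in range(n_rounds):
        neg_grad = y - pred                           # -dC/dyhat = y - yhat, i.e. the residual
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, neg_grad)
        pred += lr * tree.predict(X)                  # gradient step in function space
        trees.append(tree)
    return trees, pred
```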
But the loss function in LambdaRank is not that simple (it optimizes NDCG, which is not differentiable with respect to the model scores). Is there a generic, simple picture of how gradient boosting works here?