
I am using quantile regression (for example via gbm or quantreg in R) - not to model the median but an upper quantile (e.g. the 75th). Coming from a predictive-modeling background, I want to measure how well the model fits a hold-out test set and be able to describe this to a business user. My question is how? In a typical setting with a continuous target I could do the following:

  • Calculate the overall RMSE
  • Decile the data set by the predicted value and compare the average actual to the average predicted in each decile.
  • Etc.

What can be done in this case, where there is no observed "actual" quantile (I don't think, at least) to compare the prediction to?

Here is some example code:

install.packages("quantreg")
library(quantreg)

install.packages("gbm")
library(gbm)

data("barro")

trainIndx<-sample(1:nrow(barro),size=round(nrow(barro)*0.7),replace=FALSE)
train<-barro[trainIndx,]
valid<-barro[-trainIndx,]

modGBM<-gbm(y.net~., # formula
            data=train, # dataset
            distribution=list(name="quantile",alpha=0.75), # see the help for other choices
            n.trees=5000, # number of trees
            shrinkage=0.005, # shrinkage or learning rate,
            # 0.001 to 0.1 usually work
            interaction.depth=5, # 1: additive model, 2: two-way interactions, etc.
            bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best
            train.fraction = 0.5, # fraction of data for training,
            # first train.fraction*N used for training
            n.minobsinnode = 10, # minimum total weight needed in each node
            cv.folds = 5, # do 5-fold cross-validation
            keep.data=TRUE, # keep a copy of the dataset with the object
            verbose=TRUE) # print out progress

best.iter<-gbm.perf(modGBM,method="cv")

pred<-predict(modGBM,valid,n.trees=best.iter)

Now what - since we never observe the true percentile of the conditional distribution?

Add:

I hypothesized several methods, and I would like to know whether they are correct, whether there are better ones, and also how to interpret the first:

  1. Calculate the average value from the loss functions:

    qregLoss <- function(actual, estimate, quantile) {
      # average pinball (check) loss over the test cases
      mean((actual - estimate) * (quantile - ((actual - estimate) < 0)))
    }
    

    This is the loss function for quantile regression - but how do we interpret the value?

  2. Should we expect that, if for example we are estimating the 75th percentile, the predicted value on a test set will be greater than the actual value around 75% of the time?

Are there other methods formal or heuristic to describe how well the model predicts new cases?
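Regarding how to read the value in (1): the raw pinball loss has the units of the target and is hard to interpret on its own, but it becomes readable when compared against a naive constant predictor. A minimal sketch (with simulated stand-in data, not the barro set, and a made-up `skill` name for the resulting score):

```r
# Average pinball loss, as in the question
qregLoss <- function(actual, estimate, quantile) {
  mean((actual - estimate) * (quantile - ((actual - estimate) < 0)))
}

set.seed(1)
actual   <- rnorm(1000, mean = 10, sd = 2)  # stand-in test targets
estimate <- rnorm(1000, mean = 11, sd = 1)  # stand-in model predictions

lossModel    <- qregLoss(actual, estimate, 0.75)
# Baseline: always predict the unconditional 75th quantile
# (in practice, take this quantile from the training set)
lossBaseline <- qregLoss(actual, quantile(actual, 0.75), 0.75)

# Skill score: 1 = perfect, 0 = no better than the constant baseline,
# negative = worse than the baseline; analogous in spirit to R^2
skill <- 1 - lossModel / lossBaseline
```

A business-friendly reading is then "the model reduces the quantile loss by X% relative to always predicting the overall 75th percentile."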

B_Miner

2 Answers


A useful reference may be Haupt, Kagerer, and Schnurbus (2011) discussing the use of quantile-specific measures of predictive accuracy based on cross-validations for various classes of quantile regression models.

Skullduggery

I would use the pinball loss (defined at the start of the second page of https://arxiv.org/pdf/1102.2101.pdf) and interpret it like a mean absolute error (MAE) for the quantile you are modelling. For example, for a loss of 100: "the mean absolute error of our model with respect to the true 75%-quantile in our test data is 100."

Keep in mind this is not comparable to the RMSE, as outliers are much less influential.

To answer your question (2): if you model the 75% quantile, you are fitting the boundary that splits the data mass pointwise in a 75:25 ratio. Approximately 25% of your test observations should therefore lie above your prediction.
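This coverage check is easy to compute. A minimal sketch, using simulated stand-ins for the `actual` test values and the model's `pred` (here a near-ideal predictor by construction, just to show the mechanics):

```r
set.seed(1)
actual <- rnorm(1000)  # stand-in test targets
# Stand-in predictions hovering near the true 75th percentile
pred   <- quantile(actual, 0.75) + rnorm(1000, sd = 0.01)

# Fraction of test cases at or below the prediction;
# for a well-calibrated 75%-quantile model this should be close to 0.75
coverage <- mean(actual <= pred)
```

Large deviations of `coverage` from the target quantile on a test set are a simple, business-explainable sign of miscalibration.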