
I have a case where I want to predict a time value in minutes.

This is a regression problem.

I also want to predict the upper bound and lower bound.

I can do this in two ways:

  1. Train 3 models: one for the main prediction, one for an upper prediction and one for a lower prediction.

  2. Use quantile regression, which gives a lower and an upper bound.

However, I do not understand how quantile regression works.

Here is the code:

import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import GradientBoostingRegressor

np.random.seed(1)


#----------------------------------------------------------------------
# Training data
def f(x):
    """The underlying function to predict (from the scikit-learn example)."""
    return x * np.sin(x)

X = np.atleast_2d(np.random.uniform(0, 10.0, size=100)).T
X = X.astype(np.float32)

# Noisy observations of f
y = f(X).ravel()

dy = 1.5 + 1.0 * np.random.random(y.shape)
noise = np.random.normal(0, dy)
y += noise
y = y.astype(np.float32)

# Mesh the input space for evaluating the true function and the predictions
xx = np.atleast_2d(np.linspace(0, 10, 1000)).T
xx = xx.astype(np.float32)

alpha = 0.95

clf = GradientBoostingRegressor(loss='quantile', alpha=alpha,
                                n_estimators=250, max_depth=3,
                                learning_rate=.1, min_samples_leaf=9,
                                min_samples_split=9)

clf.fit(X, y)

# Upper bound: prediction of the 95th percentile on the meshed x-axis
y_upper = clf.predict(xx)

clf.set_params(alpha=1.0 - alpha)
clf.fit(X, y)

# Lower bound: prediction of the 5th percentile on the meshed x-axis
y_lower = clf.predict(xx)

# Refit with squared-error loss for the central (mean) prediction
clf.set_params(loss='ls')
clf.fit(X, y)

# Make the prediction on the meshed x-axis
y_pred = clf.predict(xx)

# Plot the observations, the point prediction and the 90% prediction
# interval (5th to 95th percentiles)
fig = plt.figure()
plt.plot(X, y, 'b.', markersize=10, label=u'Observations')
plt.plot(xx, y_pred, 'r-', label=u'Prediction')  # mean prediction (loss='ls')
plt.plot(xx, y_upper, 'k-')  # 95th percentile
plt.plot(xx, y_lower, 'k-')  # 5th percentile
plt.fill(np.concatenate([xx, xx[::-1]]),
         np.concatenate([y_upper, y_lower[::-1]]),
         alpha=.5, fc='b', ec='None', label='90% prediction interval')
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.ylim(-10, 20)
plt.legend(loc='upper left')
plt.show()

My questions are:

  1. How does quantile regression work here, i.e. how is the model trained?
  2. How do I use a quantile regression model at prediction time? Does it give 3 predictions, and what are y_lower and y_upper?
Rafael
  • Please elaborate on what you intend the "upper bound and lower bound" to represent: that will help us determine whether you even need quantile regression. – whuber Jun 24 '18 at 15:35
  • @whuber I am predicting Estimated Time of Arrival for consumers. I want to give them a range, i.e. instead of saying your order will arrive in 74 hours, I will say your order will arrive between 68-78 hours. Quantile regression gives an upper bound and a lower bound, so from there I guessed it fits my problem. Is any other algorithm possible too? – Rafael Jun 24 '18 at 15:39
  • You appear to be asking for a *prediction interval*. See https://stats.stackexchange.com/search?q=regression+%22prediction+interval%22. – whuber Jun 24 '18 at 15:41
  • @whuber yes, that's where quantile regression is used, right? http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_quantile.html I see only one method for getting a prediction interval. If quantile regression can be used for it, why not use it? It seems like a great method. However, please take a look at the code: I am not getting how the intervals are being predicted. – Rafael Jun 24 '18 at 15:43
  • That's a possible use of quantile regression. It's not necessarily an *appropriate* use, though: it depends on your statistical assumptions. But assuming that quantile regression is what you want to do, then it's unclear what you're trying to ask. Is your question "how do I do quantile regression" or would it be something more focused than that? – whuber Jun 24 '18 at 15:45
  • @whuber thanks, I want to understand how to implement it. Yes, that includes how the workflow will change compared to linear regression. I have the code in scikit-learn, but I am not getting how to apply it. If you could explain a bit, that would be great! – Rafael Jun 24 '18 at 16:27
  • @Rafael I wonder if you tried RandomForestQuantileRegressor instead of GradientBoostingRegressor (in your code). – Cloud Cho Oct 23 '19 at 16:51

1 Answer


To answer your questions:

How does quantile regression work here i.e. how is the model trained?

When creating the regressor, you passed loss='quantile' along with alpha=0.95, so the model is trained to minimize the quantile (pinball) loss for the 95th percentile: during boosting, under-predictions are penalized with weight 0.95 and over-predictions with weight 0.05, which pushes the fitted values up towards the 95th percentile of the target. You can read up more on how quantile loss works here and here.
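For intuition, here is a minimal sketch of that loss; the helper name pinball_loss and the toy data are just for illustration, not part of your code:

import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    """Quantile (pinball) loss: under-predictions cost alpha per unit,
    over-predictions cost (1 - alpha) per unit."""
    diff = y_true - y_pred
    return np.mean(np.where(diff >= 0, alpha * diff, (alpha - 1) * diff))

# With alpha = 0.95, predicting too low costs 19x more than predicting too
# high, so the constant that minimizes the loss is the 95th percentile.
y = np.random.normal(10, 2, size=10000)
print(pinball_loss(y, np.quantile(y, 0.95), alpha=0.95))  # smallest
print(pinball_loss(y, y.mean(), alpha=0.95))              # larger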

How do I use a quantile regression model at prediction time? Does it give 3 predictions, and what are y_lower and y_upper?

In your code, you created a single regressor and reused it. You first fit and predict with alpha=0.95 (that prediction is y_upper), then via clf.set_params() you refit the same estimator with alpha=0.05 to get y_lower, and finally refit it with loss='ls' (squared error) to get the point prediction y_pred. So each fitted model produces one prediction; the three curves come from three separate fits.

For real predictions, it is cleaner to fit 3 (or more) separate regressors, one per quantile you need, so you do not have to keep refitting a single estimator.
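A minimal sketch of that, reusing X, y and xx from the code in your question (note it uses the 0.5 quantile, i.e. the median, as the point estimate instead of loss='ls'):

from sklearn.ensemble import GradientBoostingRegressor

# One regressor per quantile we want to report
common = dict(n_estimators=250, max_depth=3, learning_rate=.1,
              min_samples_leaf=9, min_samples_split=9)

lower  = GradientBoostingRegressor(loss='quantile', alpha=0.05, **common)
median = GradientBoostingRegressor(loss='quantile', alpha=0.50, **common)
upper  = GradientBoostingRegressor(loss='quantile', alpha=0.95, **common)

for model in (lower, median, upper):
    model.fit(X, y)

# Three predictions per input: the middle one is the point estimate,
# the outer two form a ~90% prediction interval
y_lower = lower.predict(xx)
y_med   = median.predict(xx)
y_upper = upper.predict(xx)

For your ETA use case, y_lower and y_upper would be the predicted 5th and 95th percentile arrival times, and their gap is the range you would show to the consumer.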