13

I am interested in using quantile regression for some of my models, but would like some clarification on what I can achieve with this methodology. I understand I can obtain a more robust analysis of the IV/DV relationship, especially when faced with outliers and heteroscedasticity, but in my case the focus is on prediction.

In particular I'm interested in improving the fit of my models, without resorting to more complex non-linear models, or even piecewise linear regression. At prediction, is it possible to select the highest probability outcome quantile based on the value of the predictors? In other words, is it possible to determine each predicted outcome quantile probability, based on the value of the predictors?

Robert Kubrick

2 Answers

9

The right hand side of a model in quantile regression has the same structure and types of assumptions as other regression models such as OLS. The main differences with quantile regression are that one directly predicts quantiles of the distribution of $Y$ conditional on $X$ without resorting to parametric distributional manipulations (e.g., $\bar{x} \pm 1.96s$), and that no distributional shape of residuals is assumed other than assuming that $Y$ is a continuous variable.
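To make "directly predicts quantiles" concrete, here is a minimal sketch in plain Python. The data are simulated and the helper names (`pinball`, `fit_quantile_line`) are made up for illustration. It fits a straight line for several values of $\tau$ by minimizing the check (pinball) loss, using brute force over pairs of points, which works because with one predictor some optimal line passes through two observations. In practice one would use a proper solver such as R's `quantreg` or statsmodels' `QuantReg` instead.

```python
import random
from itertools import combinations

def pinball(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def fit_quantile_line(xs, ys, tau):
    """Brute-force fit of y = a + b*x minimizing the total check loss.

    With a single predictor, some optimal line passes through two data
    points, so it is enough to try the line through every pair.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a valid candidate
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(pinball(y - (a + b * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

random.seed(0)
n = 80
xs = [random.uniform(1, 10) for _ in range(n)]
# Simulated (hypothetical) data whose noise spread grows with x
ys = [2 + 0.5 * x + random.gauss(0, 0.3 * x) for x in xs]

# One separate fit per quantile level; tau is chosen by the analyst
fits = {tau: fit_quantile_line(xs, ys, tau) for tau in (0.1, 0.5, 0.9)}
for tau, (a, b) in fits.items():
    share = sum(y <= a + b * x for x, y in zip(xs, ys)) / n
    print(f"tau={tau}: intercept={a:.2f}, slope={b:.2f}, "
          f"share of points at or below the line={share:.2f}")
```

The share of observations at or below each fitted line comes out close to $\tau$ by construction. That is why the predictors cannot tell you which quantile is "most probable": for any $x$, the outcome falls below the fitted $\tau$ quantile with probability about $\tau$, whatever $\tau$ you pick.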

Frank Harrell
  • 1
    I think I understand how the fitting process works. What I don't understand is if there is a way to improve the prediction (quantile parameter selection) *without* knowing in which quantile the observation will be. Can we somehow derive this from the predictor values? Maybe there is something that can be used based on the probability distribution of predictors vs. observations. – Robert Kubrick Feb 11 '12 at 15:36
  • 2
    I think you need to do a significant amount of background reading on quantile regression. Observations do not lie "in quantiles". A quantile is a property of a continuous distribution. The 0.5 quantile is the median; the 0.75 quantile is the upper quartile. The 0.75 quantile of $Y | X=x$ is the 75th percentile of $Y$ when $X=x$. – Frank Harrell Feb 11 '12 at 17:46
  • 2
    Frank, I'm sure I need to learn more about quantile regression. Before I dive in, I'd like to understand if this methodology can offer some probabilistic component for the choice of the quantile, based on the predictors and the fitted model. For each given set/range of predictor values there must be a likelihood that the actual outcome will fall in a certain quantile region. – Robert Kubrick Feb 11 '12 at 20:15
4

Quantile regression is about predicting quantiles of the dependent variable. In "regular" regression, we predict the mean of the DV, but interest could be in other parts of the DV's distribution. For example, you might be interested in predicting which newborn babies will be very light, which songs will be exceptionally popular, or which customers will buy a ton of stuff.

I wrote a paper about it for NESUG last year.
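As a rough illustration of why the choice of quantile depends on the question being asked, the sketch below uses simulated, hypothetical purchase amounts and made-up helper names. Each level $\tau$ has its own loss function, and the best constant prediction under that loss is the corresponding sample quantile. It is a toy example without predictors, not a full quantile regression.

```python
import random

def pinball_loss(values, guess, tau):
    """Average check loss of predicting `guess` for every value, at level tau."""
    return sum(
        tau * (v - guess) if v >= guess else (1 - tau) * (guess - v)
        for v in values
    ) / len(values)

def best_constant(values, tau):
    """Observed value that gives the lowest average check loss at level tau."""
    return min(values, key=lambda g: pinball_loss(values, g, tau))

random.seed(1)
# Hypothetical right-skewed outcome, e.g. amount spent per customer (mean 50)
amounts = [random.expovariate(1 / 50) for _ in range(501)]

mean_pred = sum(amounts) / len(amounts)    # the target of OLS-style prediction
median_pred = best_constant(amounts, 0.5)  # "typical" customer
high_pred = best_constant(amounts, 0.9)    # threshold for big spenders

print(f"mean={mean_pred:.1f}, median={median_pred:.1f}, 0.9 quantile={high_pred:.1f}")
```

None of the three numbers is "the best" prediction in general. The mean, the median, and the 0.9 quantile each minimize a different loss, so which one to report follows from what you want to know.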

Peter Flom
  • I went through the paper, interesting read, but I have to reiterate my original question. After I have fit the model, is there a way to tell which quantile outcome I should use for prediction, based on the value of my predictors? Otherwise I will have x different predictions (1 per quantile) but I would not know which one to use. – Robert Kubrick Feb 11 '12 at 15:31
  • 2
    You choose which quantile to predict based on what you want to know. No program can tell you which question to ask! – Peter Flom Feb 11 '12 at 16:50
  • 1
    Given the fitted model, can't you calculate the probability of a predicted value to fall in the 0.6 quantile, based on the predictor values? – Robert Kubrick Feb 11 '12 at 20:09
  • 2
    Not "in the .6 quantile" but at or above the 0.6 quantile, but yes. But you have to decide which quantile you want to predict. In OLS regression, you predict the conditional mean; in quantile regression you predict the conditional quantiles – Peter Flom Feb 11 '12 at 21:45
  • 5
    As Peter indicated, you are still not understanding earlier comments. Quantile regression has nothing to do with computing probabilities of falling above or below a certain quantile (note that the probability of falling "in" the 0.6 quantile is zero by definition). You decide whether you are interested in predicting the median or some other quantile, then do that. A conditional quantile is a single number, not a range. – Frank Harrell Feb 12 '12 at 01:59
  • 1
    If I understand correctly, you choose which quantile to use for your predictions, but is there not a way to choose which quantile is the best for the prediction? –  Oct 07 '14 at 02:28