A friend of mine is estimating the following model using OLS:

$y=\alpha + \beta X + u$,

where $y$ and $X$ are continuous variables, $\alpha$ and $\beta$ are parameters, and $u$ is an error component. In a second step, he splits the sample at the median of the residuals and then estimates the following regression:

$y=\alpha + \beta X + \gamma Dummy(above\_Median=1) + \delta X \times Dummy(above\_Median=1) + v$,

where $Dummy(.)$ is one if an observation's first-step residual is above the median and zero otherwise. My feeling is that this procedure is extremely strange and that the estimates are probably biased, but I currently cannot formalize the problem. What potential problems do you see? Can this be done? What do you think about this procedure?

bachelor

1 Answer

This amounts to discretizing (specifically: dichotomizing) a continuous variable. Your friend fits a model that has a discrete jump as an observation moves from the "below median" to the "above median" group. This almost certainly doesn't make sense. Read more here: What is the benefit of breaking up a continuous predictor variable?
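To see how mechanical that jump is, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data are simulated with one homogeneous slope and standard normal errors, and the helper `ols` is a hypothetical convenience wrapper, not part of your friend's setup. The two-step procedure is then applied exactly as described in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data with a single, homogeneous slope (assumed values).
alpha, beta = 1.0, 2.0
x = rng.normal(size=n)
u = rng.normal(size=n)
y = alpha + beta * x + u


def ols(design, response):
    """Return OLS coefficients via least squares (hypothetical helper)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


# Step 1: first-stage regression and its residuals.
X1 = np.column_stack([np.ones(n), x])
b1 = ols(X1, y)
resid = y - X1 @ b1

# Step 2: dummy for residuals above their median, plus its interaction with x.
d = (resid > np.median(resid)).astype(float)
X2 = np.column_stack([np.ones(n), x, d, x * d])
a2, b2, gamma, delta = ols(X2, y)

print(f"gamma = {gamma:.3f}, delta = {delta:.3f}")
```

Under these assumptions, the estimate of $\gamma$ should come out clearly positive (about $2\sqrt{2/\pi} \approx 1.6$ for standard normal errors), even though the simulated data contain no group effect at all. The dummy is built from the first-step residuals of $y$, so the "above median" group has higher $y$ by construction. The estimate of $\delta$ should stay near zero here, because the simulated slope really is homogeneous.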

If there is remaining nonlinearity, which you could detect, e.g., in plotting residuals against actuals or against fitted values, then it's far better to refit a model with .
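A rough numerical stand-in for that residual plot, as a sketch only: the data below are simulated with a quadratic term that the linear fit misses (an assumption chosen purely for illustration), and the residuals are averaged within quintiles of the fitted values instead of being plotted.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data: the true relationship has a quadratic term (assumed).
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(size=n)

# Fit the (misspecified) linear model by least squares.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

# Average the residuals within quintiles of the fitted values.
edges = np.quantile(fitted, [0.2, 0.4, 0.6, 0.8])
bins = np.digitize(fitted, edges)  # bin labels 0..4
bin_means = np.array([resid[bins == k].mean() for k in range(5)])

print(np.round(bin_means, 2))
```

If the linear specification were adequate, the binned means would all hover around zero. In this simulated case they should be positive in the outer bins and negative in the middle, which is the same U-shaped pattern a plot of residuals against fitted values would show.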

Friends don't let friends discretize continuous variables.

Stephan Kolassa
  • Thanks a lot. Just as a follow-up: his problem is that the residuals in his model have "a deeper interpretation", and he is interested in modelling heterogeneity in this component (i.e. heterogeneity in $\beta$). I don't see how this can be done without using some sort of proxy for the residuals. – bachelor Nov 14 '17 at 09:46
  • Hm. Is he looking for an [tag:interaction] term? Or does he want to model [tag:heteroskedasticity]? – Stephan Kolassa Nov 14 '17 at 10:04
  • He is rather interested in interaction terms. – bachelor Nov 14 '17 at 10:15
  • Then I'd recommend that he explicitly models it as such. – Stephan Kolassa Nov 14 '17 at 10:16
  • This is what I proposed as well, but the strange thing is that he is interested in interactions with the residual, so it always has to be a two-step procedure. To me, an interaction with the residual looks extremely strange. – bachelor Nov 14 '17 at 10:22
  • Well, to me it does, too. An interaction with residuals sounds very much like conditional heteroskedasticity to me. If you have a statistician available, I'd recommend your friend buy him a cup of coffee and explain his setup to him. – Stephan Kolassa Nov 14 '17 at 11:06