
Very concise question: if I model a phenomenon that takes only positive values (for example, revenues or production) using classical OLS, what are the consequences in terms of bias, efficiency and consistency of the estimator? In other words, given that the linear model assumes that my dependent variable is defined on the whole real line, what happens if I use it anyway to model a strictly positive variable? I know that fitting an alternative distribution such as the Gamma could be a good solution (but this concerns the distribution of the residuals); I'm only interested in knowing whether such a linear model is still useful or methodologically acceptable.

Edit: The idea is: I'm modeling a phenomenon via a line. A line by definition can also take negative values, while my variable cannot, so what are the consequences of using a line for a variable that is a priori non-negative?

3 Answers


There is no assumption in classical ordinary least squares regression that your dependent variable takes values on the whole real line. I think you are confusing this with the assumption that the errors $e$ are normally distributed, which implies that the errors can fall anywhere on the real line (i.e. $-\infty<e<\infty$).
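As a rough illustration of this point, here is a small simulation sketch. Everything in it is an assumption made for the example (the intercept 2, the slope 3, the shifted exponential errors, the helper name `ols_slope`); none of it comes from the question. The outcome is strictly positive and the errors are skewed and bounded below, yet the conditional mean is linear, so the OLS slope still averages out to the true value.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(x, y):
    """Return the OLS slope from a regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Hypothetical data-generating process: y = 2 + 3x + e, with x in [0, 1].
n, reps = 200, 2000
true_intercept, true_slope = 2.0, 3.0

slopes = np.empty(reps)
min_y = np.inf
for r in range(reps):
    x = rng.uniform(0.0, 1.0, size=n)
    # Mean-zero but skewed errors, bounded below by -1 (clearly not normal).
    e = rng.exponential(1.0, size=n) - 1.0
    y = true_intercept + true_slope * x + e  # always >= 1, so strictly positive
    min_y = min(min_y, y.min())
    slopes[r] = ols_slope(x, y)

mean_slope = slopes.mean()
print("smallest y ever observed:", min_y)
print("average OLS slope over replications:", mean_slope)
```

Normality of the errors matters for exact small-sample tests and intervals, not for unbiasedness or consistency of the slope, which rely on the conditional mean being correctly specified.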

StatsStudent
  • There is more to my question than the assumption of Gaussian errors. The idea is: I'm modeling a phenomenon via a line. A line by definition can also take negative values, while my variable cannot, so what are the consequences of using a line for a variable that is a priori non-negative? – Kolmogorovwannabe Dec 20 '18 at 22:35
  • I'd suggest revising/editing the question to clarify what you're asking. In your original question you have stated "given that the linear model assumes that my dependent variable is defined on the whole real line," but there is no such assumption. I think an example as it relates to your dependent and independent variables would go a long way here. – StatsStudent Dec 20 '18 at 22:40
  • I've clarified my question. In any case, I'm here to understand the consequences of using a line to model a non-negative variable, for example production. The discussion of residuals is also interesting from the point of view of the consequences, so I would welcome related answers as well. – Kolmogorovwannabe Dec 21 '18 at 06:58

The interpretation of the coefficients may need to be corrected near the boundary (in your case, near 0).

Let's assume your model is something like $y = \beta x$

If $x$ is sufficiently far from 0, then a linear function is a reasonable approximation, and everything works as usual. Near 0, however, a decrease of $\Delta x$ in a predictor cannot always produce a change of $-\beta \Delta x$ in the outcome, because if it did the outcome could cross the boundary.

Questions of bias, consistency, and efficiency will likely depend on how the true function behaves away from 0.
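To make the boundary issue concrete, here is a minimal simulation sketch. The data-generating process is purely an assumption for illustration: a strictly positive outcome whose true mean $\exp(0.5x)$ flattens out as it approaches 0, with multiplicative Gamma noise. A straight line fitted by OLS then predicts negative values at the low end, and its single slope badly overstates how the mean responds to $x$ near the boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical process: E[y | x] = exp(0.5 * x), which approaches 0 but never
# crosses it. The noise is multiplicative Gamma with mean 1, so y > 0 always.
n = 5000
x = rng.uniform(-4.0, 4.0, size=n)
true_mean = np.exp(0.5 * x)
y = true_mean * rng.gamma(shape=2.0, scale=0.5, size=n)

# Fit the straight line y = a + b * x by ordinary least squares.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
fitted = X @ coef

# Derivative of the true mean at the left edge (x = -4), i.e. near the boundary.
local_slope_left = 0.5 * np.exp(0.5 * -4.0)

print("smallest observed y:", y.min())
print("smallest fitted value:", fitted.min())
print("OLS slope:", slope, "vs true local slope near the boundary:", local_slope_left)
```

In this setup the line is the best linear approximation to a curved mean, so the coefficient is an average effect over the sampled range of $x$ rather than the effect at any particular point.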

Demetri Pananos

If your dependent variable is continuous but bounded (say to [0,1]), then a linear model is no longer appropriate unless you have few cases near the edges. In that situation you would use a different kind of regression (beta regression, for the [0,1] case above).

If your dependent variable is continuous and unbounded, but your sample contains only some of the possible values that it can take, then your inference will be doubtful.

If your dependent variable is continuous and bounded but the bounds will never be reached (for example height, since you are unlikely to have people with height close to 0 in your sample), then everything will work fine.
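The first and third cases can be contrasted with a short simulation sketch. All numbers below are assumptions chosen for illustration: a share in [0,1] drawn from a Beta distribution with a logistic mean (so many observations sit near both edges), and a height-like variable centred far from its lower bound. The helper `ols_fitted` is a hypothetical name, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(-2.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])

def ols_fitted(X, y):
    """Fit y on the columns of X by least squares and return fitted values."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Case 1: outcome bounded in [0, 1] with many observations near both edges.
# Assumed mean: logistic in x; Beta noise with precision phi keeps y in [0, 1].
mu = 1.0 / (1.0 + np.exp(-3.0 * x))
phi = 10.0
share = rng.beta(mu * phi, (1.0 - mu) * phi)
fitted_share = ols_fitted(X, share)

# Case 2: outcome bounded below by 0 in principle, but the data stay far from
# the bound (a height-like variable in centimetres, assumed linear in x).
height = 170.0 + 5.0 * x + rng.normal(0.0, 7.0, size=n)
fitted_height = ols_fitted(X, height)

print("share: fitted values range from", fitted_share.min(), "to", fitted_share.max())
print("height: fitted values range from", fitted_height.min(), "to", fitted_height.max())
```

In the first case the fitted line leaves [0,1] at both ends, which is the situation where something like beta regression is the more natural choice; in the second the bound is irrelevant over the sampled range and the linear fit behaves normally.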

user2974951