
In many econometric models, a given change in the response variable is harder to achieve in some intervals than in others, but I believe this is often not taken into account when estimating the model.

For example, suppose $Y_{st}$ represents the proportion of students in a certain school $s$ passing a standardized test in year $t$. Let $R_{st}$ be the academic resources available to students (e.g., books in the library), and let $I_{st}$ represent the average parental income of the students. In this case $Y_{st} \in [0,1]$, and we would like to estimate the effect of $R_{st}$ on $Y_{st}$.

We could model this as follows:

$Y_{st} = \alpha_{0} +\alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t}+ u_{st}$, where $u_{st}$ is an additive error term and $\delta_{t}$ are time dummies. In this context of pass rates, it is intuitively more difficult for a school to raise its pass rate from 95% to 100% than from 45% to 50%. Consequently, the effect of $R_{st}$ on $Y_{st}$ should be given less weight in the latter situation (45% to 50%) than in the former (95% to 100%). Suppose we were comparing two schools in which the same increase in $R_{st}$ led to these results; clearly the school that went from 95% to 100% invested more efficiently.

My idea is to interact $R_{st}$ with a multiplicative dummy variable $\beta_{t}$, where $\beta_{t}$ takes on different values depending on the initial value of $Y_{st}$. Is there a standard way to take this into account in the model? Are there other factors that could improve this model?

user77404
  • This issue is usually solved by applying a suitable transformation to the response or, even better, to the parameter of interest (e.g., a logistic transformation). – Michael M Jul 10 '14 at 20:25
  • @MichaelMayer I have read papers in which transformations are applied to bounded response variables such as this one (to deal with ceiling effects), and I do plan to apply a logistic transformation here. But you're saying the logistic transformation will also solve this issue? – user77404 Jul 10 '14 at 20:56

1 Answer


In your setting, logistic regression seems to be the natural way to go, since your percentages are derived from a count (the number of successful students per school). Interpreting effects through odds ratios addresses your concern that it is more difficult to go from 90% to 95% than from 50% to 55%. Moreover, fitted percentages cannot fall below 0 or above 100, and you don't have problems with heteroscedasticity near the boundary.
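To make the odds-ratio argument concrete, here is a minimal sketch in plain Python. The helper names (`logit`, `inv_logit`, `pass_rate_after_shift`) and the effect size are illustrative assumptions, not part of any fitted model: the shift is simply chosen so that it moves a school from 45% to 50%, and the same shift is then applied to a school starting at 95%.

```python
import math


def logit(p):
    """Log-odds of a proportion p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Logistic function, mapping log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pass_rate_after_shift(p_start, log_odds_shift):
    """Pass rate after adding log_odds_shift to the log-odds of p_start."""
    return inv_logit(logit(p_start) + log_odds_shift)


# Hypothetical effect size (an assumption for illustration only):
# the log-odds shift that takes a school from a 45% to a 50% pass rate.
shift = logit(0.50) - logit(0.45)

# Apply the same log-odds shift to schools starting at 45% and at 95%.
for p in (0.45, 0.95):
    new_p = pass_rate_after_shift(p, shift)
    print(f"{p:.0%} -> {new_p:.1%} (gain of {100 * (new_p - p):.1f} points)")
```

On the log-odds scale the two schools experience the same effect, yet the school starting at 95% gains less than one percentage point. This is the sense in which a constant odds ratio builds in the idea that improvements near the ceiling are harder to obtain.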

You might want to have a look at the question "What are the issues with using percentage outcome in linear regression?" for models with a percentage response.

Michael M
  • The odds ratio sounds like a good idea. If we let $G$ be the logistic function, $G(x) = \frac{1}{1+e^{-x}}$, and let $Y_{st} = G(\alpha_{0} +\alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t}+ u_{st})$ be a transformation of the above model, then if we estimate $G^{-1}(Y_{st}) = \alpha_{0} +\alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t}+ u_{st}$ using OLS (this gets rid of ceiling effects since $G^{-1}:(0,1) \to \mathbb{R}$), are the estimates here odds ratios? I ask because I would like to transform the model to eliminate the ceiling and floor and still be able to interpret the estimates in terms of odds ratios. – user77404 Jul 11 '14 at 17:57
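For reference, the algebra behind the question in this comment can be sketched as follows, assuming $0 < Y_{st} < 1$ so that the logit is defined. Since $G^{-1}(y) = \log\frac{y}{1-y}$, the transformed model reads

$$\log\frac{Y_{st}}{1-Y_{st}} = \alpha_{0} +\alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t}+ u_{st}.$$

Raising $R_{st}$ by one unit while holding $I_{st}$, $\delta_{t}$ and $u_{st}$ fixed adds $\alpha_{1}$ to the left-hand side, so the odds are multiplied by $e^{\alpha_{1}}$:

$$\frac{Y_{st}'}{1-Y_{st}'} = e^{\alpha_{1}} \cdot \frac{Y_{st}}{1-Y_{st}}.$$

Under this reading, $\alpha_{1}$ is a log odds ratio for the school-level pass proportion and $e^{\alpha_{1}}$ is the corresponding odds ratio. Note that OLS on the logit of the observed proportions is not the same estimator as a binomial logistic regression on the underlying counts, so the two sets of estimates need not coincide.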