Question about using a multiplicative dummy variable

Question

In many econometric models, changes in the response variable are harder to achieve in some intervals than in others. But I believe this is often not taken into account when estimating the model.

For example, suppose $Y_{st}$ represents the proportion of students in a certain school $s$ passing a standardized test in year $t$. Let $R_{st}$ be the academic resources available to students (e.g. books in the library), and let $I_{st}$ represent the average parental income of the students. In this case $Y_{st} \in [0,1]$, and we would like to estimate the effect of $R_{st}$ on $Y_{st}$.

We could model this as follows:

$Y_{st} = \alpha_{0} +\alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t}+ u_{st}$, where $u_{st}$ is an additive error term and $\delta_{t}$ are time dummies. In this context of pass rates, intuitively it is more difficult for a school to increase its pass rate from 95% to 100% than it is to go from 45% of students passing to 50% of students passing. Consequently, the effect of $R_{st}$ on $Y_{st}$ should be given less weight in the latter situation (45% to 50%) than in the former (95% to 100%). Suppose we were comparing two schools in which the same increase in $R_{st}$ led to these results; clearly the school that went from 95% to 100% invested more efficiently.

My idea is to interact $R_{st}$ with a multiplicative dummy variable $\beta_{t}$, where $\beta_{t}$ takes on different values depending on the initial value of $Y_{st}$. Is there a standard way to take this into consideration in the model? Are there other factors that could improve this model?

This issue is usually solved by applying a suitable transformation to the response or, even better, to the parameter of interest (e.g. a logistic transformation). – Michael M, Jul 10 '14 at 20:25
@MichaelMayer I have read papers in which transformations are applied to bounded response variables such as this one (to deal with ceiling effects), and I do plan to apply a logistic transformation here. But you're saying the logistic transformation will also solve this issue? – user77404, Jul 10 '14 at 20:56

score 3 · Accepted Answer · edited Apr 13 '17 at 12:44

In your setting, logistic regression seems to be the natural way to go, since your percentages are related to a count (the number of successful students per school). Interpreting effects through odds ratios addresses your concern that it is more difficult to go from 90% to 95% than from 50% to 55%. Moreover, you can't get predicted percentages below 0 or above 100, and you don't have problems with heteroscedasticity near the boundary.
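As a rough illustration of the odds-ratio point, the sketch below applies the same shift on the log-odds scale to schools starting at different pass rates. The helper names and the effect size of 0.2 are made up for the example and are not estimates from any data.

```python
import math

def logistic(x):
    """Inverse logit: maps a linear predictor to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a proportion p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def shifted_rate(p, log_odds_shift):
    """Pass rate obtained after adding log_odds_shift to the log-odds of p."""
    return logistic(logit(p) + log_odds_shift)

# Hypothetical effect: one extra unit of resources adds 0.2 to the log-odds
# of passing (an assumed value, chosen only for illustration).
effect = 0.2

for start in (0.50, 0.90, 0.95):
    end = shifted_rate(start, effect)
    gain = 100 * (end - start)
    print(f"{start:.2f} -> {end:.4f} (gain of {gain:.2f} percentage points)")
```

The odds are multiplied by the same factor, $e^{0.2}$, in every row, yet the gain in percentage points shrinks as the starting rate approaches 100%. This is the sense in which a constant effect on the logit scale already treats improvements near the ceiling as harder to achieve.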

You might want to have a look at "What are the issues with using percentage outcome in linear regression?" for models with a percentage response.

answered Jul 10 '14 at 21:17

Michael M

The odds ratio sounds like a good idea. Let $G$ be the logistic function, $G(x) = \frac{1}{1+e^{-x}}$, and let $Y_{st} = G(\alpha_{0} +\alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t}+ u_{st})$ be a transformation of the model above. If we then estimate $G^{-1}(Y_{st}) = \alpha_{0} +\alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t}+ u_{st}$ using OLS (this gets rid of ceiling effects since $G^{-1}:(0,1) \to \mathbb{R}$), are the estimates odds ratios? I would like to transform the model to eliminate the ceiling and floor and still be able to interpret the estimates in terms of odds ratios. – user77404 Jul 11 '14 at 17:57
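For reference, a short worked step on the transformed model in the comment above, using only the symbols already defined there. Since $G^{-1}(y) = \log\frac{y}{1-y}$, exponentiating both sides gives

$$\frac{Y_{st}}{1-Y_{st}} = \exp\left(\alpha_{0} + \alpha_{1}R_{st} + \alpha_{2}I_{st} + \delta_{t} + u_{st}\right).$$

Raising $R_{st}$ by one unit with everything else held fixed therefore multiplies the odds $\frac{Y_{st}}{1-Y_{st}}$ by $e^{\alpha_{1}}$. Under this model, $\alpha_{1}$ is a log odds ratio for the observed pass proportion and $e^{\alpha_{1}}$ is the corresponding odds ratio. Two caveats: $G^{-1}$ is undefined when $Y_{st}$ is exactly 0 or 1, and OLS on the logit of the proportion will not in general reproduce the estimates from a binomial logistic regression fitted to the underlying counts.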
