I am investigating the impact of several independent variables on the educational expenditure share, defined as $\frac{educational\_expenditure_i}{total\_expenditures_i}$. The response variable (share) has $N = 2614$ observations, of which $890$ are zeros. The data form an unbalanced panel consisting of two waves. Here is a histogram of the response variable:
The zero observations are most likely distinct consumer choices not to invest any money in education, hence I want them to be represented in my model.
Since the data comes in (unbalanced) panel form, I'd like to estimate the model with and without Fixed Effects. I have $T = 2$ time periods and $K = 1565$ groups.
What I have done so far in Stata:
1) Tobit model: This assumes that an underlying latent variable drives the consumer's (zero-consumption) choice. Although this gave me reasonable and overall highly significant results, the Tobit model has been criticized on the grounds that it should be used only if observations below the threshold are theoretically possible. Estimating Fixed Effects in the Tobit framework is also apparently not recommended, since the estimator's theoretical properties are poor.
2) OLS Fixed Effects: As I understand it, standard OLS does not fully account for the characteristics of the response variable (a fraction with many zeros). The results are also statistically insignificant.
xtreg share IV i.year, fe vce(cluster ID)
$IV$ represents the independent variables.
3) Fractional logit model: Following Papke and Wooldridge (2008). This accommodates both the fractional nature of the response and the zero observations. However, estimating Fixed Effects in this framework seems cumbersome, and I have not yet found a satisfactory solution.
Regression on the panel, which gave marginal effects similar to those of the Tobit regression, but with much smaller coefficients:
glm share IV i.year, link(logit) family(binomial) vce(cluster ID) nolog eform
margins, dydx(*)
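For reference, this is the model I believe the glm call above fits, written out as a sketch. The notation ($y_{it}$ for share, $x_{it}$ for the regressors including the year dummies, $\Lambda$ for the logistic function) is my own and does not come from the Stata output:

$$
E(y_{it} \mid x_{it}) = \Lambda(x_{it}\beta) = \frac{\exp(x_{it}\beta)}{1 + \exp(x_{it}\beta)}
$$

$$
\ell_{it}(\beta) = y_{it}\,\log \Lambda(x_{it}\beta) + (1 - y_{it})\,\log\bigl(1 - \Lambda(x_{it}\beta)\bigr)
$$

$$
\frac{\partial E(y_{it} \mid x_{it})}{\partial x_{it,j}} = \beta_j\,\Lambda(x_{it}\beta)\,\bigl(1 - \Lambda(x_{it}\beta)\bigr)
$$

The quasi-log-likelihood $\ell_{it}$ is well defined at $y_{it} = 0$, which is why the zeros can stay in the sample. The last expression is the effect of a continuous regressor $x_{it,j}$. As far as I understand, margins, dydx(*) averages it over the observations, so the marginal effects and the coefficients are on different scales. The factor $\Lambda(1 - \Lambda)$ is at most $1/4$, and the eform option additionally displays $\exp(\beta_j)$ rather than $\beta_j$.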
Fixed effects (taken from https://www.statalist.org/forums/forum/general-stata-discussion/general/1446970-fractional-logit-model-unbalanced-panel-two-way-fixed-effects):
xtgee share IV c.hhID i.year c.ID##i.year, family(bin) link(logit) ///
    corr(independent) i(ID) t(year) vce(robust)
where $ID$ is the panel variable and $year$ the time variable. I have trouble understanding the model specification and interpreting the results correctly, specifically the interaction term and the categorical variables: c.hhID i.year c.ID##i.year. I have also read that Fixed Effects in the $xtgee$ framework should be applied to a balanced panel. However, I ran the Verbeek-Nijman test on my panel, confirmed that the attrition is random, and therefore proceeded with the unbalanced panel.
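To make the part I am unsure about concrete, here is how I read the c.ID##i.year term under Stata's factor-variable notation. This is only a sketch: x1 and x2 are hypothetical stand-ins for $IV$, c.hhID is left out for brevity, and the data are assumed to be in memory with share, ID and year defined.

```stata
* Sketch only; x1 and x2 are hypothetical placeholders for the regressors.
* The ## operator is shorthand for the main effects plus their interaction:
*   c.ID##i.year   is equivalent to   c.ID i.year c.ID#i.year
* The c. prefix treats the numeric identifier ID as ONE continuous regressor,
* so this adds a single slope on ID plus one ID-by-year slope per non-base year.
* Per-unit indicator variables would instead be written i.ID.
xtgee share x1 x2 c.ID i.year c.ID#i.year, family(binomial) link(logit) ///
    corr(independent) i(ID) t(year) vce(robust)
```

If this reading is right, the specification does not include one intercept per panel unit, which is part of why I am unsure whether it really implements Fixed Effects.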
My questions are:
1) How would you model such a response variable? I have also read about zero-inflated beta models and Poisson regression. I would like to first run a regression on the panel and then apply Fixed Effects to compare the results.
2) What is the best approach to apply Fixed Effects in the GLM framework, or in your proposed framework?
Please let me know if I should add any results, figures, or further information.