
I've tried to find an answer to this to no avail. It is similar to this question, but not quite identical.

I have a continuous dependent variable that lies between zero and one (with some data points landing exactly on zero and one). I want to fit an $S$-shaped regression. My data are on coral reefs, and the $Y$ variable is percentage dead per m$^2$. I don't have frequency counts in terms of successes and failures, just percentage mortality by cover.

Is logit a terrible idea here? Since I only have two significant figures, would turning 80% mortality into 80 out of 100 observations be a bad idea? Is there some other sigmoidal function I might explore?

coralGuy
  • I suggest you look into beta regression. It is designed exactly for a percentage as the dependent variable. – O_Devinyak Aug 29 '13 at 05:51
  • Pedantic terminology comment. Dead per square meter scaled to [0,1] is a proportion or fraction, not a percentage. Regardless, software in this area will expect [0,1] input, not [0,100] input. You seem clear on this, but the comment is inserted because I have often encountered minor confusion on the point. – Nick Cox Aug 29 '13 at 07:46

1 Answer


I don't think beta regression, as suggested by @O_Devinyak, will work well here: the data contain exact 0s and 1s, and the beta distribution is defined only for values strictly between 0 and 1.

A solution that has become more popular in economics is the so-called fractional logit model, which economists tend to attribute to Papke and Wooldridge (1996), though the basic idea can be traced back to at least Wedderburn (1974). Nowadays it is fairly easy to estimate such models. For example, in Stata (the statistical program I know best) you would use the `glm` command in combination with the `link(logit) family(binomial) vce(robust)` options.
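For readers without Stata, the estimator described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation behind Stata's `glm`: the function name `fit_fractional_logit`, the single-covariate setup, and the data values are all made up for the example, and the sandwich variance is the unadjusted form, without any finite-sample correction that packaged routines may apply.

```python
import math

def _logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_fractional_logit(x, y, max_iter=100, tol=1e-12):
    """Fit E[y|x] = logistic(b0 + b1*x) by Bernoulli quasi-likelihood.

    y may take any value in [0, 1], including exact 0s and 1s.
    Returns ((b0, b1), model-based SEs, robust sandwich SEs).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector g and information matrix A at the current estimates.
        g0 = g1 = a00 = a01 = a11 = 0.0
        for xi, yi in zip(x, y):
            mu = _logistic(b0 + b1 * xi)
            r, w = yi - mu, mu * (1.0 - mu)
            g0 += r
            g1 += r * xi
            a00 += w
            a01 += w * xi
            a11 += w * xi * xi
        # Newton step: solve the 2x2 system A * d = g by hand.
        det = a00 * a11 - a01 * a01
        d0 = (a11 * g0 - a01 * g1) / det
        d1 = (a00 * g1 - a01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break

    # "Bread" A = sum w*x*x' and "meat" B = sum r^2*x*x' at the final estimates.
    a00 = a01 = a11 = m00 = m01 = m11 = 0.0
    for xi, yi in zip(x, y):
        mu = _logistic(b0 + b1 * xi)
        w, r2 = mu * (1.0 - mu), (yi - mu) ** 2
        a00 += w
        a01 += w * xi
        a11 += w * xi * xi
        m00 += r2
        m01 += r2 * xi
        m11 += r2 * xi * xi
    det = a00 * a11 - a01 * a01
    i00, i01, i11 = a11 / det, -a01 / det, a00 / det  # entries of A^{-1}

    # Model-based variance: A^{-1} (what a binomial likelihood would report).
    se_model = (math.sqrt(i00), math.sqrt(i11))
    # Robust variance: diagonal of A^{-1} B A^{-1}.
    v00 = (i00 * m00 + i01 * m01) * i00 + (i00 * m01 + i01 * m11) * i01
    v11 = (i01 * m00 + i11 * m01) * i01 + (i01 * m01 + i11 * m11) * i11
    se_robust = (math.sqrt(v00), math.sqrt(v11))
    return (b0, b1), se_model, se_robust

# Hypothetical data: a stress index and the proportion of coral dead,
# including exact 0s and 1s. These numbers are invented for illustration.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0.0, 0.02, 0.05, 0.10, 0.30, 0.45, 0.80, 0.90, 1.0, 1.0]

coef, se_model, se_robust = fit_fractional_logit(x, y)
fitted = [_logistic(coef[0] + coef[1] * xi) for xi in x]
print("coefficients:", coef)
print("model-based SEs:", se_model)
print("robust SEs:", se_robust)
```

The point estimates come from the same equations a binomial logit would solve; only the standard errors differ between the two variance formulas. In practice a maintained routine (Stata's `glm` as described above, or an equivalent GLM function in another package) is preferable to hand-rolled code like this.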

Wedderburn, R. W. M. 1974. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61(3): 439-447.

Papke, Leslie E. and Jeffrey M. Wooldridge. 1996. Econometric methods for fractional response variables with an application to 401(k) Plan participation rates. Journal of Applied Econometrics, 11(6): 619-632.

Nick Cox
Maarten Buis
  • +1. Stata users can also find a concise review in Baum, C.F. 2008. Modeling proportions. _Stata Journal_ 8: 299-303 (accessible at http://www.stata-journal.com/sjpdf.html?articlenum=st0147). Even non-Stata users may find this helpful. – Nick Cox Aug 29 '13 at 07:44
  • +1. Is there some special reason that one *needs* to use robust errors in this case? – amoeba Sep 04 '16 at 21:32
  • @amoeba: Without robust standard errors (i.e. maximum likelihood) we assume that the dependent variable is binomially distributed, i.e. has only two values. That is obviously not correct for this application. With robust standard errors (i.e. maximum *quasi*-likelihood) we say "ignore the distribution of the dependent variable, we are only interested in the conditional mean", so the model can be used in this case. – Maarten Buis Sep 05 '16 at 09:53
  • Thanks, @Maarten. Clarification: do I understand correctly that robust standard errors only change the standard errors of the estimated coefficients (as per the accepted answer here: http://stats.stackexchange.com/questions/89999), but the estimates themselves are computed from the binomial GLM as is? – amoeba Sep 05 '16 at 19:19
  • Correctish. The best way to learn this is to start reading the references. Learning an entire technique through asking lots of small questions as comments is not the intended way to use this site, nor will it likely lead to the kind of insight you are looking for. – Maarten Buis Sep 05 '16 at 22:31
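
The exchange above about robust standard errors can be summarized in a few equations. This is a sketch in standard quasi-likelihood notation, which is an assumption here (the thread itself does not fix any symbols): $x_i$ is the covariate vector for observation $i$, $\beta$ the coefficients, and $\mu_i$ the modelled conditional mean.

$$
\mu_i = \frac{1}{1 + e^{-x_i'\beta}}, \qquad \sum_{i=1}^{n} (y_i - \hat\mu_i)\, x_i = 0
$$

The second expression is the estimating equation. It is the same whether or not robust standard errors are requested, and it is well defined for any $y_i \in [0, 1]$, so the point estimates $\hat\beta$ do not change. What changes is the estimated variance:

$$
\widehat{V}_{\text{model}} = A^{-1}, \qquad \widehat{V}_{\text{robust}} = A^{-1} B A^{-1}, \qquad A = \sum_{i=1}^{n} \hat\mu_i (1 - \hat\mu_i)\, x_i x_i', \qquad B = \sum_{i=1}^{n} (y_i - \hat\mu_i)^2\, x_i x_i'
$$

The model-based form is justified only if the variance of $y_i$ really is $\mu_i(1 - \mu_i)$, as it would be for a binary outcome. The robust form needs only the conditional mean to be correctly specified, which is why it is the one to use with proportions. Software may additionally apply a finite-sample correction to $\widehat{V}_{\text{robust}}$.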