Beta distribution GLM with categorical independents and proportional response

Question

My data is percentage disease data for different varieties of plants that had been inoculated with disease from several different sources. Having conducted a two-way ANOVA in SPSS (using log10(x + 1) of my proportions; the +1 is there because of some zero percents in the data), I find that my data fails homogeneity of variance but is (mostly) normally distributed. I have analysed the residuals and found that this appears to be caused by one of the inoculated varieties, which has data skewed towards zero percent, seemingly irrespective of disease source.

https://www.dropbox.com/home?preview=spss+output+pilot+study+aug2015.docx

Our resident statistician has looked at my data and told me that perhaps my best option is to use a beta-distributed GLM, as I need to be able to reliably determine whether there is an interaction between the two independent variables. However, despite learning as much as I can about this over the last couple of days, I am unsure how best to implement this in R, and I have no idea how to determine whether this is a valid fit for my data (this is where I am most stuck).

You may need zero-inflated beta to cope with the exact zeros. See [this](http://stats.stackexchange.com/questions/113797/interpretation-of-zero-one-inflated-beta-regression-with-r-gamlss), [this](http://stats.stackexchange.com/questions/64634/modelling-zero-inflated-proportion-data-in-r-using-gamlss), which discuss the R package `gamlss`; there's also the R package `zoib`. – Glen_b, Aug 31 '15 at 09:10

score 7 · Answer 1 · answered Aug 30 '15 at 18:50

I suppose you could look at this in two different ways:

1. as true proportions
2. as binomial counts from a total

Option 2 would be a simple binomial GLM (binomial family, logit link [for starters]), but you need to have counts out of a total count, e.g. the number showing disease out of the total.

This can be fitted using

mod <- glm(y ~ x1 + x2, data = foo, family = binomial(link = "logit"))

where y, the response, can be specified in several ways. Read ?glm for the details.
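As a rough sketch of two of those ways, the example below fits the same binomial GLM from a two-column matrix of successes and failures, and from observed proportions with the totals passed as weights. Everything about the data is an assumption made for illustration: the names `variety`, `source`, `diseased` and `total` are hypothetical and the counts are simulated. The `variety * source` formula adds the interaction term, on the assumption that the interaction is what needs testing.

```r
# Hypothetical data: 'variety' and 'source' stand in for the two factors,
# 'diseased' out of 'total' plants per plot. All values are simulated.
set.seed(1)
foo <- expand.grid(variety = factor(c("A", "B", "C")),
                   source  = factor(c("s1", "s2")),
                   rep     = 1:4)
foo$total    <- 20
foo$diseased <- rbinom(nrow(foo), size = foo$total, prob = 0.3)

# Form 1: two-column matrix of successes and failures
mod_counts <- glm(cbind(diseased, total - diseased) ~ variety * source,
                  data = foo, family = binomial(link = "logit"))

# Form 2: observed proportion, with the totals supplied as weights
foo$prop <- foo$diseased / foo$total
mod_prop <- glm(prop ~ variety * source, weights = total,
                data = foo, family = binomial(link = "logit"))

# Compare the coefficients from the two forms
all.equal(coef(mod_counts), coef(mod_prop))

# Likelihood-ratio test of the interaction against the additive model
mod_add <- update(mod_counts, . ~ variety + source)
anova(mod_add, mod_counts, test = "Chisq")
```

The two forms describe the same likelihood, so the coefficient estimates should agree up to numerical tolerance.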

Option 1, the beta regression, is suitable for true proportions. This can be fitted using the betareg package and its function betareg():

library(betareg)
mod <- betareg(y ~ x1 + x2, data = foo, link = "logit")

though be sure to read the two vignettes that come with the betareg package for the details.
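A minimal sketch of how the beta regression might be set up with two factors and their interaction is given below. The factor names `variety` and `source` are hypothetical and the proportions are simulated, so this only illustrates the mechanics. Because the beta distribution is defined on the open interval (0, 1), the sketch first checks for exact 0s or 1s, and it only fits the models if the betareg package is installed.

```r
# Hypothetical data: 'variety' and 'source' are made-up factor names and
# the proportions are simulated so that they lie strictly inside (0, 1).
set.seed(2)
foo <- expand.grid(variety = factor(c("A", "B", "C")),
                   source  = factor(c("s1", "s2")),
                   rep     = 1:5)
foo$y <- rbeta(nrow(foo), shape1 = 2, shape2 = 5)

# betareg() needs 0 < y < 1, so look for boundary values first
has_boundary <- any(foo$y <= 0 | foo$y >= 1)

if (!has_boundary && requireNamespace("betareg", quietly = TRUE)) {
  # Additive model and model with the variety-by-source interaction
  mod_add <- betareg::betareg(y ~ variety + source, data = foo, link = "logit")
  mod_int <- betareg::betareg(y ~ variety * source, data = foo, link = "logit")
  print(summary(mod_int))
  # Compare the two fits by AIC (lower is better)
  print(AIC(mod_add, mod_int))
}
```

If `has_boundary` is TRUE for the real data, this model cannot be fitted to the response as it stands.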

score 2 · Answer 2 · answered Aug 31 '15 at 08:08

A beta GLM won't be able to deal with exact 0s, so I don't think that that is what you will want to do. Instead you could look into fractional logits (Papke and Wooldridge 1996). I don't know SPSS well enough to tell you how to do it in there.

Papke, Leslie E. and Jeffrey M. Wooldridge. 1996. Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates. Journal of Applied Econometrics, 11(6):619-632.

http://dx.doi.org/10.1002/(SICI)1099-1255(199611)11:6<619::AID-JAE418>3.0.CO;2-1
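For R users, one common way to approximate a fractional logit is a GLM with the quasibinomial family, which accepts proportions including exact 0s and 1s. The sketch below is only an illustration under stated assumptions: the factor names `variety` and `source` are hypothetical, the data are simulated, and some zeros are planted by hand. Papke and Wooldridge pair the estimator with robust standard errors, which this sketch does not compute; the sandwich package is one route to those.

```r
# Hypothetical data: simulated proportions with a few exact zeros
# planted in one variety to mimic the situation in the question.
set.seed(3)
foo <- expand.grid(variety = factor(c("A", "B", "C")),
                   source  = factor(c("s1", "s2")),
                   rep     = 1:5)
foo$y <- rbeta(nrow(foo), shape1 = 2, shape2 = 5)
zero_rows <- which(foo$variety == "C")[1:4]
foo$y[zero_rows] <- 0

# Quasi-likelihood fit with a logit link; exact zeros are allowed
mod_frac <- glm(y ~ variety * source, data = foo,
                family = quasibinomial(link = "logit"))
summary(mod_frac)

# F test of the interaction against the additive model
mod_add <- update(mod_frac, . ~ variety + source)
anova(mod_add, mod_frac, test = "F")
```

The standard errors reported by `summary()` here are scaled by the estimated dispersion rather than being sandwich estimates, so they should be read with that in mind.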

Maarten Buis

Thank you very much for your help. It occurs to me that I perhaps should have mentioned that my data is a little zero-inflated. I recently read about the gamlss package in this thread: http://stats.stackexchange.com/questions/64634/modelling-zero-inflated-proportion-data-in-r-using-gamlss In your opinion, would this be a good approach for me to attempt? – Thomas Aug 31 '15 at 10:14
