
My data are percentage disease scores for different varieties of plants that had been inoculated with disease from several different sources. Having conducted a two-way ANOVA in SPSS on log10(x + 1) of my proportions (the +1 is there because the data contain some zero percents), I find that my data fail homogeneity of variance but are (mostly) normally distributed. I have analysed the residuals and found that this appears to be caused by one of the inoculated varieties, whose data are skewed towards zero percent seemingly irrespective of disease source.

https://www.dropbox.com/home?preview=spss+output+pilot+study+aug2015.docx

Our resident statistician has looked at my data and told me that perhaps my best option is to use a beta-distributed GLM, as I need to be able to reliably determine whether there is an interaction between the two independent variables. However, despite learning as much as I can about this over the last couple of days, I am unsure how best to implement this in R, and I have no idea how to determine whether or not it is a valid fit for my data (this is where I am most stuck).

gung - Reinstate Monica
Thomas
    You may need zero-inflated beta to cope with the exact zeros. See [this](http://stats.stackexchange.com/questions/113797/interpretation-of-zero-one-inflated-beta-regression-with-r-gamlss), [this](http://stats.stackexchange.com/questions/64634/modelling-zero-inflated-proportion-data-in-r-using-gamlss), which discuss the R package `GAMLSS`; there's also the R package `zoib`. – Glen_b Aug 31 '15 at 09:10

2 Answers


I suppose you could look at this in two different ways:

  1. as true proportions
  2. as binomial counts from a total

Option 2 would be a simple binomial GLM (binomial family, logit link, for starters), but you need to have counts out of a total count, e.g. the number showing disease out of the total.

This can be fitted using

mod <- glm(y ~ x1 + x2, data = foo, family = binomial(link = "logit"))

where y, the response, can be specified in several ways. Read ?glm for the details.
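As a minimal sketch of one of those ways, the two-column `cbind(successes, failures)` form, here is a binomial GLM with an interaction term. The data are simulated and every name in it (`diseased`, `total`, `variety`, `source`) is a hypothetical stand-in, not taken from the question.

```r
# Simulated stand-in data; all names and values are hypothetical
set.seed(1)
foo <- expand.grid(variety = factor(c("A", "B", "C")),
                   source  = factor(c("s1", "s2")),
                   rep     = 1:5)
foo$total    <- 20                                   # units scored per plot
foo$diseased <- rbinom(nrow(foo), size = foo$total, prob = 0.3)

# Response given as a two-column matrix: cbind(successes, failures)
mod_add <- glm(cbind(diseased, total - diseased) ~ variety + source,
               data = foo, family = binomial(link = "logit"))
mod_int <- update(mod_add, . ~ . + variety:source)

# Analysis-of-deviance (likelihood-ratio) test of the interaction
lrt <- anova(mod_add, mod_int, test = "Chisq")
print(lrt)

# Rough overdispersion check: a ratio well above 1 suggests the
# binomial variance assumption is too tight for the data
disp <- deviance(mod_int) / df.residual(mod_int)
```

If the ratio in `disp` is well above 1, refitting with `family = quasibinomial` is one common response, though whether that suits a given data set is a judgment call.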

Option 1, the beta regression, is suitable for true proportions. This can be fitted using the betareg package and the function betareg()

library(betareg)
mod <- betareg(y ~ x1 + x2, data = foo, link = "logit")

though be sure to read the two vignettes that come with the betareg package for the details.
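Since the question is about an interaction and about checking the fit, a rough sketch along those lines might look like the following. The data are simulated and all variable names are hypothetical. Because `betareg()` needs the response strictly between 0 and 1, the sketch squeezes exact zeros inwards with the `(y * (n - 1) + 0.5) / n` transformation mentioned in the betareg vignette; whether that is acceptable for data with many zeros is an assumption you would need to check.

```r
library(betareg)  # install.packages("betareg") if it is not available

# Simulated stand-in data; all names and values are hypothetical
set.seed(2)
foo <- expand.grid(variety = factor(c("A", "B", "C")),
                   source  = factor(c("s1", "s2")),
                   rep     = 1:10)
foo$y <- rbeta(nrow(foo), shape1 = 2, shape2 = 5)   # proportions in (0, 1)
foo$y[1:3] <- 0                                     # a few exact zeros

# betareg() needs 0 < y < 1, so squeeze the boundary values inwards
n <- nrow(foo)
foo$y_sq <- (foo$y * (n - 1) + 0.5) / n

mod_add <- betareg(y_sq ~ variety + source, data = foo, link = "logit")
mod_int <- betareg(y_sq ~ variety * source, data = foo, link = "logit")

# Likelihood-ratio test of the interaction, computed by hand
lr_stat <- as.numeric(2 * (logLik(mod_int) - logLik(mod_add)))
lr_df   <- attr(logLik(mod_int), "df") - attr(logLik(mod_add), "df")
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Residuals for judging the fit, e.g. to plot against fitted(mod_int)
res <- residuals(mod_int, type = "sweighted2")
```

Calling `plot(mod_int)` draws the package's standard diagnostic plots, which is probably the quickest way to see whether one variety is fitted badly.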

Gavin Simpson

A beta GLM won't be able to deal with exact 0s, so I don't think that is what you will want to do. Instead, you could look into fractional logits (Papke and Wooldridge 1996). I don't know SPSS well enough to tell you how to do it there.

Papke, Leslie E. and Jeffrey M. Wooldridge. 1996. Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates. Journal of Applied Econometrics, 11(6):619-632.

http://dx.doi.org/10.1002/(SICI)1099-1255(199611)11:6<619::AID-JAE418>3.0.CO;2-1
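In R, one way to approximate a fractional logit is a quasi-likelihood GLM with `family = quasibinomial`, which accepts proportions including exact 0s. The sketch below uses simulated data with hypothetical variable names. It relies on the default model-based standard errors, whereas Papke and Wooldridge use robust ones, which would have to be added separately (for example with the sandwich package).

```r
# Simulated stand-in data; all names and values are hypothetical
set.seed(3)
foo <- expand.grid(variety = factor(c("A", "B", "C")),
                   source  = factor(c("s1", "s2")),
                   rep     = 1:10)
foo$y <- rbeta(nrow(foo), shape1 = 1, shape2 = 4)
foo$y[foo$variety == "C"][1:8] <- 0    # exact zeros are allowed here

# Quasi-likelihood fit: logit model for the mean proportion,
# with the dispersion estimated from the data
mod_add <- glm(y ~ variety + source, data = foo,
               family = quasibinomial(link = "logit"))
mod_int <- update(mod_add, . ~ . + variety:source)

# F test of the interaction (used because the dispersion is estimated)
ftab <- anova(mod_add, mod_int, test = "F")
print(ftab)
```

The second row of `ftab` gives the test of the variety-by-source interaction.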

Maarten Buis
    Thank you very much for your help. It occurs to me that I perhaps should have mentioned that my data are a little zero-inflated. I recently read about the GAMLSS package in this thread: http://stats.stackexchange.com/questions/64634/modelling-zero-inflated-proportion-data-in-r-using-gamlss In your opinion, would this be a good approach for me to attempt? – Thomas Aug 31 '15 at 10:14