
I'm working on a project wherein I compare the presence/absence of a number of bird and herptile species between wetlands that have received three different treatments. The populations were surveyed across two different years. So the response variable is a binary categorical variable, and the predictor variables (wetland treatment and year of sampling) are also categorical.

The situation is a bit more complicated because different sampling schedules were used in the two years, which resulted in different sample sizes between years. I'm planning to model animal presence/absence with the glm() function in R, but is there a more appropriate approach?

user43769

2 Answers


You can use a binomial GLM, since it lets each observation have its own number of trials (sample size), $m_i$. So you can use the glm() function as follows:

```r
glm(cbind(presence, absence) ~ 1 + treatment + year, family = binomial)
```

where `presence` and `absence` are the numbers of present and absent cases in each row of the data, so that $m_i$ = `presence` + `absence`.
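A minimal sketch, on simulated data, of why this copes with the unequal sampling effort between years: each treatment-by-year row simply carries its own $m_i$. All names (`wetlands`, `visits`, `surveys`, `agg`) and the simulation settings (9 wetlands, 3 per treatment, visited 4 times in "2013" and 7 times in "2014", detection probabilities 0.3/0.5/0.7 by treatment) are hypothetical assumptions. The sketch also checks that fitting the raw 0/1 data, one row per visit, gives the same estimates as the aggregated `cbind()` model.

```r
# Hypothetical simulated data: 9 wetlands (3 per treatment), each visited
# 4 times in 2013 and 7 times in 2014 (unequal effort is an assumption)
set.seed(42)
wetlands <- data.frame(wetland   = 1:9,
                       treatment = factor(rep(c("A", "B", "C"), each = 3)))
visits <- c("2013" = 4, "2014" = 7)

# One row per visit
surveys <- do.call(rbind, lapply(names(visits), function(y) {
  d <- wetlands[rep(seq_len(nrow(wetlands)), each = visits[[y]]), ]
  d$year <- y
  d
}))
surveys$year <- factor(surveys$year)

# Assumed true detection probabilities by treatment; 0/1 outcome per visit
p_true <- c(A = 0.3, B = 0.5, C = 0.7)[as.character(surveys$treatment)]
surveys$presence <- rbinom(nrow(surveys), size = 1, prob = p_true)
surveys$absence  <- 1 - surveys$presence

# (1) Logistic regression on the raw 0/1 data, one row per visit
fit_binary <- glm(presence ~ treatment + year,
                  family = binomial, data = surveys)

# (2) Same model on counts per treatment x year, as in the answer above;
#     each row's m_i = presence + absence differs between years
agg <- aggregate(cbind(presence, absence) ~ treatment + year,
                 data = surveys, FUN = sum)
fit_counts <- glm(cbind(presence, absence) ~ treatment + year,
                  family = binomial, data = agg)

# Same coefficients (up to IRLS convergence tolerance)
all.equal(coef(fit_binary), coef(fit_counts), tolerance = 1e-6)
```

Printing `summary()` of both fits should show the same coefficient table and standard errors; the residual deviance and residual df differ because the two fits are compared against different saturated models.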

MMM

I highly recommend Crawley's The R Book, chapters 15 to 17. If you have only categorical explanatory variables and no continuous ones, The R Book suggests making a contingency table, or converting your binary data into proportions and analyzing those. I had the same problem (binary count data and only categorical explanatory variables): I fitted a binomial GLM and later a GLM on the counts expressed as proportions. Both worked fine and gave the same results.
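A minimal sketch of the routes described above, using made-up counts (the data frame `agg` and its columns `visits` and `prop` are hypothetical, not real survey results). The proportion model needs `weights` set to the number of trials; with that, it reproduces the `cbind()` count model exactly. For simplicity the contingency table here collapses over year, so it only tests treatment.

```r
# Hypothetical aggregated counts, for illustration only (not real data):
# one row per treatment x year, with unequal numbers of visits per year
agg <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), times = 2)),
  year      = factor(rep(c("2013", "2014"), each = 3)),
  presence  = c(3, 6, 8, 7, 11, 15),
  visits    = c(12, 12, 12, 21, 21, 21)
)
agg$absence <- agg$visits - agg$presence
agg$prop    <- agg$presence / agg$visits

# Binomial GLM on counts of presences and absences
fit_counts <- glm(cbind(presence, absence) ~ treatment + year,
                  family = binomial, data = agg)

# Same model written as proportions, weighted by the number of visits
fit_prop <- glm(prop ~ treatment + year,
                family = binomial, weights = visits, data = agg)

all.equal(coef(fit_counts), coef(fit_prop))  # same estimates

# Contingency-table route: presence/absence by treatment, summed over years
tab <- rbind(presence = tapply(agg$presence, agg$treatment, sum),
             absence  = tapply(agg$absence,  agg$treatment, sum))
chisq.test(tab)
```

Leaving out `weights = visits` would treat every proportion as if it came from a single trial, which is the usual mistake with the proportion form.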

fly
This is fine, but note that if you analyze your presence/absence data with a chi-squared test, you will be using a *score* test, whereas the tests reported by default for a logistic regression are *Wald* tests. For more, see [here](http://stats.stackexchange.com/a/144608/7290). – gung - Reinstate Monica Apr 17 '15 at 09:04
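A minimal sketch of the distinction in the comment above, on hypothetical treatment-level counts (the data frame `treat_counts` and its numbers are made up). `summary()` gives Wald z tests per coefficient; `anova(..., test = "Rao")` gives the score test for the treatment term, which for this model matches the Pearson chi-squared statistic on the 2 x 3 table. A joint Wald chi-squared for the same term is computed by hand for comparison.

```r
# Hypothetical treatment-level counts, summed over years (illustration only)
treat_counts <- data.frame(
  treatment = factor(c("A", "B", "C")),
  presence  = c(10, 17, 23),
  absence   = c(23, 16, 10)
)

fit1 <- glm(cbind(presence, absence) ~ treatment,
            family = binomial, data = treat_counts)

# Wald z tests for each coefficient: what summary() reports by default
summary(fit1)$coefficients

# Joint Wald chi-squared for the treatment term (2 df): b' V^{-1} b
b    <- coef(fit1)[c("treatmentB", "treatmentC")]
V    <- vcov(fit1)[names(b), names(b)]
wald <- as.numeric(t(b) %*% solve(V) %*% b)

# Rao score test for the same term, evaluated at the intercept-only fit
score <- anova(fit1, test = "Rao")["treatment", "Rao"]

# Pearson chi-squared on the 2 x 3 presence/absence-by-treatment table
tab <- rbind(presence = treat_counts$presence,
             absence  = treat_counts$absence)
pearson <- unname(chisq.test(tab, correct = FALSE)$statistic)

# score and pearson should agree up to numerical tolerance; wald differs
c(wald = wald, score = score, pearson = pearson)
```

With reasonable sample sizes the two statistics are usually close, but they can lead to different conclusions when counts are small or estimated probabilities are near 0 or 1.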