Linear Regression with Species and Land Use Data

Question

I am really sorry about asking this question, because I don't understand the first thing about how to analyze this.

OK, so there are these sites, differentiated by SiteID, and Locality is the crop being grown there (Bluhflache and Rapsfeld). I want to examine species richness (the number of different species) and species abundance (the total number of individuals of each species) and see whether they are affected by LandUse. There are more species and land use columns than shown in the pictures.

The land use is either the Bluhflache or Rapsfeld crop treatment, plus the other land use data set for a 1500 m buffer around the sites:

MY MAIN QUESTION IS: how do I do a linear regression analysis of the species data and the 1500 m buffer data shown above? I don't even know where to start. I know that the land use around each site is the independent variable, and I want to see how it influences species abundance and richness (the dependent variables) at those sites, but I don't know how to compare and analyze them. I am looking into resources online for now. I am using R.

Here's a map I made of the sites, if that helps at all:

Here is the GLM I've done so far just for 2017 data, which was pooled into just the site number not field type:

SpeciesAbund2017 <- apply(Pooled2017SpeciesData[,-1],1,sum)

1 204 102 176 305 241 512 106 302

Pooled2016LandUsefor2017 <- read.csv("2016LandUsePooledfor2017.csv",header=T)

SpeciesAbund2017model <- glm(SpeciesAbund2017 ~ Pooled2016LandUsefor2017$perc_forest_500 + Pooled2016LandUsefor2017$perc_arable_500 + Pooled2016LandUsefor2017$perc_seminat_500 + Pooled2016LandUsefor2017$perc_rape_500 + Pooled2016LandUsefor2017$perc_grassland_500, family = poisson)

summary(SpeciesAbund2017model)

Result:

Call:
glm(formula = SpeciesAbund2017 ~ Pooled2016LandUsefor2017$perc_forest_500 + 
    Pooled2016LandUsefor2017$perc_arable_500 + Pooled2016LandUsefor2017$perc_seminat_500 + 
    Pooled2016LandUsefor2017$perc_rape_500 + Pooled2016LandUsefor2017$perc_grassland_500, 
    family = poisson)

Deviance Residuals: 
      1        2        3        4        5        6        7        8  
-7.4954  -4.9937  -7.2495   3.9528   2.2330   5.1786  -0.5689   6.5993  

Coefficients:
                                            Estimate Std. Error z value
(Intercept)                                   7.8602     0.5147  15.272
Pooled2016LandUsefor2017$perc_forest_500     -3.2097     0.4934  -6.505
Pooled2016LandUsefor2017$perc_arable_500     -1.5481     0.5960  -2.598
Pooled2016LandUsefor2017$perc_seminat_500    -1.2065     1.0619  -1.136
Pooled2016LandUsefor2017$perc_rape_500       -4.7038     0.5379  -8.744
Pooled2016LandUsefor2017$perc_grassland_500  -3.6157     0.5231  -6.912
                                            Pr(>|z|)    
(Intercept)                                  < 2e-16 ***
Pooled2016LandUsefor2017$perc_forest_500    7.79e-11 ***
Pooled2016LandUsefor2017$perc_arable_500     0.00939 ** 
Pooled2016LandUsefor2017$perc_seminat_500    0.25585    
Pooled2016LandUsefor2017$perc_rape_500       < 2e-16 ***
Pooled2016LandUsefor2017$perc_grassland_500 4.78e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 483.18  on 7  degrees of freedom
Residual deviance: 224.98  on 2  degrees of freedom
AIC: 294.62

Number of Fisher Scoring iterations: 4

Does this make sense?

Also, to look at potential effects of area of forest edge (area_forestedge_###) on species data do I do a correlation test?

Excellent progress! :-) You have some nice results already! :) Just a note: I recommend widening your R console so that the model table does not wrap onto multiple lines :) – Tomas, Dec 14 '19 at 18:47

score 0 · Answer 1 · answered Dec 13 '19 at 15:11

Hello Rachel!

You want to use a Poisson GLM. In R:

model <- glm(abundance ~ perc_forest + perc_arable, family = poisson)
summary(model)

Using family = quasipoisson instead of poisson will also account for overdispersion.
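A minimal sketch of checking for overdispersion and refitting with quasipoisson. The data frame `dat` and all of its values are simulated purely for illustration (they are assumptions, not the data from the question); only base R functions are used.

```r
# Simulated example data (hypothetical; not the data from the question)
set.seed(1)
n <- 40
dat <- data.frame(
  perc_forest = runif(n),
  perc_arable = runif(n)
)
# Negative binomial counts are overdispersed relative to the Poisson
mu <- exp(4 - 1.5 * dat$perc_forest + 0.5 * dat$perc_arable)
dat$abundance <- rnbinom(n, mu = mu, size = 2)

# Poisson fit; the data argument avoids repeating the data frame name
pois_fit <- glm(abundance ~ perc_forest + perc_arable,
                family = poisson, data = dat)

# Rough overdispersion check: residual deviance / residual df.
# Values well above 1 suggest overdispersion.
dispersion_ratio <- deviance(pois_fit) / df.residual(pois_fit)

# Quasi-Poisson: same coefficient estimates, but standard errors are
# scaled by the square root of the estimated dispersion
quasi_fit <- glm(abundance ~ perc_forest + perc_arable,
                 family = quasipoisson, data = dat)

print(dispersion_ratio)
print(summary(quasi_fit)$dispersion)
```

Applied to the output in the question, the same ratio is 224.98 / 2 ≈ 112.5, so quasipoisson (or another model that allows for overdispersion) looks worth trying there.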

Soon you will face the question of how to select the optimal model, i.e. which variables to include. You can do that using, e.g., the stepAIC function:

library(MASS)  # provides stepAIC
# Backward elimination by AIC, starting from the fitted model
stepAIC(model, direction = "backward")

You can also look at the dredge function in the MuMIn package.
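To make the selection step concrete, here is a small end-to-end sketch on simulated data. The data frame `dat`, the column `perc_grassland`, and the effect sizes are assumptions made up for the example; `stepAIC` comes from the MASS package, which ships with standard R installations.

```r
library(MASS)  # provides stepAIC

# Simulated example data (hypothetical; not the data from the question)
set.seed(42)
n <- 60
dat <- data.frame(
  perc_forest    = runif(n),
  perc_arable    = runif(n),
  perc_grassland = runif(n)
)
# Only perc_forest truly affects the simulated counts
dat$abundance <- rpois(n, lambda = exp(3 - 2 * dat$perc_forest))

# Start from the model containing all candidate predictors
full_model <- glm(abundance ~ perc_forest + perc_arable + perc_grassland,
                  family = poisson, data = dat)

# Backward elimination by AIC; trace = FALSE suppresses the step log
selected <- stepAIC(full_model, direction = "backward", trace = FALSE)

print(formula(selected))
print(AIC(full_model, selected))
```

Backward selection only drops a term when doing so lowers the AIC, so the selected model never has a higher AIC than the full one.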

answered Dec 13 '19 at 15:11

Tomas


As there seem to be data on multiple species abundance to evaluate, I wonder why you aren't recommending something like canonical correspondence analysis as discussed in your own [recent question](https://stats.stackexchange.com/q/438394/28500). – EdM Dec 13 '19 at 16:32
@EdM LOL :-D ..... oh yes... no I wouldn't recommend that... not for Rachel's question... the results are not easy to interpret... almost everyone in ecology field uses these fancy plots with arrows and hardly anyone is able to understand what it's saying :-D – Tomas Dec 13 '19 at 21:07
@EdM and BTW, speaking of that question... I'd awarded you the full bounty, because there was not enough time and it was about to expire... but the question doesn't feel answered yet.. ;-) – Tomas Dec 13 '19 at 22:49
@Curious. Hi! Thanks so much for the response. Just a follow-up question: do I compare all land uses in all radii of the buffer, or maybe one buffer at a time? I included an example with just one radius at the end of my question above. – Rachel Brockamp Dec 14 '19 at 16:49
@RachelBrockamp Good question :-) I've dealt with this myself. You can compare models (using e.g. AIC or cross-validation) with different radii for each variable (there will be lots of models to try :-D). Or, a more feasible solution (which I'd prefer) might be to include the smallest radius (say 500 m), then the ring from 500 to 1000 m, then the ring from 1000 to 2000 m (you can try different variants), and compare everything in one model :) – Tomas Dec 14 '19 at 18:41
@RachelBrockamp btw, seems that we are colleagues! :-) I was doing almost exactly the same models for birds. I am based in Prague, Charles University, also Birdlife. Where are you from, according to species I guess it's Europe right? :-) – Tomas Dec 14 '19 at 18:44
@Curious I'm actually an American studying in Canada. My area is soil science, so not very related to the data I presented. I studied in Germany briefly but did not take the course that this data is from. But I have the responsibility of analyzing it for an assignment. With my weak statistical background this has become my nightmare... – Rachel Brockamp Dec 19 '19 at 22:10
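The ring idea from the comments can be sketched as follows. This assumes the land-use columns give the share of each full circular buffer covered by a land-use class; the column `perc_forest_1000`, the helper `ring_proportion`, and the table values are hypothetical, and only `perc_forest_500` follows the naming used in the question.

```r
# Convert nested-buffer cover shares into ring cover shares.
# p_inner, p_outer: share of a land-use class within the inner and
# outer circular buffers; r_inner, r_outer: buffer radii (same units).
ring_proportion <- function(p_inner, p_outer, r_inner, r_outer) {
  area_inner <- pi * r_inner^2
  area_outer <- pi * r_outer^2
  # Covered area in the ring divided by the ring's total area
  (p_outer * area_outer - p_inner * area_inner) / (area_outer - area_inner)
}

# Hypothetical land-use table for three sites
landuse <- data.frame(
  perc_forest_500  = c(0.20, 0.50, 0.00),
  perc_forest_1000 = c(0.35, 0.50, 0.15)
)

# Forest share in the 500-1000 m ring around each site
landuse$perc_forest_ring_500_1000 <- ring_proportion(
  landuse$perc_forest_500, landuse$perc_forest_1000,
  r_inner = 500, r_outer = 1000
)

print(landuse)
```

The inner-buffer column and the ring column can then enter one glm as separate predictors, which avoids the overlap between nested buffers. The formula is linear in the shares, so it works the same way for values on a 0 to 100 scale.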
