2

I wish to analyse my dataset, and concluded with the information I have, biodiversity would be good choice. I have 3 sites each are a reef and have the same zonation (split into 3 zones). I am aiming to compare the biodiversity of each zone, at each site. And then produce figures showing this

The 3 zones are, reef flat, reef crest, and slope. I am trying to calculate the biodiversity for each zone, at each site . My aim is to primarily compare the differences in biodiversity between the zones, however I can also compare differences between each sites biodiversity. The 4 different years that data was collected at each site would be replications to produce a more accurate biodiversity reading, I think that the Shannon's diversity index is the correct test to use but I am unsure. this is an example dataset as mine is too large. Any help would be greatly appreciated this has been confusing me a lot

> dput(egdata)
structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2019L, 2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L), Site = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Zone = c("c", "c", "f", 
"f", "f", "c", "s", "s", "c", "s", "s", "s", "s", "s", "f", "f", 
"f", "f", "s", "s", "s", "s", "c", "c", "c", "c", "f", "f", "f", 
"f", "f", "c", "c", "f", "f", "f", "c", "s", "s", "c", "s", "s", 
"s", "s", "s", "f", "f", "f", "f", "s", "s", "s", "s", "c", "c", 
"c", "c", "f", "f", "f", "f", "f", "c", "c", "f", "f", "f", "c", 
"s", "s", "c", "s", "s", "s", "s", "s", "f", "f", "f", "f", "s", 
"s", "s", "s", "c", "c", "c", "c", "f", "f", "f", "f", "f"), 
    Species = c("Amblyglyphidodon curacao", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon curacao", "Amblyglyphidodon leucogaster", 
    "Chaetodon semeion", "Chromis analis", "Chromis analis", 
    "Chromis xanthura", "Ctenochaetus striatus", "Ctenochaetus striatus", 
    "Amblyglyphidodon curacao", "Amblyglyphidodon leucogaster", 
    "Chaetodon semeion", "Chromis analis", "Chromis analis", 
    "Chromis xanthura", "Ctenochaetus striatus", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon leucogaster", "Chaetodon semeion", "Chromis analis", 
    "Chromis analis", "Chromis xanthura", "Ctenochaetus striatus", 
    "Chromis xanthura", "Ctenochaetus striatus", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon leucogaster", "Chaetodon semeion", "Chromis analis", 
    "Chromis analis", "Amblyglyphidodon curacao", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon curacao", "Amblyglyphidodon leucogaster", 
    "Chaetodon semeion", "Chromis analis", "Chromis analis", 
    "Chromis xanthura", "Ctenochaetus striatus", "Ctenochaetus striatus", 
    "Amblyglyphidodon curacao", "Amblyglyphidodon leucogaster", 
    "Chaetodon semeion", "Chromis analis", "Chromis analis", 
    "Chromis xanthura", "Ctenochaetus striatus", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon leucogaster", "Chaetodon semeion", "Chromis analis", 
    "Chromis analis", "Chromis xanthura", "Ctenochaetus striatus", 
    "Chromis xanthura", "Ctenochaetus striatus", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon leucogaster", "Chaetodon semeion", "Chromis analis", 
    "Chromis analis", "Amblyglyphidodon curacao", "Amblyglyphidodon curacao", 
    "Chaetodon semeion", "Chaetodon semeion", "Chaetodon semeion", 
    "Chromis analis", "Chromis analis", "Chromis xanthura", "Ctenochaetus striatus", 
    "Amblyglyphidodon curacao", "Amblyglyphidodon curacao", "Amblyglyphidodon leucogaster", 
    "Ctenochaetus striatus", "Chromis analis", "Chromis analis", 
    "Chromis xanthura", "Ctenochaetus striatus", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon leucogaster", "Chaetodon semeion", "Chromis analis", 
    "Chromis analis", "Chromis xanthura", "Ctenochaetus striatus", 
    "Ctenochaetus striatus", "Ctenochaetus striatus", "Amblyglyphidodon curacao", 
    "Amblyglyphidodon leucogaster", "Chaetodon semeion", "Chromis analis", 
    "Chromis analis")), class = "data.frame", row.names = c(NA,-93L))

this is a table showing a small section of the dataset as it is in excel

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
jack23
  • 85
  • 5
  • 1
    You seem to have asked the same question earlier: https://stats.stackexchange.com/questions/525475/how-can-i-calculate-biodiversity-in-r At least part of the problem is that you don't define what "biodiversity" is. I made a comment to your previous question, noting a package that might help you. It's not my area of expertise so I don't have much more to add, but if you want to get good answers (or any answer at all) you need to give more detail. – Robert Long May 22 '21 at 22:47
  • the question got closed, and i did say shannons diversity index, it would probably be using vegan package, its just im new to r and ive just been getting nowhere with it – jack23 May 22 '21 at 22:57
  • Shannon's diversity index is not a test but a diversity index as its names indicates. It is one of the most widely community index used in ecology but other indices can also be complementary and usefull, e.g. species richness (number of species), species evenness or other diversity indices like Simpson's index. – Circus pygargus May 28 '21 at 13:05

1 Answers1

3

This is not my area of expertise, but I'll give it a go.

According to the documentation in the vegan package for the diversity function, the Shannon or Shannon--Weaver (or Shannon--Wiener) index is defined as $H = -\sum p_i log(b) p_i$, where $p_i$ is the proportional abundance of species $i$ and $b$ is the base of the logarithm. It is most popular to use natural logarithms, but some argue for base $b = 2$ (which makes sense, but no real difference).

The package provides an example dataset, and shows how to obtain the Shannon index on that dataset. So let's take a quick look:

## data(BCI)
## dim(BCI)
[1]  50 225

So the example dataset has 50 rows and 225 columns, let's take a look at the first 15 rows of the first 5 columns:

## head(BCI[, 1:5], 15)
   Abarema.macradenia Vachellia.melanoceras Acalypha.diversifolia Acalypha.macrostachya Adelia.triloba
1                   0                     0                     0                     0              0
2                   0                     0                     0                     0              0
3                   0                     0                     0                     0              0
4                   0                     0                     0                     0              3
5                   0                     0                     0                     0              1
6                   0                     0                     0                     0              0
7                   0                     0                     0                     0              0
8                   0                     0                     0                     0              0
9                   0                     0                     0                     0              5
10                  1                     0                     0                     0              0
11                  0                     0                     0                     0              0
12                  0                     0                     0                     0              1
13                  0                     0                     0                     0              1
14                  0                     0                     0                     0              0
15                  0                     0                     0                     0              2

So it looks like the columns are species, and the rows are counts for each species. The shannon index is obtained by:

## shan <- diversity(BCI, "shannon")
## str(shan)
Named num [1:50] 4.02 3.85 3.81 3.98 3.97 ...
- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...

So it returns a vector of numbers, one for each row.

Now let's apply this to your data. Let's take a look (I called the dataframe dt):

## head(dt)
  Year Site Zone                      Species
1 2013    1    c     Amblyglyphidodon curacao
2 2013    1    c     Amblyglyphidodon curacao
3 2013    1    f     Amblyglyphidodon curacao
4 2013    1    f Amblyglyphidodon leucogaster
5 2013    1    f            Chaetodon semeion
6 2013    1    c               Chromis analis

The first issue is that your data is in long format, but the diversity function wants it in wide format with a column for each species. Also you don't have counts, you just have the species name if it was present or not in each year/site/zone. So we need to reshape the data. We can do this with the tidyr package (this is a little bit tricky and I had to use stackoverflow to find out how to do it, so if you are new to R, don't worry about it!).

library(tidyr)
dt %>% count(across(everything())) %>%  
  pivot_wider(names_from = "Species",  values_from = "n", values_fill = 0) -> dt.wide

So here is the first columns of the reshaped data (dt.wide):

## head(dt.wide[, 1:6])
# A tibble: 6 x 6
   Year  Site Zone  `Amblyglyphidodon curacao` `Chromis analis` `Ctenochaetus striatus`
  <int> <int> <chr>                      <int>            <int>                   <int>
1  2013     1 c                              4                2                       2
2  2013     1 f                              1                0                       0
3  2013     1 s                              1                2                       1
4  2013     2 c                              2                1                       1
5  2013     2 f                              1                0                       0
6  2013     2 s                              0                1                       1

Now we can call the diversity function. Note that it requires just a matrix/dataset of numbers so we need to exclude the first 3 columns:

## dt.wide2 <- dt.wide[, 4:(length(dt.wide))]  ## just columns 4 and onwards
## shan <- diversity(dt.wide2 , "shannon")
## shan
 [1] 1.0397208 0.8675632 1.3296613 1.0397208 1.0986123 1.0986123 1.3862944 1.5595812 1.3862944 1.3862944 0.6931472 1.0397208 0.6931472
[14] 1.0397208 0.5623351 1.3321790 0.6931472 1.3321790

So there you have the diversity indices for each year/site/zone

Hope it helps !

Robert Long
  • 53,316
  • 10
  • 84
  • 148
  • Hi thanks so much for your comment, this looks really helpful however im am receiving an error message from this line. ```dt %>% count(across(everything())) %>% pivot_wider(names_from = "Species", values_from = "n", values_fill = 0) -> dt.wide``` the error message reads ``` Error in pivot_wider(., names_from = "Species", values_from = "n", values_fill = 0) : could not find function "pivot_wider" (I have installed and loaded dplyr and vegan) – jack23 May 23 '21 at 11:07
  • 1
    Oh, it's in the `tidyr` package. So you need to run `library(tidyr)` and if it's not already installed then install it with `install.packages("tidyr")`, or actually just install the whole tidyverse `install.packages("tidyverse")` and then `library(tidyverse)`, since the pipe function `%>%` is also needed and that's also part of tidyverse. – Robert Long May 23 '21 at 11:09
  • Yep managed to get it working, thanks a lot. Just to clarify if i wanted to look at the differences between zones as a whole rather, ie not look at yearly change, would I average the result for each zone at each site? i'm not sure if this is clear but i mean do i just calculate the mean of site 1 zone c using the 4 yearly results for this – jack23 May 23 '21 at 11:37
  • Instinctively I would suggest reshaping the data differently, so that each row is for just the site/zones, and the yearly counts are summed. It might be that just averaging it the way you suggest gives the same result but I would doubt it. So it's better to reshape the data a bit differently before running the diversity function. Take a look at the question I posted on SO: https://stackoverflow.com/questions/67658405/ there is actually a nicer way to do the reshape in the accepted answer. If you can't figure it out, try asking a question there; perhaps just copy mine and change what you want. – Robert Long May 23 '21 at 11:49
  • would reshaping it so that the yearly counts were summed, would that not increase the biodiversity as its not the average abundance of species found there? Like i mean as I'm using the different years as replicants, does summing them to then produce a biodiversity reading for a certain zone factor in the fact that these weren't all one sample (if that makes sense)? – jack23 May 23 '21 at 11:55
  • 1
    I'm not sure, to be honest, ecology isn't my area at all. If it makes more sense to you to average the diversities then I guess you can just do that :) – Robert Long May 23 '21 at 12:01
  • Hi thanks @Robert Long I was unsure too but I went ahead with trying the format I believe you eluded to. ``` eg – jack23 May 23 '21 at 13:24
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/124596/discussion-between-robert-long-and-jack23). – Robert Long May 23 '21 at 13:36