Over the winter I collected behavioural data on a species of goose, the Brent Goose. A CSV file of the data can be downloaded from Dropbox or imported straight into R with this code:
library(repmis)
goose_behaviour <- repmis::source_DropboxData("goose_behaviour.csv", "hy6labsyh56050g", sep = ",", header = TRUE)
Each row of the data represents a flock of geese. Each flock belonged to one of two subspecies: Dark-bellied Brent Goose or Light-bellied Brent Goose. The Dark-bellied flocks were present on the east side of an intertidal mudflat and the Light-bellied flocks on the west side. The values in each row are the proportions of the flock exhibiting each behaviour, so each row sums to 1.
I want to know whether Dark-bellied and Light-bellied Brent Geese differ in the proportion of the flock exhibiting each of the 7 behavioural types.
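To make the layout concrete, here is a minimal sketch that builds a small made-up data frame with the same shape (seven behaviour columns followed by a subspecies column) and checks that every row sums to 1. The `toy` data frame and its values are invented for illustration and are not the real goose data.

```r
# Toy stand-in for goose_behaviour: the values are random, NOT the real data
set.seed(1)
behaviours <- c("pecking", "alert", "aggression", "asleep",
                "preening", "flying", "other")
n_flocks <- 6

# Random positive numbers, rescaled so each row (flock) sums to 1
raw <- matrix(runif(n_flocks * length(behaviours)),
              nrow = n_flocks, dimnames = list(NULL, behaviours))
toy <- as.data.frame(raw / rowSums(raw))
toy$subspecies <- rep(c("Dark-bellied Brent Goose",
                        "Light-bellied Brent Goose"), each = 3)

# Check the compositional constraint, allowing for floating point error
row_totals <- rowSums(toy[, behaviours])
rows_ok <- all(abs(row_totals - 1) < 1e-8)
rows_ok
```

The same `rowSums()` check could be run on the imported data, assuming its behaviour columns carry these names.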
Plots of the seven behavioural types show they are each very non-normally distributed. Nonetheless, I calculated means and standard errors for each behaviour and for each subspecies as follows:
library(dplyr)
# Mean proportion of each behaviour, per subspecies
goose_behaviour %>%
  group_by(subspecies) %>%
  summarise(pecking = mean(pecking), alert = mean(alert), aggression = mean(aggression), asleep = mean(asleep), preening = mean(preening), flying = mean(flying), other = mean(other)) %>%
  as.data.frame()
                 subspecies   pecking     alert aggression     asleep   preening     flying       other
1  Dark-bellied Brent Goose 0.4048882 0.3438450 0.02123310 0.05914777 0.10377128 0.06418008 0.002934522
2 Light-bellied Brent Goose 0.3620766 0.3467897 0.00534835 0.17768323 0.04889585 0.05657772 0.002628567
# Standard error of the mean, ignoring missing values
statStandardError <- function(x) sqrt(var(x, na.rm = TRUE) / length(na.omit(x)))
goose_behaviour %>%
  group_by(subspecies) %>%
  summarise(pecking = statStandardError(pecking), alert = statStandardError(alert), aggression = statStandardError(aggression), asleep = statStandardError(asleep), preening = statStandardError(preening), flying = statStandardError(flying), other = statStandardError(other)) %>%
  as.data.frame()
                 subspecies    pecking      alert  aggression    asleep   preening     flying        other
1  Dark-bellied Brent Goose 0.03422627 0.02902893 0.003839248 0.0163771 0.01617119 0.02160593 0.0012157325
2 Light-bellied Brent Goose 0.03014162 0.02489201 0.001104804 0.0242016 0.01068396 0.01498604 0.0007520355
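Because the data are non-normally distributed, I've also used a Wilcoxon rank sum test on each behaviour to test whether the two subspecies differ:
# plyr is called via plyr:: rather than attached, so it does not mask dplyr's summarise()
# Columns 1 to 7 hold the behaviours; one test is run per column
plyr::llply(goose_behaviour[, 1:7], function(x) wilcox.test(x ~ subspecies, data = goose_behaviour))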
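The call above returns a list of seven test objects, which is awkward to read. A minimal sketch of collecting just the p-values into one named vector follows. It uses made-up data (`toy`, random values, not the real goose data) and base R's `sapply()` instead of plyr, purely so that it runs on its own.

```r
# Toy stand-in for goose_behaviour: the values are random, NOT the real data
set.seed(7)
behaviours <- c("pecking", "alert", "aggression", "asleep",
                "preening", "flying", "other")
raw <- matrix(runif(20 * length(behaviours)),
              nrow = 20, dimnames = list(NULL, behaviours))
toy <- as.data.frame(raw / rowSums(raw))  # each row sums to 1
toy$subspecies <- rep(c("Dark-bellied Brent Goose",
                        "Light-bellied Brent Goose"), each = 10)

# One Wilcoxon rank sum test per behaviour column, keeping only the p-value
p_values <- sapply(toy[, behaviours],
                   function(x) wilcox.test(x ~ toy$subspecies)$p.value)
p_values
```

With the real data, where many proportions may be tied at zero, `wilcox.test()` would be expected to warn that it cannot compute exact p-values and to fall back on a normal approximation.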
Here are my questions:
- If data are non-normally distributed, is it appropriate to calculate means and standard errors? Would calculating medians be more appropriate?
- Is a Wilcoxon rank sum test appropriate to test if the two subspecies differ in behavioural types?
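For reference, the median alternative mentioned in the first question could be computed along these lines. This is only a sketch on made-up data (`toy`, random values, not the real goose data), using base R's `aggregate()`; the interquartile range is included as one possible spread measure to pair with the median, which is my assumption and not something I have settled on.

```r
# Toy stand-in for goose_behaviour: the values are random, NOT the real data
set.seed(42)
behaviours <- c("pecking", "alert", "aggression", "asleep",
                "preening", "flying", "other")
raw <- matrix(runif(20 * length(behaviours)),
              nrow = 20, dimnames = list(NULL, behaviours))
toy <- as.data.frame(raw / rowSums(raw))  # each row sums to 1
toy$subspecies <- rep(c("Dark-bellied Brent Goose",
                        "Light-bellied Brent Goose"), each = 10)

# Median and interquartile range of each behaviour, per subspecies
group_medians <- aggregate(toy[, behaviours],
                           by = list(subspecies = toy$subspecies),
                           FUN = median)
group_iqr <- aggregate(toy[, behaviours],
                       by = list(subspecies = toy$subspecies),
                       FUN = IQR)
group_medians
group_iqr
```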