Say I have a stimulus (sound from a speaker) which can be either present (Y) or absent (N). I have observed arrivals of individuals and noted, for each arrival, whether the stimulus was present (Y) or not (N). I now want to see which factors affect whether individuals choose to arrive. My variables are:
Pb_on = Is there sound? Y/N (Factor, 2 levels)
Pb_type = What sound is being played (Factor). NOTE: Initially this factor had 4 levels, one being Silence. Silence is excluded because the playback for Silence was always off, which causes problems with model estimation.
NR_Tot = How many individuals are already present (Numerical)
Site = The experiment was conducted at several sites (Factor, 12-13 levels).
I now wish to determine whether more individuals arrived during sound playback than without it.
I do not want to use a glmer of the form glmer(cbind(Y, N) ~ factors), as this would require summing my observations per site, which loses much of the information on individual observations (such as NR_Tot).
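To illustrate the trade-off described above, here is a minimal sketch on simulated, hypothetical per-arrival data (1 = arrival during playback, 0 = not). Summing arrivals into Y/N counts per Site x Pb_type cell and fitting glm(cbind(Y, N) ~ ...) should give the same Pb_type estimates as fitting the 0/1 outcome row by row, but only the row-level data can carry a per-arrival covariate such as NR_Tot. Base R glm() stands in for glmer here so the example needs no extra packages; all values are made up.

```r
# Hypothetical simulated data: one row per arrival.
set.seed(2)
n <- 300
arrivals <- data.frame(
  Site    = factor(sample(paste0("S", 1:12), n, replace = TRUE)),
  Pb_type = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  NR_Tot  = rpois(n, 3),          # individuals already present at each arrival
  Pb_on   = rbinom(n, 1, 0.4)     # 1 = playback on (Y), 0 = off (N)
)

# Aggregated form: counts of Y and N arrivals per Site x Pb_type cell.
arrivals$Y <- arrivals$Pb_on
arrivals$N <- 1 - arrivals$Pb_on
agg <- aggregate(cbind(Y, N) ~ Site + Pb_type, data = arrivals, FUN = sum)

# Same covariate, two data layouts: the estimates should agree.
m_agg <- glm(cbind(Y, N) ~ Pb_type, data = agg, family = binomial)
m_row <- glm(Pb_on ~ Pb_type, data = arrivals, family = binomial)
print(cbind(aggregated = coef(m_agg), per_arrival = coef(m_row)))

# NR_Tot differs between arrivals within a cell, so it has no single value in agg;
# the per-arrival model can include it directly.
m_row_nr <- glm(Pb_on ~ Pb_type + NR_Tot, data = arrivals, family = binomial)
print(summary(m_row_nr)$coefficients)
```

The agreement comes from the binomial likelihood of the summed counts being the product of the individual Bernoulli likelihoods (up to a constant), so nothing is lost for cell-level covariates; it is only per-arrival covariates like NR_Tot that force the row-level layout.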
It was suggested that I use something like the following model:
glmer(Pb_on ~ Pb_type + NR_Tot + (1 | Site), data = df, family = binomial(link = "logit"))
The idea is that, because the binomial family is specified, the model will 'understand' that I want to determine which factors affect the ratio of Pb_on being Y versus N.
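A minimal sketch of what this binomial model estimates, using base R's glm() on simulated data (the data frame and its values are hypothetical, generated only for illustration; the random effect is left out so the example runs without lme4). With a factor response, binomial() treats the first level (N) as failure and every other level (Y) as success, so the model describes the probability that a given arrival happened while playback was on. The intercept-only fit simply returns the observed share of such arrivals.

```r
# Hypothetical simulated data: one row per arrival (names follow the question's
# variables, but the values are made up purely for illustration).
set.seed(1)
n <- 400
df <- data.frame(
  Pb_on   = factor(sample(c("N", "Y"), n, replace = TRUE, prob = c(0.6, 0.4)),
                   levels = c("N", "Y")),
  Pb_type = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  NR_Tot  = rpois(n, 3)
)

# For a factor response, binomial() models P(second level) = P(Pb_on == "Y").
m0 <- glm(Pb_on ~ 1, data = df, family = binomial(link = "logit"))
m1 <- glm(Pb_on ~ Pb_type + NR_Tot, data = df, family = binomial(link = "logit"))

# The intercept-only model reproduces the observed share of arrivals during playback.
p_hat <- plogis(coef(m0)[["(Intercept)"]])
p_obs <- mean(df$Pb_on == "Y")
print(c(fitted = p_hat, observed = p_obs))

# Coefficients of m1 are changes in the log-odds that an arrival happened during playback.
print(summary(m1)$coefficients)
```

Note that testing the intercept against 0 on the logit scale compares that share with 0.5, which would only correspond to "no effect of playback" if playback was on for half of the observation time; that is an assumption about the design, not something stated above.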
I can get this model running, but I have two questions:
1. The model ends up being singular. In the summary, the variance (and standard deviation) of the random effect is 0, which I assume is the issue. Comparing this model with a glm (identical except for the random effect) using the anova() function seems to suggest that a random effect is not necessary (though I know that, from a philosophical point of view, you would prefer to include it, since different sites were used to conduct the experiment). Is this approach acceptable? (It was based on information found here.) Or should I instead treat Site as a fixed effect (though then I am somewhat lost as to how to interpret my results)?
2. Is this actually how the model works, or is it now analysing something else? The model without the random effect works fine, though none of the variables turn out to be significant, and ultimately the 'best' model is the null model, suggesting none of my factors are of great importance.
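A minimal sketch of the two options raised in the first question, again on simulated, hypothetical data: (a) Site as a fixed effect, compared with the model without Site by a likelihood-ratio test via anova(..., test = "Chisq"); (b) the random-intercept glmer, checked with lme4's isSingular() and VarCorr(). The lme4 part is wrapped in requireNamespace() only so the sketch still runs where lme4 is not installed.

```r
# Hypothetical simulated data with no built-in site effect.
set.seed(3)
n <- 400
df <- data.frame(
  Site    = factor(sample(paste0("S", 1:12), n, replace = TRUE)),
  Pb_type = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  NR_Tot  = rpois(n, 3),
  Pb_on   = rbinom(n, 1, 0.4)     # 1 = arrival during playback (Y), 0 = N
)

# (a) Site as a fixed effect: does adding Site improve on the model without it?
m_noSite <- glm(Pb_on ~ Pb_type + NR_Tot, data = df, family = binomial)
m_Site   <- glm(Pb_on ~ Pb_type + NR_Tot + Site, data = df, family = binomial)
lrt <- anova(m_noSite, m_Site, test = "Chisq")
print(lrt)

# (b) Random intercept for Site (needs lme4). A Site variance estimated at 0
# is what produces the "boundary (singular) fit" message.
if (requireNamespace("lme4", quietly = TRUE)) {
  m_mixed <- lme4::glmer(Pb_on ~ Pb_type + NR_Tot + (1 | Site),
                         data = df, family = binomial)
  print(lme4::isSingular(m_mixed))
  print(lme4::VarCorr(m_mixed))
}
```

With Site as a fixed effect, the Site coefficients are log-odds differences from the reference site, and the Pb_type and NR_Tot coefficients are adjusted for those site differences. When the random-intercept variance is estimated at 0, the glmer fixed-effect estimates should essentially coincide with those of the corresponding glm. Because a variance cannot be negative, the plain chi-square p-value from comparing models with and without a random effect tends to be conservative (the null value lies on the boundary of the parameter space).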