Briefly, as noted in comments: you can specify a random effect for logistic regression in the glmer() function in the same way as you did for linear regression in lmer(). Residuals in a logistic regression would not be expected to be normally distributed. (For reference, in linear regression it's better to visualize the residuals as a function of predicted values than to rely on Shapiro-Wilk.) An introduction to validation of logistic regression models is on this page.
With 4 binary predictors in the logistic regression model (ignoring the random effect for now) you have only 16 possible combinations of their values, so showing the numbers or fractions of outcomes in some type of tabular display of the predictor values could be a useful representation.
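As a rough illustration of that kind of tabular display, here is a minimal Python sketch. The records are made up for the example and the four predictors are left unnamed, since the actual data and variable names aren't available here. It counts cases and events in each of the 16 cells, including the empty ones.

```python
from itertools import product

# Made-up records for illustration only:
# (values of the four binary predictors, binary outcome).
records = [
    ((0, 0, 0, 0), 1),
    ((0, 0, 0, 0), 0),
    ((1, 0, 0, 0), 1),
    ((1, 0, 1, 0), 0),
    ((1, 0, 1, 0), 1),
    ((1, 0, 1, 0), 1),
    ((1, 1, 1, 1), 0),
]

def tabulate(records, n_predictors=4):
    """Return {combination: [n_cases, n_events]} for every possible cell."""
    table = {combo: [0, 0] for combo in product((0, 1), repeat=n_predictors)}
    for combo, outcome in records:
        table[combo][0] += 1        # number of cases in this cell
        table[combo][1] += outcome  # number of events in this cell
    return table

table = tabulate(records)
for combo, (n, events) in table.items():
    # Empty cells have no defined fraction, so show "NA" instead.
    fraction = f"{events / n:.2f}" if n else "NA"
    print(combo, n, events, fraction)
```

Empty or nearly empty cells in such a table are also an early hint about which interactions the data cannot support.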
Additionally, I see a few issues here based on what's already been presented in the question; more might become obvious if a link to the data becomes available.
First, you might be in danger of over-fitting your logistic regression model with only 100-200 cases in each of the "silk" event categories. The usual rule of thumb in logistic regression is to evaluate no more than 1 predictor per 15 or so cases in the minority class, unless you are using some method like ridge regression that penalizes regression coefficients. In this context, what counts as a predictor is a binary or continuous variable, each level of a categorical variable beyond the first, and each interaction specified among them in the model.
If the minority class has only 100 cases, you are limited to about 6 or 7 predictors. Your model, however, includes not only the 4 binary predictors but also all 11 possible interactions among them, and incorporation of provenance as a random effect represents at least 1 additional predictor. So unless you can collect more data you need to cut back on the interactions evaluated. As noted in a comment, replacing the "*" operators with "+" in the formulas would restrict analysis to individual effects. If there are specific interactions that you think need to be included in the model based on your understanding of the subject matter, you can denote specific interactions with the ":" operator.
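The arithmetic behind those numbers can be checked with a short Python sketch. The 100 minority-class cases and the 15-cases-per-predictor figure are the rough values discussed above, not exact properties of your data.

```python
from math import comb

n_binary = 4            # binary predictors in the model
minority_cases = 100    # approximate size of the smaller outcome class
cases_per_predictor = 15  # rule-of-thumb value, not a hard limit

# Crossing all 4 predictors with "*" gives every 2-, 3-, and 4-way interaction.
interactions = sum(comb(n_binary, k) for k in range(2, n_binary + 1))
fixed_effect_terms = n_binary + interactions

# Rough number of predictors the rule of thumb would allow.
budget = minority_cases / cases_per_predictor

print(f"interactions: {interactions}")
print(f"fixed-effect terms with full crossing: {fixed_effect_terms}")
print(f"approximate predictor budget: {budget:.1f}")
```

With only main effects ("+" instead of "*") the count drops to 4 terms plus the random effect, which fits within that budget.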
Second, the warning from the second model suggests that some predictors can be expressed as linear combinations of the others in your data set. One place this might be happening is in the combination of diet and starvation: how do you code diet in cases subjected to the starvation treatment? The full data might suggest other sources of this problem.
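To show how such a dependence can arise, here is a minimal Python sketch under a purely hypothetical coding, in which every starved case has diet coded as 0. I don't know how your data are actually coded, so treat this only as an example of the mechanism. The design matrix has columns for the intercept, diet, starvation, and their interaction, and its rank is computed by exact Gaussian elimination.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design_rows(cells):
    """Columns: intercept, diet, starvation, diet:starvation."""
    return [[1, diet, starved, diet * starved] for diet, starved in cells]

# Hypothetical coding: starved cases always have diet == 0,
# so the (diet=1, starvation=1) cell never occurs.
observed_cells = [(0, 0), (1, 0), (0, 1)]
full_cells = observed_cells + [(1, 1)]

rank_observed = matrix_rank(design_rows(observed_cells))
rank_full = matrix_rank(design_rows(full_cells))
print(rank_observed, rank_full)
```

Under this assumed coding the interaction column is all zeros, so the 4-column design has rank 3 and the interaction coefficient cannot be estimated; with all four cells observed the rank is 4.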
Third, I count 5 outcomes in your data; the associations among those outcomes might also be of interest. Your modeling approach doesn't seem to be taking this into account. As you seem to be in an academic or agricultural-research setting, there should be local statistical expertise available to help you with this. There's a limit to the type and amount of help that can be provided in a forum like this. Working directly with a statistician will be in the best long-term interest both of your project and of your starting to learn experimental design and statistical analysis.