I'll use an example to illustrate my question.
I have a model that explains the choice between low-fat and full-fat milk actually bought in a store. We model it with a binary logistic regression.
Most of the predictors come from a questionnaire that many low-fat and high-fat milk customers filled out. However, we also used their ZIP codes to determine whether they live in a rural area and whether cows are kept in their ZIP code (these two variables have a correlation of 0.5).
For rurality, we use ZIP code density as a proxy and group accordingly. For cows, we use the number of cows per 100 inhabitants, "Cowsper100".
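To make explicit that both ZIP-based predictors take a single value per ZIP code, here is a small sketch of how they might be built. It assumes that "ZIP code density" means inhabitants per km² within the ZIP code, and the records, the density cutoffs, and the helper names (`rurality_group`, `cows_per_100`) are all hypothetical.

```python
# Hypothetical ZIP-level records: (zip_code, inhabitants, area_km2, cows).
# All values are illustrative assumptions, not real data.
ZIP_DATA = [
    ("10001", 20000, 5.0, 0),
    ("20002", 6000, 10.0, 150),
    ("30003", 800, 40.0, 900),
]

# Assumed cutoffs (inhabitants per km^2) for grouping density into rurality levels.
DENSITY_CUTOFFS = [(150.0, "rural"), (1000.0, "intermediate")]


def rurality_group(inhabitants: int, area_km2: float) -> str:
    """Group a ZIP code by population density; anything above the last cutoff is urban."""
    density = inhabitants / area_km2
    for cutoff, label in DENSITY_CUTOFFS:
        if density < cutoff:
            return label
    return "urban"


def cows_per_100(inhabitants: int, cows: int) -> float:
    """Number of cows per 100 inhabitants (the Cowsper100 predictor)."""
    return 100.0 * cows / inhabitants


# One (rurality, Cowsper100) pair per ZIP code; every respondent in that ZIP code shares it.
zip_predictors = {
    zip_code: (rurality_group(pop, area), cows_per_100(pop, cows))
    for zip_code, pop, area, cows in ZIP_DATA
}
```

Because these two predictors vary only between ZIP codes, never within one, they are level-2 variables in multilevel terms, which is what makes the clustering question relevant at all.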
We argue that the more rural the area, the more high-fat milk is bought, because processed food is less popular in rural areas, and that more cows per inhabitant also lead to more interest in high-fat milk. (This is a mock example, so I am not sure how convincing you find it, but assume you are convinced.)
For the purpose of this question, assume we only look at the following model:
logit P(High Fat Milk Purchase = Yes) = b0 + b1*RuralArea + b2*Cowsper100 + b3*SurveyCovariate
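In a binary logistic regression, the right-hand side is a linear predictor on the log-odds scale, and the randomness comes from the Bernoulli outcome rather than from an additive error term. The sketch below shows how that linear predictor maps to a purchase probability. The coefficient values are hypothetical, and RuralArea is assumed to be coded 0/1.

```python
import math

# Hypothetical coefficients on the log-odds scale; not estimates from any data.
B0, B1, B2, B3 = -1.0, 0.8, 0.05, 0.3


def inverse_logit(eta: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_high_fat(rural_area: int, cows_per_100: float, survey_covariate: float) -> float:
    """P(High Fat Milk Purchase = Yes) under the binary logistic model."""
    eta = B0 + B1 * rural_area + B2 * cows_per_100 + B3 * survey_covariate
    return inverse_logit(eta)


# With all predictors at zero, eta = B0 = -1.0, so p = 1 / (1 + e).
p_baseline = p_high_fat(0, 0.0, 0.0)

# exp(B1) is the odds ratio for a rural vs. non-rural buyer, other predictors held fixed.
odds_ratio_rural = math.exp(B1)
```

The coefficients are therefore read as log-odds ratios (for example, exp(b1) compares rural with non-rural buyers), not as changes in the probability itself.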
One of the reviewers encourages us to use a multilevel model. However, we are unsure, because we have very few people per ZIP code and many ZIP codes. Following the top answer to the question "OLS with clustered standard errors vs. multilevel modeling when the main interest is at the individual level", we might not need one, right?
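One rough way to gauge how much clustering by ZIP code could matter for the standard errors is the Kish design effect, 1 + (m - 1) * ICC, where m is the average number of respondents per ZIP code. In a linear model, the same expression approximates the variance inflation for a coefficient on a predictor that is constant within clusters (such as RuralArea or Cowsper100). The sketch assumes roughly equal cluster sizes, uses hypothetical ICC values and sample size, and for a binary outcome is only a back-of-the-envelope approximation; it speaks to standard errors, not to whether ZIP-level variation is itself of interest.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect 1 + (m - 1) * ICC, assuming roughly equal cluster sizes."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly 'worth'."""
    return n / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical values: 2,000 respondents in ZIP codes averaging 1.5 respondents each.
    N, M = 2000, 1.5
    for icc in (0.05, 0.10, 0.20):
        print(f"ICC={icc:.2f}  deff={design_effect(M, icc):.3f}  "
              f"n_eff={effective_sample_size(N, M, icc):.0f}")
```

With m close to 1, the factor (m - 1) keeps the inflation small unless the ICC is large, which is consistent with the intuition in the linked answer; with many respondents per ZIP code, the same ICC would inflate the variance much more.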
Both high-fat and low-fat milk are available in all areas. (People who purchase both are counted in only one group, according to a rule that makes more sense in the non-milk context.)
What is the general rule for when you need a multilevel model? Could anyone help me by pointing to the relevant literature?