In short, I'm curious about the problems that arise from unequal group sizes on a binary variable when fitting a logit model aimed at prediction rather than causal inference.
By unequal group sizes, I mean that 80% of respondents are coded 0 on the binary variable, while 20% are coded 1. In absolute numbers, let's say 1,600 respondents are 0 and 400 are 1.
I understand that this imbalance may simply reflect the population, but does it cause any problems for the logit model? I have read that it could reduce sensitivity.
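To make the setup concrete, here is a minimal sketch in R with simulated data. It assumes the binary variable is the outcome being predicted. The predictor `x`, its effect size, the helper `sens_spec`, and the two cutoffs are all made up for illustration and do not come from my survey.

```r
# Simulated illustration only; none of these numbers come from real survey data.
set.seed(1)

# Assumption: the binary variable is the outcome, with exactly 1600 zeros and 400 ones.
y <- rep(c(0, 1), times = c(1600, 400))

# Hypothetical predictor: its mean is one standard deviation higher for the 1s.
x <- rnorm(length(y), mean = ifelse(y == 1, 1, 0), sd = 1)
dat <- data.frame(y = y, x = x)

# Ordinary logit model, with fitted probabilities on the training data.
fit <- glm(y ~ x, family = binomial, data = dat)
p_hat <- predict(fit, type = "response")

# Sensitivity = share of true 1s classified as 1;
# specificity = share of true 0s classified as 0.
sens_spec <- function(p, y, cutoff) {
  pred <- as.integer(p > cutoff)
  c(cutoff = cutoff,
    sensitivity = mean(pred[y == 1] == 1),
    specificity = mean(pred[y == 0] == 0))
}

res_05 <- sens_spec(p_hat, dat$y, 0.5)  # conventional cutoff
res_02 <- sens_spec(p_hat, dat$y, 0.2)  # cutoff near the 20% share of 1s
print(rbind(res_05, res_02))
```

For a fixed fitted model, lowering the cutoff can only keep or raise sensitivity (at the cost of specificity), so part of what I'm asking is whether the reported loss of sensitivity is a property of the model itself or mostly of the 0.5 cutoff.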
What theorems, functions, assumptions, etc. should I look into or use?
For reference, I'm working in R in case that helps in providing an example as an answer.
Thank you for your help.