In regression, it is abundantly clear that $Y$ can be non-normal while the residuals $\epsilon = Y - \beta_0 - \beta_1 X$ are normal. But can $Y$ be binary when the $\epsilon$ are normally distributed? This question is motivated by students' regression projects where their $Y$ was quite discrete, and obviously non-normal for that reason, yet their residual histogram and q-q plots "looked normal." Here is a simulation to illustrate:
par(mfrow=c(1,2)); set.seed(12345)
X = rnorm(1000,3,1); Y = round(X + rnorm(1000,0,1))
table(Y) # Y is highly discrete and obviously non-normal
model = lm(Y ~ X) ## But the diagnostic plots and tests suggest normality is reasonable
qqnorm(model$residuals, main= "Residual q-q Plot"); qqline(model$residuals)
shapiro.test(model$residuals); plot(X,Y, main = "Raw Scatterplot")
Despite the large sample size ($n=1000$), the Shapiro test "accepts" normality ($p = 0.1228$), and the normal q-q plot looks fine. However, the raw scatterplot shows obviousness discreteness, and hence, obvious nonnormality that is not discernable from the analysis of the residual distribution.
I am pretty sure that the actual distribution of $\epsilon$ is not precisely normal in my example, but could it be normal?
How bad can this problem be? In the most extreme case, suppose $Y$ is binary (0 or 1), and a linear predictor is $\beta_0 + \beta_1 X$. Is it possible that $\epsilon = Y - \beta_0 - \beta_1 X$ has precisely a normal distribution in this case?