I know this is a basic question, but I really want to make sure I fully understand the reasons why this is the case.
If possible, can someone help explain to me, as simply as possible, why it is bad to use a count or binary variable as the dependent variable in OLS regression?
Here is my current understanding:
An ordinary least squares (OLS) model, also called linear regression, measures the relationship between one or more independent variables (predictors) and a dependent variable. It is meant to give the expected value of the dependent variable when the independent variables are known. In this model, we assume the outcome is continuous and that the errors are normally distributed around the expected value of the dependent variable. When we fit this model to a binary variable that takes only the values 0 and 1, we get a straight line through the cloud of 0s and 1s, but the line is unbounded, so it extends below 0 and above 1. Thus, for some values of the independent variables, the model predicts expected values that are impossible: below 0 or above 1. Additionally, as the fitted line approaches either boundary, the outcomes must almost always take the same value (0 near the lower boundary, 1 near the upper one). Therefore, the variance of the outcome, which is p(1 - p) when the expected value is p, must shrink toward 0 there, so the errors do not have constant variance. In this case, OLS, which weights every observation equally, will weight the data incorrectly, underweighting observations whose expectation is near 0 or 1. Finally, the error term is not normally distributed: for a given expected value, the error can only take the two values that bring us back to 0 or 1. I believe similar problems carry over to count data.
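To make the points above concrete, here is a minimal simulation sketch, not a definitive analysis. Everything in it is an assumption chosen for illustration: the data are simulated, the binary outcome follows a logistic curve with slope 1.5, the count outcome is Poisson with mean exp(0.5 x), and the variable names are made up. It fits OLS to both outcomes with NumPy and checks for impossible fitted values, how many distinct outcome values occur, and how the outcome variance changes with the expected value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-4, 4, size=n)
X = np.column_stack([np.ones(n), x])  # intercept + one predictor


def ols_fitted(X, y):
    """Return OLS fitted values for y regressed on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


# --- Binary outcome (assumed logistic relationship, for illustration) ---
p_true = 1.0 / (1.0 + np.exp(-1.5 * x))
y_bin = rng.binomial(1, p_true).astype(float)
fit_bin = ols_fitted(X, y_bin)

# Fitted "probabilities" that fall outside the possible range [0, 1].
n_outside = int(np.sum((fit_bin < 0) | (fit_bin > 1)))

# The outcome variance is p(1 - p): large where p is near 0.5,
# close to 0 where p is near a boundary (here, x > 3.5 means p is near 1).
var_bin_middle = y_bin[np.abs(x) < 0.5].var()
var_bin_edge = y_bin[x > 3.5].var()

# --- Count outcome (assumed Poisson with log-linear mean, for illustration) ---
y_cnt = rng.poisson(np.exp(0.5 * x)).astype(float)
fit_cnt = ols_fitted(X, y_cnt)

# Fitted counts below 0 are impossible.
n_negative = int(np.sum(fit_cnt < 0))

# Unlike the binary case, a count can take many values at similar x,
# so the residual is not restricted to two values.
n_distinct_counts = np.unique(y_cnt[x > 3]).size

# For Poisson data the variance grows with the mean instead of staying constant.
var_cnt_low = y_cnt[x < -3].var()
var_cnt_high = y_cnt[x > 3].var()

print("binary: fitted values outside [0, 1]:", n_outside)
print("binary: variance near p = 0.5 vs near p = 1:", var_bin_middle, var_bin_edge)
print("count: negative fitted values:", n_negative)
print("count: distinct outcomes for x > 3:", n_distinct_counts)
print("count: variance at low vs high mean:", var_cnt_low, var_cnt_high)
```

In this sketch the binary residual at a given x can only be `0 - fitted` or `1 - fitted`, while the count residual can take many values; what the two cases share is a bounded outcome and a variance that depends on the expected value.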
Specifically, I don't really understand why this applies to count data. I am also unsure whether the error term can take only two values when the dependent variable is a count or binary variable. If someone could explain this to me as simply as possible, I would appreciate it.