Previous reading
Before posting this question, I went through two posts: "How to determine which distribution fits my data best?" and "Assumptions of linear models and what to do if the residuals are not normally distributed". I have put everything in subsections to keep the question as clear as possible.
Question:
Can I defend using OLS when my dependent variable is ordinal, provided I satisfy all CLM assumptions?
Situation
Sample size: n=23,000
I have a technically ordinal dependent variable (range: 0=no obstacle, 3=severe obstacle) which is distributed as follows:
Because I need the residuals from this regression (explanation why I need the residuals), I would prefer NOT to treat it as ordinal. This post is essentially about whether I can.
My understanding
Now, if I understand correctly, the main reason I might not be able to use OLS is that the errors/residuals are not normally distributed:
One of the classical linear model (CLM) assumptions is normality. More specifically: "the population error $u$ is independent of the explanatory variables $x_1, x_2, \dots, x_k$ and is normally distributed with zero mean and variance $\sigma^2$: $u \sim \text{Normal}(0, \sigma^2)$."
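Written out, as I understand it, this assumption says the following. Here $y$ is my dependent variable and $\beta_0, \dots, \beta_k$ are the population coefficients; this notation is my own addition and is not part of the quoted text:

$$
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u, \qquad u \mid x_1, \dots, x_k \sim \text{Normal}(0, \sigma^2)
$$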
So my thought was that, if my residuals are normally distributed, I could defend treating my dependent variable as continuous (please comment).
Alex however additionally mentions the following:
Now, I have to say that my understanding of this requirement is a little different (but please correct me if I am wrong). The actual assumption for multiple linear regression is that the population model is linear in parameters. See also this explanation:
All in all, it appears to me that I can still use OLS to estimate my model.
Nevertheless, I am curious: would there be any benefit to choosing, for example, a quasi-Poisson model?
What I checked
The first thing I did is to check my residuals:
library(fitdistrplus)
library(logspline)
# x holds the residuals from the OLS regression
descdist(x, discrete = TRUE)
summary statistics
------
min: -2.629229 max: 3.123659
median: -0.164249
mean: -0.000000000000000037898
estimated sd: 0.9253059
estimated skewness: 0.5777857
estimated kurtosis: 2.919639
fit <- fitdist(x, "norm")
plot(fit)
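In case it helps anyone reproduce this kind of check, here is a minimal sketch on simulated data. It is not my actual data or model: the variables `y`, `x1` and `latent`, the cut points, and the sample size are all made up for illustration. It only shows how a 0-3 score can be regressed with `lm()` and how the residual vector `x` used above can be obtained and inspected.

```r
# Simulated stand-in data; NOT the data from this question.
set.seed(1)
n      <- 1000
x1     <- rnorm(n)                 # hypothetical explanatory variable
latent <- 0.8 * x1 + rnorm(n)      # hypothetical latent severity score

# Cut the latent score into four ordered categories coded 0..3
y <- as.integer(cut(latent, breaks = c(-Inf, -0.5, 0.5, 1.5, Inf))) - 1L

# Treat the 0-3 score as numeric and fit by OLS
ols <- lm(y ~ x1)

# Residual vector, playing the role of x in the checks above
x <- residuals(ols)

# Visual check of normality: points should follow the reference line
qqnorm(x)
qqline(x)
```

With an intercept in the model the residuals average to zero by construction, so the mean reported by `descdist()` says nothing about normality; the skewness, the kurtosis and the Q-Q plot are the informative parts.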
Returning to the question
If I am not violating any (CLM) assumptions, can I defend using OLS to estimate my model?
If I can defend this, would there still be anything to gain from using another model (for example, a quasi-Poisson model), and if so, why?