Based on the response to this post, MLE is used instead of least squares for ARIMA models because the errors in the MA(q) part of the model are unobserved.
I don't follow this. Don't we have empirical values for the error terms $e_1, e_2, \ldots$, obtained as the residuals $Y_t - \hat{Y}_t, Y_{t-1} - \hat{Y}_{t-1}, \ldots$, which can be updated recursively as the model is tuned?
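
To make the recursion I have in mind concrete, here is a minimal sketch. It assumes a zero-mean MA(1) model, $Y_t = e_t + \theta e_{t-1}$, and sets the pre-sample error to zero. The function names (`ma1_residuals`, `css`), the simulated data, and the grid search are all illustrative choices of mine, not taken from the linked post.

```python
import numpy as np

def ma1_residuals(y, theta, e0=0.0):
    """Back out e_t = y_t - theta * e_{t-1} recursively for a zero-mean MA(1)."""
    e = np.empty(len(y))
    prev = e0  # assumed pre-sample error; its true value is not known
    for t, y_t in enumerate(y):
        e[t] = y_t - theta * prev  # residual = observation minus one-step forecast
        prev = e[t]
    return e

def css(y, theta):
    """Sum of squared recursive residuals for a trial value of theta."""
    return float(np.sum(ma1_residuals(y, theta) ** 2))

# Simulated MA(1) data (illustrative only).
rng = np.random.default_rng(0)
true_theta = 0.6
eps = rng.normal(size=501)
y = eps[1:] + true_theta * eps[:-1]

# "Tune" theta by least squares over a grid, recomputing the residuals each time.
grid = np.linspace(-0.95, 0.95, 191)
theta_hat = grid[np.argmin([css(y, th) for th in grid])]
print(theta_hat)
```

Note that in this sketch the residuals are recomputed for every trial value of `theta`, and they also depend on the assumed starting value `e0`.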