First of all, in linear regression we don't assume that the *residuals* are IID. The assumption is that the *errors* are IID (or at least spherical, i.e. homoscedastic and uncorrelated). When the model is fit by OLS and the errors are normally distributed, the residuals are distributed as:
$$r = (I-H)\epsilon\sim N(0,\sigma^2(I-H))$$
where $H$ is the hat matrix (https://en.wikipedia.org/wiki/Projection_matrix). As can be seen, the covariance matrix is not diagonal (the residuals are dependent), nor are the variances equal (they are not identically distributed).
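This is easy to verify numerically. A minimal sketch (the design matrix and its values are made up for illustration) that builds $H$ for a small regression and inspects $\sigma^2(I-H)$:

```python
import numpy as np

# Toy design: intercept plus one predictor; the point at x = 10
# is deliberately high-leverage (illustrative values, not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
I = np.eye(len(x))

sigma2 = 1.0  # assume unit error variance for illustration
cov_r = sigma2 * (I - H)

# Off-diagonal entries are nonzero: residuals are correlated.
print(np.round(cov_r, 3))
# Diagonal entries sigma^2 * (1 - h_ii) differ across observations;
# the high-leverage point gets the smallest residual variance.
print(np.round(np.diag(cov_r), 3))
```

Note that the diagonal of $H$ sums to the number of fitted parameters (here 2), so on average leverages are small, but they are never all equal unless the design is perfectly balanced.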
In general, residuals are very unlikely to be independent, since each residual is a function of the same training data, which by itself creates dependence. Nor will they be identically distributed: the variance of residual $i$ is $\sigma^2(1-h_{ii})$, which depends on the observation's position (its leverage) in the input space.
Having said that, there is no requirement that errors be IID in any kind of fitting algorithm; the assumption simply has advantages. Firstly, knowing the distribution of the errors makes it easier to derive conclusions about your estimates. Secondly, dependent errors are an indication that you haven't consumed all the available information in the data set, which means you could have done better with a different model. Nevertheless, these are all presumptions and must be checked by examining the residuals, since the errors themselves are not accessible.
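One common way to do that check is to first put the residuals on a comparable scale by dividing out the leverage-dependent variance, producing (internally) studentized residuals. A sketch on simulated data (all values and the seed are arbitrary):

```python
import numpy as np

# Simulate data from a true linear model (assumed for illustration).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=len(x))

# OLS fit and raw residuals
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_hat

# Leverages h_ii from the hat matrix
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# Internally studentized residuals: r_i / sqrt(sigma_hat^2 * (1 - h_ii)).
# These have roughly unit variance, so they can be compared across
# observations and plotted against fitted values or predictors.
n, p = X.shape
sigma2_hat = r @ r / (n - p)
r_stud = r / np.sqrt(sigma2_hat * (1 - h))
print(np.round(r_stud, 2))
```

If a plot of these studentized residuals against the fitted values shows trend or structure, that is the sign of leftover information mentioned above.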