Suppose we have a regression function $R(X)$, linear or nonlinear, and the model below. If we assume the error term $\epsilon$ is Gaussian (i.i.d. with mean zero and constant variance), then minimizing squared-error loss is equivalent to maximum likelihood estimation.
$$ Y = R(X) +\epsilon $$
So whether $R$ is a linear regression or a neural network, the least squares parameter estimates are also the maximum likelihood estimates.
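For reference, here is a sketch of why the equivalence holds. The notation is my own and is introduced only for illustration: $\theta$ stands for the parameters of $R$, $(x_i, y_i)$ for the $n$ observations, and $\sigma^2$ for the error variance, which I assume does not depend on $\theta$. With $\epsilon_i \sim N(0, \sigma^2)$ independently, the log-likelihood is
$$ \ell(\theta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - R_\theta(x_i)\bigr)^2 $$
Only the last term depends on $\theta$, and it enters with a negative sign, so for any fixed $\sigma^2 > 0$
$$ \arg\max_\theta \, \ell(\theta, \sigma^2) = \arg\min_\theta \sum_{i=1}^{n}\bigl(y_i - R_\theta(x_i)\bigr)^2 $$
Nothing in this argument uses the functional form of $R_\theta$, only that it is indexed by a parameter $\theta$.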
What if the regression function is a random forest? Is there a way to make maximum likelihood estimation meaningful in this case?
One idea that comes to mind is to treat the predictions themselves as the quantity being estimated: that is, to seek maximum likelihood estimates of the vector $\hat Y$ rather than of a parameter vector, as in linear regression.
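To make this idea a bit more concrete, here is a minimal sketch in plain Python. It rests on an assumption that goes beyond what I stated above: the partition of the feature space is treated as fixed, so the only free quantities are the constant predictions within each cell. The one-split "tree", the threshold of 0.5, the simulated data, and all function names are hypothetical and chosen only for illustration. Under that assumption, the cell means minimize squared error, so they also maximize the Gaussian likelihood over the cell values.

```python
import math
import random


def gaussian_nll(y, y_hat, sigma=1.0):
    """Negative log-likelihood of y under independent Y_i ~ N(y_hat_i, sigma^2)."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)


def leaf_means(x, y, threshold):
    """Mean response on each side of a fixed split (a hypothetical one-split tree)."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sum(left) / len(left), sum(right) / len(right)


def predict(x, threshold, left_value, right_value):
    """Piecewise-constant prediction vector for the fixed split."""
    return [left_value if xi <= threshold else right_value for xi in x]


# Simulated data (assumption): a step function plus Gaussian noise.
random.seed(0)
x = [random.uniform(0, 1) for _ in range(200)]
y = [(1.0 if xi <= 0.5 else 3.0) + random.gauss(0, 0.5) for xi in x]

threshold = 0.5  # the partition is held fixed, not estimated
m_left, m_right = leaf_means(x, y, threshold)
nll_at_means = gaussian_nll(y, predict(x, threshold, m_left, m_right))

# Moving a leaf value away from its mean should only increase the NLL.
shifted_nlls = [
    gaussian_nll(y, predict(x, threshold, m_left + delta, m_right))
    for delta in (-0.2, -0.05, 0.05, 0.2)
]
print(nll_at_means, min(shifted_nlls))
```

This only covers a single tree conditional on its splits. It says nothing about how the splits are chosen, and it does not by itself show that the averaged predictions of a forest are maximum likelihood estimates of anything.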