
If we have a regression function $R(X)$, whether linear or nonlinear, and we make a Gaussian assumption about the error term $\epsilon$, then minimizing square loss is equivalent to maximum likelihood estimation.

$$ Y = R(X) +\epsilon $$
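
To spell out the equivalence, assume i.i.d. errors $\epsilon_i \sim \mathcal N(0, \sigma^2)$. The log-likelihood of the observations is then

$$ \log L = \sum_{i=1}^n \log\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{\big(y_i - R(x_i)\big)^2}{2\sigma^2}\right)\right] = -\frac{n}{2}\log\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2}\sum_{i=1}^n \big(y_i - R(x_i)\big)^2, $$

so, for any fixed $\sigma^2$, maximizing the likelihood over whatever parameters determine $R$ is the same as minimizing the sum of squared residuals.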

If $R$ is a linear regression, the least-squares parameter estimates are maximum likelihood estimates; the same holds if $R$ is a neural network.
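
As a sanity check, here is a minimal numerical sketch for the linear case (made-up data, and scipy's general-purpose optimizer rather than a regression routine, so treat it as an illustration only): minimizing square loss and maximizing the Gaussian log-likelihood land on the same coefficients.

```python
# Illustration only: minimizing square loss vs. maximizing the Gaussian
# log-likelihood for a linear R(X), on simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one made-up feature
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)       # Gaussian errors, as assumed above

def sse(beta):
    resid = y - X @ beta
    return resid @ resid                                # square loss

def neg_loglik(theta):
    beta, log_sigma = theta[:2], theta[2]
    sigma2 = np.exp(2.0 * log_sigma)
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * (resid @ resid) / sigma2

beta_ls = minimize(sse, x0=np.zeros(2)).x               # least squares
beta_mle = minimize(neg_loglik, x0=np.zeros(3)).x[:2]   # Gaussian MLE
print(beta_ls, beta_mle)                                # the two estimates agree (up to optimizer tolerance)
```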

What if the regression function is a random forest? Is there a way to make maximum likelihood estimation meaningful in this case?

What comes to mind is getting maximum likelihood estimates of the predictions, i.e., maximum likelihood estimates of the $\hat Y$ vector rather than of a parameter vector as in a linear regression.
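
One way to write that down, reusing the Gaussian log-likelihood above, is to put the fitted vector $\hat Y$ in place of $R(x_i)$ and treat $\hat Y$ itself (presumably restricted to fits a random forest can actually produce) as the object being estimated:

$$ \ell\big(\hat Y, \sigma^2\big) = -\frac{n}{2}\log\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2}\sum_{i=1}^n \big(y_i - \hat y_i\big)^2. $$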

Dave
  • Would this be a fair paraphrasing: "Does a random forest model correspond to maximizing a specific likelihood function, and if so, what likelihood function is it maximizing?"? – Sycorax Dec 21 '21 at 18:35
  • @Sycorax Mostly, but I want some dependence on the loss function (chiefly square loss, but absolute loss is interesting for regression and log loss is interesting for classification). – Dave Dec 21 '21 at 18:41
  • I'm not sure I understand the distinction you're drawing. Minus the log likelihood (i.e., cross-entropy) is a **loss** function whose minimization is equivalent to maximizing the (log) likelihood. – Sycorax Dec 21 '21 at 18:55
  • The likelihood for square loss is different from the likelihood for absolute loss and log loss, but without parameters being estimated as in a linear regression, the likelihood of what? – Dave Dec 21 '21 at 19:11
  • 1
    We seem to be talking past each other. If I write down a likelihood function, it's simple to write down a loss function that corresponds to that likelihood. See: https://stats.stackexchange.com/questions/378274/how-to-construct-a-cross-entropy-loss-for-general-regression-targets So it's hard to understand why the distinction that you make in your first comment is consequential, or to understand why "dependence on the loss function" is distinct from understanding what the likelihood is. – Sycorax Dec 21 '21 at 19:16
