
Suppose we are dealing with a standard regression problem. To make things simple we have a single predictor variable $x$ and a response variable $y$, and we also know:

  1. The true underlying relationship between them is $y = x^2 + \eta$, with $\eta$ being noise.
  2. $y \ | \ x \ $ follows a gamma distribution.

As far as I understand, we have, on the one hand, the function that models how $y$ varies with $x$, namely $E(y \mid x)$; and, on the other hand, the behavior of the uncertainty around each fixed $x_0$, which here corresponds to a gamma distribution.

Say, for example, that I wanted to model this using a Generalized Linear Model. Then the appropriate inverse link would be $g^{-1}(z) = z^2$, where $z$ is the linear predictor (i.e., a square-root link $g(\mu) = \sqrt{\mu}$), and the distribution, of course, gamma.
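For concreteness, this GLM can be fitted directly; below is a minimal sketch using statsmodels (the simulated data, the shape parameter, and the library choice are my own assumptions for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulate data matching the setup above (values are illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 3.0, size=500)
k = 5.0                               # assumed gamma shape parameter
mu = x ** 2                           # true mean: E(y | x) = x^2
y = rng.gamma(shape=k, scale=mu / k)  # gamma-distributed y with mean mu

# Gamma GLM with a square-root (power 0.5) link: sqrt(mu) = b0 + b1 * x,
# so the fitted mean is (b0 + b1 * x)^2; the truth corresponds to b0=0, b1=1.
X = sm.add_constant(x)
link = sm.families.links.Power(power=0.5)
result = sm.GLM(y, X, family=sm.families.Gamma(link=link)).fit()
print(result.params)  # should come out close to [0, 1]
```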

Now, in order to train a neural network or do gradient boosting, we try to minimize a loss function. How would I translate the above knowledge into a loss function? In other words, how does one choose the loss function given the knowledge above?

There's some more development of the link between likelihoods and cross-entropy here: https://stats.stackexchange.com/questions/378274/how-to-construct-a-cross-entropy-loss-for-general-regression-targets – Sycorax Aug 21 '20 at 13:41

1 Answer


Generalized linear models are fitted using maximum likelihood, so instead of minimizing a loss, we maximize the likelihood. Equivalently, instead of minimizing something like squared error, you minimize the negative log-likelihood, i.e.

$$ \operatorname{arg\,min}_\theta \, -\frac{1}{N} \sum_{i=1}^N \log f(y_i \mid x_i, \theta) $$

where $f$ is the gamma probability density function in your case, and $\theta$ are the parameters.
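Concretely, for a neural network or gradient boosting this means using the gamma negative log-likelihood as the training loss. Here is a minimal sketch in Python; the fixed shape parameter $k$ and the function name are my own assumptions (in practice the shape can also be estimated jointly):

```python
import numpy as np
from scipy.stats import gamma

def gamma_nll(y, mu, k=2.0):
    """Average negative log-likelihood of y under a gamma distribution
    parameterized by its mean mu and a fixed shape k (so scale = mu / k)."""
    return -np.mean(gamma.logpdf(y, a=k, scale=mu / k))

# Usage: mu is the model's prediction for E(y | x); training minimizes
# gamma_nll(y, mu) with respect to the model's parameters.
```

Many libraries already ship this objective: PyTorch's `torch.distributions.Gamma(...).log_prob(y)` gives the log-density directly, and LightGBM and XGBoost expose built-in gamma objectives (`gamma` and `reg:gamma`), which amount to this negative log-likelihood up to terms that do not depend on the prediction.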

Tim