Suppose we are dealing with a standard regression problem. To make things simple, we have a single predictor variable $x$ and a response variable $y$, and we also know:
- The underlying real relationship between them is $y = x^2 + \eta$, with $\eta$ being noise.
- $y \ | \ x \ $ follows a gamma distribution.
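To make the setup concrete, here is how I picture the data-generating process (a minimal numpy sketch; the shape parameter $k$ and the range of $x$ are arbitrary choices of mine, not part of the problem):

```python
import numpy as np

# Sketch of the assumed data-generating process: the conditional mean of y
# given x is x^2, and the noise around it is gamma-distributed. For a gamma
# with shape k and scale theta the mean is k * theta, so setting
# theta = x^2 / k gives E[y | x] = x^2. The choices k = 5 and x in [0.5, 3]
# are purely illustrative.
rng = np.random.default_rng(0)
k = 5.0
x = rng.uniform(0.5, 3.0, size=1000)
y = rng.gamma(shape=k, scale=x**2 / k)
```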
As far as I understand, we have, on the one hand, the function that models how $y$ varies with $x$, i.e. the conditional mean $E(y \mid x)$; and, on the other hand, how the uncertainty around each fixed $x_0$ behaves, which would correspond here to a gamma distribution.
Say, for example, that I wanted to model this using a Generalized Linear Model. Then the appropriate link would be the square-root link $g(\mu) = \sqrt{\mu}$, so that the inverse link $g^{-1}(u) = u^2$ applied to a linear predictor in $x$ yields the quadratic mean; and the distribution, of course, gamma.
Now, in order to train a neural network or do gradient boosting, we try to minimize a loss function. How would I translate the above knowledge into a loss function? In other words, how does one choose the loss function given the knowledge above?
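To illustrate what I mean by "translating into a loss", here is the kind of thing I imagine, sketched in numpy under the assumption (my guess, hence the question) that the loss would be the gamma negative log-likelihood with some fixed shape $k$, evaluated at the model's predicted mean. The one-parameter model `mu = w * x**2` below is purely hypothetical, just to check that the loss is minimized near the true relationship:

```python
import numpy as np

def gamma_nll(y, mu, k=1.0):
    # Negative log-likelihood of a gamma with mean mu and fixed shape k,
    # dropping terms constant in mu (they do not change the minimizer).
    return np.mean(k * (np.log(mu) + y / mu))

# Hypothetical check: data with E[y | x] = x^2 and gamma noise, and a
# one-parameter model mu = w * x^2, so the loss should be smallest near w = 1.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 3.0, size=1000)
y = rng.gamma(shape=5.0, scale=x**2 / 5.0)   # gamma noise with mean x^2

ws = np.linspace(0.5, 1.5, 201)
w_hat = ws[np.argmin([gamma_nll(y, w * x**2, k=5.0) for w in ws])]
```

Is this gamma negative log-likelihood (equivalently, the gamma deviance) the right way to encode the distributional assumption as a training loss?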