When fitting a GAM using, say, mgcv in R, one must specify the distribution and link function. For example, the defaults are Gaussian and identity, respectively. Now, I understand the link function (e.g., g in the first equation here) because the model definition explicitly includes it. I can point to it and say, "That's the link function." The distribution I find harder to understand.
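For concreteness, this is roughly the kind of call I have in mind. It is only a minimal sketch: the data are simulated, and the names `dat`, `fit_default`, and `fit_pois` are made up for illustration. The `family` argument is where both the distribution and the link are chosen.

```r
library(mgcv)

# Simulated data purely for illustration (not from a real analysis)
set.seed(1)
n <- 200
x <- runif(n)
dat <- data.frame(
  x = x,
  y = sin(2 * pi * x) + rnorm(n, sd = 0.3),
  counts = rpois(n, lambda = exp(1 + sin(2 * pi * x)))
)

# The default family written out explicitly: Gaussian distribution, identity link
fit_default <- gam(y ~ s(x), family = gaussian(link = "identity"), data = dat)

# A non-default choice: Poisson distribution with a log link
fit_pois <- gam(counts ~ s(x), family = poisson(link = "log"), data = dat)

# The fitted object records both pieces of the specification
fit_default$family$family  # name of the distribution
fit_default$family$link    # name of the link function
```

So the link is something I can see in the model equation, whereas the distribution only shows up as part of the `family` object.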
The equation defining a GAM here suggests to me that the distribution describes the distribution of the errors. I've seen that equation elsewhere, too (e.g.). Nevertheless, a CV answer posted here takes exception to that, clearly stating:
> You don't specify the "error" distribution, you specify the conditional distribution of the response.
Needless to say, I'm confused. So, my questions:
- What does the distribution actually refer to? Errors or conditional distribution of the response? (If the latter, what does the conditional part refer to, specifically?)
- How is the distribution used by a GAM? Is it an assumption that is used when fitting the GAM to data?
Any insights would be greatly appreciated.