
When fitting a GAM using, say, mgcv in R, one must specify the distribution and the link function. For example, the defaults are Gaussian and identity, respectively. I understand the link function (e.g., g in the first equation here) because the model definition explicitly includes it: I can point to it and say, "That's the link function." The distribution I find harder to understand.

The equation defining a GAM here suggests to me that the distribution describes the distribution of the errors. I've seen that equation elsewhere, too (e.g.). Nevertheless, a CV answer posted here takes exception to that, clearly stating:

You don't specify the "error" distribution, you specify the conditional distribution of the response.
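Below is a sketch of the two framings in standard GAM notation. Here $\mathcal{D}$ denotes the chosen family (Gaussian, Poisson, ...) and $\phi$ a scale parameter, fixed at 1 for the Poisson. These symbols are illustrative and are not taken from the linked equation. "Conditional" means conditional on the covariate values $x_i$ of observation $i$:

```latex
\begin{aligned}
&\text{General form:} && y_i \mid x_i \sim \mathcal{D}(\mu_i, \phi), \quad \mu_i = \mathbb{E}(y_i \mid x_i), \quad g(\mu_i) = \beta_0 + \textstyle\sum_j f_j(x_{ij}) \\
&\text{Gaussian, identity link:} && y_i \mid x_i \sim \mathcal{N}(\mu_i, \sigma^2) \iff y_i = \mu_i + \varepsilon_i, \ \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \\
&\text{Poisson, log link:} && y_i \mid x_i \sim \operatorname{Poisson}(\mu_i), \quad \varepsilon_i = y_i - \mu_i \in \{-\mu_i,\ 1-\mu_i,\ 2-\mu_i, \dots\}, \ \operatorname{Var}(\varepsilon_i) = \mu_i
\end{aligned}
```

Only in the Gaussian case does the "error" have the same distribution for every observation. In the Poisson case its spread and its support move with $\mu_i$, which is why the family is more naturally described as the conditional distribution of the response.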

Needless to say, I'm confused. So, my questions:

  1. What does the distribution actually refer to: the errors, or the conditional distribution of the response? (If the latter, what, specifically, is it conditional on?)
  2. How is the distribution used by a GAM? Is it an assumption that is used when fitting the GAM to data?
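For question 2, one common way GLM/GAM software uses the family is penalized iteratively reweighted least squares (PIRLS). The sketch below is a hypothetical minimal fitter, not mgcv's actual code. It assumes a small polynomial basis with a ridge-type penalty `S` standing in for a real spline basis and wiggliness penalty. It shows that the family enters only through the link, its derivative, and the variance function, which together set the working response and weights:

```python
import numpy as np

# Hypothetical family table: each family supplies a link g, its inverse,
# the derivative g'(mu), and the variance function V(mu). These callables are
# the only places the distributional assumption enters the fit below.
FAMILIES = {
    "gaussian": {  # identity link, V(mu) = 1
        "link": lambda mu: mu,
        "linkinv": lambda eta: eta,
        "dlink": lambda mu: np.ones_like(mu),
        "variance": lambda mu: np.ones_like(mu),
        "init": lambda y: np.asarray(y, dtype=float),
    },
    "poisson": {  # log link, V(mu) = mu
        "link": np.log,
        "linkinv": np.exp,
        "dlink": lambda mu: 1.0 / mu,
        "variance": lambda mu: mu,
        "init": lambda y: np.asarray(y, dtype=float) + 0.1,  # avoid log(0)
    },
}


def pirls(X, y, family, S, lam, n_iter=100, tol=1e-10):
    """Penalized IRLS: maximize loglik(beta) - (lam / 2) * beta' S beta
    (for the Gaussian, the scale parameter is absorbed into lam)."""
    fam = FAMILIES[family]
    y = np.asarray(y, dtype=float)
    mu = fam["init"](y)
    eta = fam["link"](mu)
    beta_old = None
    for _ in range(n_iter):
        gprime = fam["dlink"](mu)
        # Working response and weights: this is where g and V are used.
        z = eta + (y - mu) * gprime
        w = 1.0 / (fam["variance"](mu) * gprime**2)
        XtW = X.T * w  # scales column i of X.T by w[i]
        beta = np.linalg.solve(XtW @ X + lam * S, XtW @ z)
        eta = X @ beta
        mu = fam["linkinv"](eta)
        if beta_old is not None and np.max(np.abs(beta - beta_old)) < tol:
            break
        beta_old = beta
    return beta
```

Switching from `"gaussian"` to `"poisson"` leaves the linear algebra unchanged and only alters `z` and `w`. With the Gaussian family and identity link, `w` is constant and `z` equals `y`, so the loop reduces to a single penalized least-squares solve.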

Any insights would be greatly appreciated.

Lyngbakr
  • Are you trying to suggest that specifying an error distribution (where "error" is the difference between the observed response and the true value) and specifying a response distribution are somehow different things? Although they might differ in trivial ways in their mathematical representations, they are identical conceptually and practically. And what exactly do you mean by "used by"? – whuber Sep 25 '17 at 14:39
  • By "used by", I mean: if I were to write a package that fits GAMs (for example), what would I do with the information that the user specified Gaussian or Poisson? Where would that feature in my calculations? By error, I meant exactly what you say (i.e., the difference between the observed and true values), whereas I thought of the distribution of the response as representing the values that a variable can take. For example, if I'm measuring the height of people, I wouldn't consider the distribution of heights to be the error distribution. But I am not clear on the correct terminology. #ImNotAStatistician – Lyngbakr Sep 25 '17 at 14:44
  • When, for a given combination of regressor values, you subtract the height (as indicated by the model) from all the heights, you convert the response distribution into an error distribution. The error distribution usually has a more complicated representation. The only exceptions are where the response distribution is a member of a location family, such as the Normal distributions. – whuber Sep 25 '17 at 15:28
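The last comment's point can be checked numerically. For a Normal response, y - μ ~ N(0, σ²) whatever μ is, because the Normal is a location family. The plain-Python sketch below uses hypothetical function names. It computes the exact distribution of e = y - μ for a Poisson response, truncated far into a negligible tail, so the result can be compared at different values of μ:

```python
import math


def poisson_error_pmf(mu, kmax=100):
    """Support points and probabilities of e = y - mu when y ~ Poisson(mu).

    Truncated at y = kmax; for moderate mu the omitted tail mass is negligible.
    """
    return [
        (k - mu, math.exp(k * math.log(mu) - mu - math.lgamma(k + 1)))
        for k in range(kmax + 1)
    ]


def mean_and_variance(pmf):
    mean = sum(e * p for e, p in pmf)
    var = sum((e - mean) ** 2 * p for e, p in pmf)
    return mean, var


for mu in (2.0, 10.0):
    pmf = poisson_error_pmf(mu)
    m, v = mean_and_variance(pmf)
    # The mean is ~0 for every mu, but the variance (~mu), the smallest
    # possible error (-mu), and P(e = -mu) = exp(-mu) all change with mu.
    print(f"mu={mu}: mean={m:.6f}, var={v:.6f}, "
          f"min error={pmf[0][0]}, P(min error)={pmf[0][1]:.6f}")
```

Because the shape and support of the Poisson "error" depend on μ, there is no single error distribution to specify. Stating the conditional distribution of the response is the simpler description.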
