On page 4 of https://www.sagepub.com/sites/default/files/upm-binaries/21121_Chapter_15.pdf, the authors state the following strength of generalized linear models, which I don't quite understand.
Indeed, one of the strengths of the GLM paradigm - in contrast to transformations of the response variable in linear regression - is that the choice of linearizing transformation is partly separated from the distribution of the response, and the same transformation does not have to both normalize the distribution of Y and make its regression on the Xs linear. The specific links that may be used vary from one family to another and also—to a certain extent—from one software implementation of GLMs to another. For example, it would not be promising to use the identity, log, inverse, inverse-square, or square-root links with binomial data, nor would it be sensible to use the logit, probit, log-log, or complementary log-log link with nonbinomial data.
I understand the transformation that makes the regression linear is the link function. But what do they mean by the transformation that normalizes the distribution of Y?
What would it mean, concretely, if a single transformation had to do both jobs? What would the distribution of Y look like in that case?
How do the examples justify the stated property? They seem to describe cases where it is *not* advisable to pair an arbitrary link function with a given distribution, yet the claimed strength is that the link can be chosen somewhat independently of the distribution.
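To make my question concrete, here is a small sketch of what I understand the separation to mean (my own simulation, plain NumPy, fitting a Poisson GLM with a log link by iteratively reweighted least squares). The log link makes the regression of E[Y] on x linear, while the Poisson family describes the distribution of Y directly; notably, no transformation of y itself could make it normal, since y is a count and is often exactly 0, so log(y) is not even defined for every observation:

```python
import numpy as np

# Simulated count data: log E[Y] is linear in x, but Y itself is Poisson.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])      # design matrix with intercept
beta_true = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ beta_true))    # y contains zeros, so log(y) fails

# Poisson GLM with log link, fitted by IRLS: the link linearizes the
# regression of E[Y] on x; the family (not a transform of y) handles
# the distribution of Y.
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta                        # linear predictor
    mu = np.exp(eta)                      # inverse link: E[Y] = exp(eta)
    z = eta + (y - mu) / mu               # working response
    w = mu                                # working weights (Var(Y) = mu for Poisson)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

print(beta)  # should be close to beta_true
```

Is this the right way to read the passage, i.e. the link is chosen to linearize the mean while the family is chosen to match the distribution, and neither choice forces the other?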