Why is the link function a function of the mean and not the linear predictor?

Question

Generalized linear models are formulated so that the link function $g$, applied to the mean $\mu$ of a random variable $Y$, equals the linear predictor $\eta=x'\beta$, i.e. \begin{align} g(\mu)=\eta. \end{align} My question is: why don't we call $h=g^{-1}$ the link function and write $\mu=h(\eta)$ instead? That way, if $h$ is not one-to-one, there is still a well-defined distribution for the response variables; the $\beta$ are simply not identifiable. Isn't it problematic that for certain $g$ there is no well-defined model? It is also more intuitive to think of the mean as a function of the parameters, rather than the reverse. Is there a conceptual advantage to defining $g$ in this way? Is it mathematically necessary for some reason I'm not seeing?

Because someone has written it like this and we use it as so for historical reasons... moreover different people find different things to be "intuitive". — Tim, Apr 06 '17 at 18:02
@Tim So it is just historical. Hmm, it just seems a little strange to me — , Apr 06 '17 at 18:14
http://stats.stackexchange.com/questions/41306/why-are-probability-distributions-denoted-with-a-tilde/41332 — kjetil b halvorsen, Apr 07 '17 at 11:00

Tim · Accepted Answer · 2017-04-07T10:49:28.423


First of all, the notation is not used consistently. For example, Nelder and Wedderburn (1972) write about a linear predictor $Y$ and a linking function "$\theta = f(Y)$ connecting the parameter $\theta$ of the distribution of $z$ with the $Y$'s of the linear model" (p. 372). On the other hand, McCullagh and Nelder (1989) use the link function $\eta_i = g(\mu_i)$ (p. 27). So both notations were used in the classical literature on GLMs.

Basically, it doesn't matter whether you use $g$ or $f = g^{-1}$, since what you need is a one-to-one mapping that works in both directions (through its inverse). All the common link functions have this property. Nobody said that $g$ can be any function...
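To make the "works in both directions" point concrete, here is a minimal sketch (my own illustration, not taken from either reference) that pairs three common links with their inverses and checks numerically that $h(g(\mu)) = \mu$. The names `LINKS` and `round_trip_error` are hypothetical helpers introduced only for this example.

```python
import numpy as np

# Each entry pairs a link g (mean -> linear predictor) with its inverse
# h = g^{-1} (linear predictor -> mean). Illustrative choices only.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (np.log, np.exp),
    "logit": (
        lambda mu: np.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + np.exp(-eta)),
    ),
}


def round_trip_error(name, mu):
    """Largest absolute difference between h(g(mu)) and mu for the named link."""
    g, h = LINKS[name]
    eta = g(mu)
    return float(np.max(np.abs(h(eta) - mu)))


# Means in (0, 1) are valid inputs for all three links.
mu = np.linspace(0.05, 0.95, 19)
errors = {name: round_trip_error(name, mu) for name in LINKS}

for name, err in errors.items():
    print(f"{name:8s} max |h(g(mu)) - mu| = {err:.2e}")
```

Because each pair is a bijection on the relevant range, writing the model as $g(\mu)=\eta$ or as $\mu=h(\eta)$ describes the same relationship; the round-trip error is zero up to floating-point rounding.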

As for "intuitiveness", notice that if you wanted to use linear regression for count data, you would usually transform the outcome using a log transformation and then fit a linear regression to it, so in some cases transforming the outcome is also "intuitive".

Moreover, I don't find it any more intuitive to consider the transformed mean as a function of the linear predictor than to consider the mean as a function of the transformed linear predictor; they are the same thing.
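As a small worked illustration (my own example, using the log link and a hypothetical non-invertible $h(\eta)=\eta^2$), the two ways of writing the model coincide exactly when $g$ is one-to-one: \begin{align} \log \mu = x'\beta \quad\Longleftrightarrow\quad \mu = \exp(x'\beta). \end{align} If instead one took $\mu = h(\eta) = \eta^2$, then \begin{align} \mu = (x'\beta)^2 = \big(x'(-\beta)\big)^2, \end{align} so $\beta$ and $-\beta$ give the same mean for every $x$. The response distribution is still defined, but $\beta$ is not identifiable, and there is no function $g$ for which $g(\mu)=\eta$ holds.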

Nelder, J. and Wedderburn, R. (1972). Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General). Blackwell Publishing. 135 (3): 370–384.

McCullagh, P. and Nelder, J. (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and Hall/CRC.

edited Apr 07 '17 at 10:49

answered Apr 07 '17 at 10:41


Thanks for your detailed response. It does seem sort of trivial and maybe "intuitive" was the wrong word. But to me, it seems like if you define it as $g(\mu)=\eta$ then you should mention that $g$ has to be 1-1. For example, Wikipedia's page on GLMs makes no mention of this, although it does imply it by using $g^{-1}$. It's a minor point, but I still feel it's slightly confusing to an unfamiliar person looking for a concise description of what is allowed by "generalized linear model." Moreover, it seems like by defining it as $\mu=h(\eta)$ you could allow for $h$ that are not 1-1... – Apr 07 '17 at 17:15
...which might have some use, in the way that linear models are often not identifiable without constraints (but I don't know if that would actually be useful because I don't know anything about GLMs). – Apr 07 '17 at 17:17
@51413 well, it needs to be a function that makes sense in this context. E.g. neither of the sources says that it cannot be a constant function, but obviously using a constant function as a link function would be absolutely pointless... – Tim Apr 07 '17 at 18:05
Fair enough. But even though it's pointless, maybe there's something elegant about being *able* to use a constant link function. E.g., if $\mu=h(\eta)$ is constant, then you still have a response distribution for the data, but if $\eta=g(\mu)$ is constant, then there is no well-defined response distribution. – Apr 07 '17 at 18:31
