
In a Poisson GLM, the response variable $Y$ follows a Poisson distribution

$$P(Y=y)=\lambda^y\exp(-\lambda)/y!$$

and:

$$\lambda=\exp(\boldsymbol{\theta}^T\mathbf{x})$$

My question is why do we use exponential as the nonlinearity function here (or why choose $\log()$ as the link function)? Why not use any other positive, monotonically increasing function?

Cloudy
It **is** possible to use other link functions; for a discussion see https://stats.stackexchange.com/questions/203355/pros-and-cons-of-log-link-versus-identity-link-for-poisson-regression. But the log link is the one most often used. One reason is that a count is an **extensive variable**; see https://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353 for an explanation. – kjetil b halvorsen Jul 16 '20 at 02:37

2 Answers


As mentioned in a comment, you don't need to use a log link. In R, for example:

glm(dat ~ x, family = poisson(link = "log"))       # canonical link

glm(dat ~ x, family = poisson(link = "sqrt"))

glm(dat ~ x, family = poisson(link = "identity"))
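
A self-contained version of these calls might look like the sketch below. The simulated data, the seed, the starting values, and the names `n`, `x`, `dat`, and `fits` are assumptions made for illustration; only `glm()` and `poisson()` with these three links come from the answer.

```r
# Simulate counts whose true mean is log-linear in x (illustrative values).
set.seed(1)
n   <- 2000
x   <- runif(n, 0, 3)
dat <- rpois(n, lambda = exp(1 + 0.3 * x))

# Fit the same Poisson model under three different link functions.
fits <- list(
  log      = glm(dat ~ x, family = poisson(link = "log")),
  sqrt     = glm(dat ~ x, family = poisson(link = "sqrt")),
  # Explicit starting values keep the initial identity-link means positive.
  identity = glm(dat ~ x, family = poisson(link = "identity"), start = c(1, 1))
)

# Coefficients live on the scale of each link, so they are not comparable
# across rows; the log-link row should sit near the true values (1, 0.3).
print(t(sapply(fits, coef)))
```

With a non-canonical link such as the identity, `glm()` can fail if a fitted mean becomes non-positive during the iterations, which is why starting values are supplied here.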

The log link (whose inverse is the exponential function) is the canonical link function for the Poisson GLM. It has nice properties and arises naturally from writing the Poisson distribution in exponential-family form. But if you prefer a different link, you can use one.

AlaskaRon

> My question is why do we use exponential as the nonlinearity function here (or why choose $\log()$ as the link function)?

As AlaskaRon states, it is because the log link is the canonical link, which has many desirable properties. In general, we can write the density, $f$, of an exponential family as

$$\log f(y;\theta,\tau) = \log h(y, \tau) + \frac{b(\theta)T(y) - A(\theta)}{d(\tau)}$$

using the same notation as Wikipedia, where $\tau$ is the dispersion parameter and $\theta$ is related to the mean. We are working with the canonical form when $b$ is the identity function:

$$\log f(y;\theta,\tau) = \log h(y, \tau) + \frac{\theta T(y) - A(\theta)}{d(\tau)}.$$

This is the case for the Poisson model with the log link. You can see this by setting $\theta = \eta = \log \lambda$, $T(y) = y$, and $\tau = 1$ with $d(\tau) = 1$:

$$\begin{align*} \log P(Y = y) &= y\log \lambda - \lambda -\log y! \\ &= \underbrace{-\log y!}_{\log h(y,\tau)} + y\eta-\underbrace{\exp\eta}_{A(\eta)} \end{align*}$$
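
The identity above can be checked numerically. The following is a small sketch; the rate `lambda <- 3.5`, the grid `y`, and the variable names are arbitrary choices for illustration.

```r
# Compare the built-in Poisson log-pmf with the exponential-family form
# -log(y!) + y * eta - exp(eta), where eta = log(lambda).
lambda <- 3.5            # illustrative rate
eta    <- log(lambda)    # natural parameter
y      <- 0:10

direct    <- dpois(y, lambda, log = TRUE)
canonical <- -lgamma(y + 1) + y * eta - exp(eta)   # lgamma(y + 1) = log(y!)

# TRUE when the two agree up to floating-point tolerance.
print(isTRUE(all.equal(direct, canonical)))
```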

One advantage of the canonical form is that the mean is

$$\text{E}(y; \theta,\tau) = A'(\theta)$$

and the variance is

$$\text{Var}(y; \theta,\tau) = A''(\theta)d(\tau)\geq0$$
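
As a quick worked check for the Poisson case, take $A(\eta) = \exp\eta$ with $\eta = \log\lambda$ and $d(\tau) = 1$ from the decomposition above. The two formulas then recover the familiar fact that the Poisson mean and variance are both $\lambda$:

$$A'(\eta) = \exp\eta = \lambda, \qquad A''(\eta)\,d(\tau) = \exp\eta \cdot 1 = \lambda > 0.$$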

Given that $d(\tau) > 0$, this implies that $\partial^2/\partial\theta^2\, \log f(y;\theta,\tau) \leq 0$, so the log-density (and hence the log-likelihood) is concave in $\theta$, which is nice for maximum likelihood estimation. There are further nice properties in terms of the moment generating function, the fact that $T(y)$ is the identity for many distributions, and more, which are useful in a variety of applications.
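
To illustrate why concavity helps, here is a minimal Newton-Raphson sketch for the log-link Poisson regression, compared against `glm()`. With the canonical link and a design matrix $X$, the gradient of the log-likelihood is $X^T(y - \lambda)$ and the Hessian is $-X^T \operatorname{diag}(\lambda) X$, which is negative semidefinite. The helper `newton_poisson`, the simulated data, and the tolerance are hypothetical choices for this illustration, not something from the answer.

```r
# Newton-Raphson for Poisson regression with the canonical log link.
# newton_poisson is a hypothetical helper written for this illustration.
newton_poisson <- function(X, y, tol = 1e-10, max_iter = 50) {
  # Start at the intercept-only fit; assumes the first column of X is the
  # intercept and mean(y) > 0.
  beta <- c(log(mean(y)), rep(0, ncol(X) - 1))
  for (i in seq_len(max_iter)) {
    lambda <- as.vector(exp(X %*% beta))
    grad   <- crossprod(X, y - lambda)    # score: X'(y - lambda)
    info   <- crossprod(X, lambda * X)    # minus the Hessian: X' diag(lambda) X
    step   <- solve(info, grad)
    beta   <- beta + as.vector(step)
    if (max(abs(step)) < tol) break
  }
  beta
}

set.seed(2)
x <- runif(500, 0, 2)
y <- rpois(500, lambda = exp(0.5 + 0.7 * x))
X <- cbind(1, x)

beta_newton <- newton_poisson(X, y)
beta_glm    <- unname(coef(glm(y ~ x, family = poisson(link = "log"))))

# Largest absolute difference between the two sets of estimates.
print(max(abs(beta_newton - beta_glm)))
```

Because the log-likelihood is concave, any stationary point the iteration reaches is a global maximum, so the two sets of estimates should agree closely.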

See also this question, this question, and this answer.

> Why not use any other positive, monotonically increasing function?

You can use another link function. You will not have all the nice properties that come with the canonical form, but the link function you choose may be a better approximation of the data-generating process.
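
As a sketch of that last point, suppose the true mean is linear rather than log-linear in the covariate. The simulated data, seed, starting values, and the names `fit_log` and `fit_id` below are assumptions for illustration only.

```r
# Simulate counts whose mean is linear (not log-linear) in x.
set.seed(3)
x <- runif(1000, 0, 10)
y <- rpois(1000, lambda = 2 + 3 * x)

fit_log <- glm(y ~ x, family = poisson(link = "log"))
# Starting values keep the initial identity-link means positive.
fit_id  <- glm(y ~ x, family = poisson(link = "identity"), start = c(1, 1))

# Lower AIC indicates the better trade-off between fit and complexity;
# both models have two parameters, so this compares the likelihoods.
print(AIC(fit_log, fit_id))
```

In a setup like this one, the identity link would be expected to fit better, at the cost of having to guard against non-positive fitted means.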