Call:
glm(formula = darters ~ river + pH + temp, family = poisson, data = darterData)

Deviance Residuals:
    Min      1Q   Median     3Q    Max
-3.7422 -1.0257   0.0027 0.7169 3.5347

Coefficients:
              Estimate Std.Error z value Pr(>|z|)
(Intercept)   3.144257  0.218646  14.381  < 2e-16 ***
riverWatauga -0.049016  0.051548  -0.951  0.34166
pH            0.086460  0.029821   2.899  0.00374 **
temp         -0.059667  0.009149  -6.522  6.95e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)
Null deviance: 233.68 on 99 degrees of freedom
Residual deviance: 187.74 on 96 degrees of freedom
AIC: 648.21

I want to know how to interpret each parameter estimate in the table above.

shadowtalker
tomjerry001
  • The interpretation is identical: http://stats.stackexchange.com/a/126225/7071 – dimitriy Dec 13 '14 at 00:31
  • This question appears to be off-topic because it is about explaining an R output without any form of intelligent question behind. This is the category "I dump my computer output there and you run the stat analysis for me"... – Xi'an Dec 13 '14 at 07:19
  • Your dispersion parameter seems to indicate that there are some problems with your model. Perhaps you should consider using a quasipoisson distribution instead. I bet your parameter estimates will change drastically and so will the interpretation. If you run "plot(model)" you will get some plots of your residuals, have a look at these plots for unwanted patterns before you start interpreting your actual model. For quickly plotting the fit of your model you can also use "visreg(modelfit)" from the visreg package – Robbie Dec 13 '14 at 08:56
  • @Xi'an, although the question is sparse & required editing, I don't think it is off-topic. Consider these questions that are not considered off-topic: [Interpretation of R's lm() output](http://stats.stackexchange.com/q/5135/), & [Interpretation of R's output for binomial regression](http://stats.stackexchange.com/q/86351/7290). It does appear to be a [duplicate](http://stats.stackexchange.com/q/11096/), however. – gung - Reinstate Monica Dec 13 '14 at 16:03
  • This is a duplicate of [How to interpret coefficients in a Poisson regression?](http://stats.stackexchange.com/q/11096/) Please read the linked thread. If you still have a question after reading that, come back here & edit your question to state what you have learned & what you still need to know, then we can provide the information you need without simply duplicating material elsewhere that already didn't help you. – gung - Reinstate Monica Dec 13 '14 at 16:04

2 Answers


I don't think the title of your question accurately captures what you're asking for.

The question of how to interpret the parameters in a GLM is very broad because the GLM is a very broad class of models. Recall that a GLM models a response variable $y$ that is assumed to follow a known distribution from the exponential family, and that we have chosen an invertible function $g$ such that $$ \mathrm{E}\left[y\,|\,x\right] = g^{-1}{\left(x_0 + x_1\beta_1 + \dots + x_J\beta_J\right)} $$ for $J$ predictor variables $x$. In this model, the interpretation of any particular parameter $\beta_j$ is the rate of change of $g{\left(\mathrm{E}\left[y\,|\,x\right]\right)}$ with respect to $x_j$. Define $\mu \equiv \mathrm{E}{\left[y\,|\,x\right]} = g^{-1}{\left(x \cdot \beta\right)}$ and $\eta \equiv x \cdot \beta$ to keep the notation clean. Then, for any $j \in \{1,\dots,J\}$, $$ \beta_j = \frac{\partial\,\eta}{\partial\,x_j} = \frac{\partial\,g(\mu)}{\partial\,x_j} \text{.} $$ Now define $\mathfrak{e}_j$ to be a vector of $J-1$ zeroes and a single $1$ in the $j$th position, so that for example if $J=5$ then $\mathfrak{e}_3 = \left(0,0,1,0,0\right)$. Then $$ \beta_j = g{\left(\mathrm{E}{\left[y\,|\,x + \mathfrak{e}_j \right]}\right)} - g{\left(\mathrm{E}{\left[y\,|\,x\right]}\right)} $$

This just means that $\beta_j$ is the effect on $\eta$ of a unit increase in $x_j$.

You can also state the relationship in this way: $$ \frac{\operatorname{\partial}\mathrm{E}{\left[y\,|\,x\right]}}{\operatorname{\partial}x_j} = \frac{\operatorname{\partial}\mu}{\operatorname{\partial}x_j} = \frac{\operatorname{d}\mu}{\operatorname{d}\eta}\frac{\operatorname{\partial}\eta}{\operatorname{\partial}x_j} = \frac{\operatorname{\partial}\mu}{\operatorname{\partial}\eta} \beta_j = \frac{\operatorname{d}g^{-1}}{\operatorname{d}\eta} \beta_j $$ and $$ \mathrm{E}{\left[y\,|\,x + \mathfrak{e}_j \right]} - \mathrm{E}{\left[y\,|\,x\right]} \equiv \operatorname{\Delta_j} \hat y = g^{-1}{\left( \left(x + \mathfrak{e}_j\right)\beta \right)} - g^{-1}{\left( x\,\beta \right)} $$

Without knowing anything about $g$, that's as far as we can get: $\beta_j$ is the effect of a unit increase in $x_j$ on $\eta$, the transformed conditional mean of $y$, while the corresponding effect on the conditional mean of $y$ itself is the difference $g^{-1}{\left( \left(x + \mathfrak{e}_j\right)\beta \right)} - g^{-1}{\left( x\,\beta \right)}$.


But you seem to be asking specifically about Poisson regression using R's default link function, which in this case is the natural logarithm. If that's the case, you're asking about a specific kind of GLM in which $y \sim \mathrm{Poisson}{\left(\lambda\right)}$ and $g = \ln$. Then we can get some traction with regard to a specific interpretation.

From what I said above, we know that $\frac{\operatorname{\partial}\mu}{\operatorname{\partial}x_j} = \frac{\operatorname{d}g^{-1}}{\operatorname{d}\eta} \beta_j$. And since we know $g(\mu) = \ln(\mu)$, we also know that $g^{-1}(\eta) = e^\eta$. We also happen to know that $\frac{\operatorname{d}e^\eta}{\operatorname{d}\eta} = e^\eta$, so we can say that $$ \frac{\operatorname{\partial}\mu}{\operatorname{\partial}x_j} = \frac{\operatorname{\partial}\mathrm{E}{\left[y\,|\,x\right]}}{\operatorname{\partial}x_j} = e^{x_0 + x_1\beta_1 + \dots + x_J\beta_J}\beta_j $$

which finally means something tangible:

Given a very small change in $x_j$, the fitted $\hat y$ changes at the rate $\hat y\,\beta_j$; that is, a change of $\delta$ in $x_j$ changes $\hat y$ by approximately $\hat y\,\beta_j\,\delta$.

Note: because $e^b - 1 \approx b$ for small $b$, this approximation can actually work for values of $\beta_j\,\delta$ as large as about 0.2, depending on how much precision you need.
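To see how fast the approximation degrades, here is a quick numerical check in R (purely illustrative, not tied to the darter data):

```r
# Compare the exact relative change, exp(b) - 1, with the
# small-change approximation, b, for increasingly large b.
b <- c(0.01, 0.05, 0.1, 0.2, 0.5)
exact <- exp(b) - 1
round(cbind(b, exact, error = exact - b), 4)
```

The error stays under about a percentage point through $b = 0.1$ and grows quickly after $b = 0.2$.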

And using the more familiar unit change interpretation, we have: \begin{align} \operatorname{\Delta_j} \hat y &= e^{ x_0 + x_1\beta_1 + \dots + \left(x_j + 1\right)\,\beta_j + \dots + x_J\beta_J } - e^{x_0 + x_1\beta_1 + \dots + x_J\beta_J} \\ &= e^{ x_0 + x_1\beta_1 + \dots + x_J\beta_J + \beta_j} - e^{x_0 + x_1\beta_1 + \dots + x_J\beta_J} \\ &= e^{ x_0 + x_1\beta_1 + \dots + x_J\beta_J}e^{\beta_j} - e^{x_0 + x_1\beta_1 + \dots + x_J\beta_J} \\ &= e^{ x_0 + x_1\beta_1 + \dots + x_J\beta_J} \left( e^{\beta_j} - 1 \right) \end{align} which means

Given a unit change in $x_j$, the fitted $\hat y$ changes by $\hat y \left( e^{\beta_j} - 1 \right)$.

There are three important pieces to note here:

  1. The effect of a change in the predictors depends on the level of the response.
  2. An additive change in the predictors has a multiplicative effect on the response.
  3. You can't interpret the coefficients just by reading them (unless you can compute arbitrary exponentials in your head).
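These points can be checked numerically. A sketch in R, plugging in the coefficient estimates printed in the question:

```r
# Multiplicative effect of a one-unit increase in each predictor,
# expressed as a percent change in the expected count,
# using the estimates from the glm() output above.
beta <- c(riverWatauga = -0.049016, pH = 0.086460, temp = -0.059667)
round(100 * (exp(beta) - 1), 1)
# riverWatauga           pH         temp
#         -4.8          9.0         -5.8
```

So, holding the other predictors fixed, a one-unit increase in pH multiplies the expected count by about 1.09, and a one-degree increase in temp multiplies it by about 0.94.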

So in your example, the effect of increasing pH by 1 is to increase $\ln \hat y$ by $0.09$; that is, to multiply $\hat y$ by $e^{0.09} \approx 1.09$. It looks like your outcome is the number of darters you observe in some fixed unit of time (say, a week). So if you're observing 100 darters a week at a pH of 6.7, raising the pH of the river to 7.7 means you can now expect to see about 109 darters a week.
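In code, that worked example (assuming an expected count of 100 before the change) is just:

```r
# Expected count after a one-unit increase in pH,
# starting from an expected count of 100.
y_hat <- 100
round(y_hat * exp(0.086460))  # about 109
```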

shadowtalker
  • I made a couple tweaks here, @ssdecontrol. I think they'll make your post a little easier to follow, but if you don't like them, roll them back with my apologies. – gung - Reinstate Monica Dec 13 '14 at 15:50
  • If you can't figure that out from my answer then clearly I need to revise the answer. What are you still confused about? – shadowtalker Dec 14 '14 at 04:58
  • Plug those numbers into the equation just like in linear regression – shadowtalker Dec 14 '14 at 08:31
  • Very nice answer! – Jean-Paul Jul 07 '16 at 17:51
  • Hello @ssdecontrol. When you say E[y|x] do you mean E[y|xj]? And, I would like to use your formulas to interpret regressions with interactions, such as this VV =A+B·X+C·Y+D·Z+E·X·Y+F·X^2+G·Y·Z. Should I use partial derivatives ∂g(μ)/∂xj or total derivatives dg(μ)/dxj?, Why? – skan Aug 24 '16 at 01:04
  • When we have an interaction term many people (and books) just say a given coefficient is the y variation when the other variables are zero, but sometimes that's not true – skan Aug 24 '16 at 01:43
  • @skan no, I mean $E[y|x]$. $x$ and $y$ are random variables representing a single observation. $x$ is a vector indexed by $j$; $x_j$ is the random variable representing a specific feature/regressor/input/predictor for that observation. – shadowtalker Aug 24 '16 at 05:52
  • And don't overthink it. Once you understand all the pieces in a GLM, the manipulations here are just a direct application of calculus principles. It really is as simple as taking the derivative with respect to the variable you're interested in. – shadowtalker Aug 24 '16 at 05:55
  • I will create a new question with an example soon, thanks. – skan Aug 25 '16 at 00:19
  • Sorry to revive this after so long, but I wonder: what if he/she is observing 0 darters at a pH of 6.7? Then the expected count is 0 for any pH and regardless of the coefficient estimate? (After all, 0 × anything = 0) – jocateme Sep 12 '20 at 18:36
  • @jocateme nope. The expected count is a function of the linear component of the model, which is always *additive*, for example $y = \exp(x_0 + x_1 \beta_1 + x_2 \beta_2)$. In this case if $x_2$ is 0 then it doesn't force $y$ to be 0. You might be confused because this expands to $y = \exp(x_0) \exp(x_1 \beta_1) \exp(x_2 \beta_2)$. But $\exp(0)$ is 1, not 0. – shadowtalker Sep 13 '20 at 00:52
  • Thanks, @shadowtalker! I was thinking more about the part where you multiply $\hat{y}$ by $e^{0.09}$. In your example, assuming $\hat{y} = 100$, you would expect $\hat{y} = 100 × 1.09 = 109$ with a one-unit increase in pH. But if $\hat{y} = 0$, $\hat{y} = 0 × 1.09 = 0$ with a one-unit increase in pH. (This is where I'm coming from, by the way: https://stats.stackexchange.com/questions/487146/choosing-reasonable-priors-for-poisson-glmm) – jocateme Sep 13 '20 at 16:23
  • @jocateme You are correct, but it can never happen. In this model, $\hat y$ can never be 0, no matter what the values of $x$ and $\beta$ are. An exponential function can never produce a value of exactly 0, nor can a logarithm (being the inverse of the exponential) accept a value of exactly 0. If 0 is a plausible outcome in your problem, you should not use the logarithmic link function. Note also that the "multiplicative" interpretation of the model specifically depends on using the logarithmic link function. – shadowtalker Sep 14 '20 at 19:48
  • @shadowtalker, that makes total sense! The outcome can only approximate 0 but never reach it. Thanks a lot! – jocateme Sep 16 '20 at 00:50
  • Last paragraph: "the effect of increasing pH by $1$ is to increase $ln\hat{y}$ by $\hat{y}(e^{0.09}-1)$" shouldn't this simply be "... increase $ln\hat{y}$ by $0.09$" since the change is additive on the log scale? – user133631 Jan 14 '21 at 16:18

My suggestion would be to create a small grid consisting of combinations of the two rivers and two or three values of each of the covariates, then use the predict function with your grid as newdata. Then graph the results. It is much clearer to look at the values that the model actually predicts. You may or may not want to back-transform the predictions to the original scale of measurement (type = "response").
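A minimal sketch of this approach in R, using simulated data since darterData itself is not shown in the question (the variable names and factor levels here are placeholders):

```r
# Simulate data shaped like the question's (placeholder river levels "A"/"B")
set.seed(1)
darterData <- data.frame(
  river = factor(rep(c("A", "B"), each = 50)),
  pH    = runif(100, 6, 8),
  temp  = runif(100, 12, 24)
)
darterData$darters <- rpois(100, exp(3 + 0.09 * darterData$pH - 0.06 * darterData$temp))

fit <- glm(darters ~ river + pH + temp, family = poisson, data = darterData)

# Small grid of covariate combinations, then predict on the response scale
grid <- expand.grid(river = levels(darterData$river),
                    pH   = c(6.5, 7.0, 7.5),
                    temp = c(15, 20))
grid$expected <- predict(fit, newdata = grid, type = "response")
grid
```

Plotting `expected` against pH, faceted by river and temp, makes the multiplicative structure of the model easy to see.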

Russ Lenth
  • As much as I like this approach (I do it all the time) I think it's counterproductive for building understanding. – shadowtalker Dec 13 '14 at 03:45
  • I found it disconcerting to see advice to look at predictions being criticized as counterproductive for understanding. – DWin Oct 10 '20 at 20:10
  • Yes, I guess this was 6 years ago, but how can this be counterproductive; and why would a person who thinks so do it all the time? Maybe @shadowtalker would like to explain, even after all this time? – Russ Lenth Oct 10 '20 at 20:16