Have I used the glm poisson model correctly?

Question

I have a 3 way factorial experiment with the factors of watterlogging (waterlogged and non-waterlogged), cultivar(4 types) and phosphorus (P) rate (5 levels), and 4 reps. There was some damage due to herbicide spray, so I scored the damage on a 0-4 ranking system, 0 = no damage, 4 = >75% damage.

I want to see the effect my variables had on which plant received damage. As the herbicide damage seems to be count data, i fit a glm poisson model.

What is the best model to analyse the herbicide damage, or is the one I used OK?

structure(list(pot = c(1L, 2L, 3L, 4L, 21L, 22L, 23L, 24L, 5L, 
6L, 7L, 8L, 25L, 26L, 27L, 28L, 9L, 10L, 11L, 12L, 29L, 30L, 
31L, 32L, 13L, 14L, 15L, 16L, 33L, 34L, 35L, 36L, 17L, 18L, 19L, 
20L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 61L, 62L, 63L, 64L, 
45L, 46L, 47L, 48L, 65L, 66L, 67L, 68L, 49L, 50L, 51L, 52L, 69L, 
70L, 71L, 72L, 53L, 54L, 55L, 56L, 73L, 74L, 75L, 76L, 57L, 58L, 
59L, 60L, 77L, 78L, 79L, 80L), rep = c(1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), cultivar = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = 
c("Dinninup", 
"Riverina", "Seaton Park", "Yarloop"), class = "factor"), Waterlogging = 
structure(c(2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Non- 
waterlogged", "Waterlogged"), class = "factor"), P = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L), F = c(1.65, 0.61, 0.47, 0.57, 0.49, 0.71, 0.44, 
0.6, 0.52, 0.61, 0.48, 0.8, 0.79, 0.64, 0.44, 0.48, 0.69, 0.63, 
0.39, 0.68, 0.89, 0.66, 0.39, 0.52, 0.66, 0.51, 0.4, 0.55, 0.66, 
0.59, 0.54, 0.59, 0.45, 0.41, 0.47, 0.54, 0.5, 0.72, 0.39, 0.44, 
1.7, 1.78, 1.6, 2.34, 2.03, 1.97, 1.92, 1.88, 1.52, 1.88, 1.67, 
1.7, 1.97, 1.97, 1.88, 2.01, 1.88, 1.59, 1.97, 1.6, 2.08, 2.06, 
2.01, 2.22, 1.97, 2.13, 1.52, 2.5, 1.97, 1.74, 2.08, 1.3, 1.88, 
1.61, 1.61, 1.65, 2.05, 1.75, 1.61, 1.2), shoot.biomass = c(0.37, 
0.43, 0.52, 0.33, 1.24, 1.12, 1.28, 1.28, 0.48, 0.95, 1.35, 0.95, 
1.37, 1.4, 1.39, 1.34, 0.67, 0.8, 1.13, 0.32, 1.34, 1.53, 1.25, 
 1.4, 0.58, 0.98, 1.46, 0.69, 1.44, 1.83, 1.65, 1.71, 1, 1.17, 
1.45, 0.98, 1.52, 1.75, 1.63, 1.7, 0.25, 0.24, 0.91, 1.17, 1.23, 
1.22, 1.26, 0.89, 1.54, 1.01, 1.48, 0.93, 1.2, 1.55, 1.4, 1.19, 
0.9, 1.29, 1.43, 1.31, 1.75, 1.92, 1.63, 1.64, 1.31, 1.43, 1.77, 
1.28, 1.34, 1.54, 1.66, 1.88, 1.29, 1.05, 1.63, 1.36, 1.9, 2.18, 
 2.03, 1.68), spray.dmg = c(3L, 2L, 4L, 4L, 0L, 0L, 0L, 0L, 4L, 
4L, 3L, 4L, 0L, 0L, 0L, 0L, 4L, 4L, 3L, 4L, 0L, 0L, 0L, 0L, 4L, 
4L, 3L, 4L, 0L, 0L, 0L, 2L, 4L, 4L, 4L, 4L, 0L, 0L, 1L, 1L, 2L, 
3L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 3L, 0L, 0L, 0L, 0L, 3L, 
2L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 2L, 0L, 0L, 0L, 0L, 3L, 
4L, 2L, 2L, 0L, 0L, 0L, 0L)), row.names = c(NA, 80L), class = "data.frame")

Here is my model:

fm1 <- 
glm(spray.dmg~cultivar*as.factor(P)*Waterlogging+as.factor(rep),
family=poisson,data=iso)

anova(fm1,test='Chi')

and output:

P.S. how do I paste the dput into stack so the whole thing is indented by four rows?

The data you posted does not appear to contain the response variable `herb.dmg.11.DEC` — Robert Long, Jan 16 '20 at 09:13

Davide ND · Answer 1 · 2020-01-16T11:11:02.837

1

While the intuition behind it is not bad, I would not suggest using a Poisson glm for this problem, as it would have a different domain than your problem. You could find you glm predicting a value of 6 (which for you doesn't mean anything), and this can also wrongly affect the computation of the loss.

I would instead suggest using a ordinal logistic (or probit) regression, that you can find in R with the function polr from package MASS. You can find some documentation here. This would allow you to limit the number of classes (0-4 like you had before), maintain the ordinal notion of the classes, and predict the values as class probabilities, which is handier.

EDIT: since the docu of polr looked a bit complicated, there are other packages doing the same thing that might be more intuitive.
HERE is a very nice link to techniques to do ordinal regression, with multiple packages and examples. Good luck :)

edited Jan 16 '20 at 11:11

answered Jan 16 '20 at 09:09

Davide ND

2,305
8
24

Sounds good. Though the application seems beyond a novice like me. house.plr – Eliott Reed Jan 16 '20 at 09:52
I would use the same formula as you used before - just with the response as a factor: `a < -polr(as.factor(spray.dmg)~cultivar*as.factor(P)*Waterlogging+as.factor(rep), method="logistic",data=iso) ` and `summary(a)` – Davide ND Jan 16 '20 at 10:41
Technically you're still using a GLM, as your model will be linear in the variables and you will have a link function, only difference is that you will be modeling log odds, and this should translate better to your problem – Davide ND Jan 16 '20 at 10:43
How do we interpret the summary output? What is the essential information? – Eliott Reed Jan 16 '20 at 23:06
Look at this post that has some nice explanations https://stats.stackexchange.com/questions/7720/how-to-understand-output-from-rs-polr-function-ordered-logistic-regression – Davide ND Jan 17 '20 at 08:29

Have I used the glm poisson model correctly?

1 Answers1