Is Poisson regression a good fit for this dataset?

Question

I am using a hurricane dataset (specifically the NDAM and Gender_MF columns):

set.seed(100)
library(ggplot2)
library(rio)
# Supplementary data file from the PNAS article
url = "https://www.pnas.org/highwire/filestream/616321/field_highwire_adjunct_files/0/pnas.1402786111.sd01.xlsx"
data = rio::import(url)
data = data[1:92,]  # keep the first 92 rows
# Overlaid density estimates of NDAM for each level of Gender_MF
ggplot(data = data, mapping = aes(x = NDAM, fill = factor(Gender_MF), 
       color = factor(Gender_MF))) + geom_density(alpha = 1/20, adjust = 1/2)

Both distributions are strongly right-skewed and seem to need a transformation.

My aim is to fit a model and see whether Gender_MF explains the hurricane damage. So I treated NDAM as a count and fitted a Poisson regression as follows:

pois.reg <- glm(NDAM ~ factor(Gender_MF), family = poisson, data = data)

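The summary output (not reproduced here) gives an intercept of 8.936 and a coefficient of -0.068 for Gender_MF = 1; these are the estimates used in the answer.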

Does this Poisson regression model have a good fit for these data? How can I interpret the coefficients? Can I say Gender_MF explains the hurricane damage?

What is "NDAM"? More generally, what are these data? What paper are they from? I'm pretty sure they simply alternate between male & female names in giving names to storms. Is your hypothesis that stronger & weaker storms alternate? — gung - Reinstate Monica, Jul 19 '19 at 19:03
@gung The data were taken from https://www.pnas.org/content/early/2014/05/29/1402786111/tab-figures-data and NDAM is the normalized damage of the hurricanes. I was trying to see whether Gender_MF explains the damage without including other explanatory variables. — Matthew, Jul 19 '19 at 19:08
The links I left in your last question about this paper contain re-analyses of this dataset that go into a great deal of detail and come to firmer conclusions than we can reach from this output alone. — mkt, Jul 19 '19 at 19:10
@mkt I read all the criticisms of the paper. My aim, as a beginner, is to learn about selecting and fitting appropriate regression models. I started by fitting a Poisson regression to see whether gender explains the hurricane damage. I also tried gam() and glm(), but I thought Poisson regression was better. — Matthew, Jul 19 '19 at 19:28
Note that Joseph Hilbe is the statistician on the paper. They used negative binomial regression, not Poisson, AFAICT. I can't seem to find a definition of *normalized* in "normalized damage"; I'm not sure what that means, but they do seem to clearly state that these are counts. — gung - Reinstate Monica, Jul 19 '19 at 19:37
@gung Right. I tried that too: library(MASS); h.nb = glm.nb(NDAM ~ factor(Gender_MF), data = data). It gave approximately the same result as the summary above; the differences are in the Std. Error, z value and Pr(>|z|) columns. Here is what I have on NDAM: the normalized damage caused by the hurricane, adjusted for inflation, wealth, and population. — Matthew, Jul 19 '19 at 19:44
@Matthew It's definitely a good idea to use datasets to teach yourself new methods. But the problem is that your question "Can I say Gender_MF explains the hurricane damage?" is one that goes beyond statistical procedure alone and into interpretation. And we cannot answer it well based on your regression alone: the criticisms of the published data & analyses form important context for interpreting your results. — mkt, Jul 20 '19 at 14:52
@mkt Right. I am trying to read all the criticisms in detail (from the links you sent me and others) and trying to do that. Thanks again. — Matthew, Jul 20 '19 at 15:44
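A minimal sketch of why the Poisson and negative binomial fits mentioned in the comments agree on the estimates but not on the standard errors. It uses simulated stand-in data (the sim data frame and its parameters are hypothetical, not the hurricane data) so that it runs without downloading the spreadsheet. With a single categorical predictor and a log link, both models' fitted means equal the observed group means, so the coefficients coincide; only the assumed variance, and hence the estimated uncertainty, differs.

```r
library(MASS)  # glm.nb(); MASS is a recommended package bundled with most R installations

# Hypothetical stand-in for the hurricane data: 92 storms with a 0/1
# Gender_MF indicator and a skewed, overdispersed count outcome.
set.seed(1)
sim <- data.frame(Gender_MF = rep(0:1, length.out = 92))
sim$NDAM <- rnbinom(92, size = 0.5, mu = 7500)

pois.fit <- glm(NDAM ~ factor(Gender_MF), family = poisson, data = sim)
nb.fit   <- glm.nb(NDAM ~ factor(Gender_MF), data = sim)

# Point estimates: with one categorical predictor and a log link, both models
# reproduce each group's sample mean, so the coefficients coincide.
est <- cbind(poisson = coef(pois.fit), negbin = coef(nb.fit))

# Standard errors: Poisson assumes Var(Y) = mu, the negative binomial allows
# Var(Y) = mu + mu^2 / theta, so its standard errors are much larger here.
se <- cbind(poisson = summary(pois.fit)$coefficients[, "Std. Error"],
            negbin  = summary(nb.fit)$coefficients[, "Std. Error"])

print(est)
print(se)
```

The z values and p-values in the two summaries differ for the same reason, since each is the estimate divided by its standard error.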

mkt · Accepted Answer · 2019-07-22T10:59:37.777

I'm ignoring external context about this paper and analysis for the purposes of this answer.

1. Does this Poisson regression model have a good fit for these data?

We have no way to judge that from the output you have presented.
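Judging fit needs more than the coefficient table. As a sketch, two common checks for a Poisson GLM are the Pearson dispersion ratio (Pearson chi-square divided by the residual degrees of freedom, roughly 1 when the variance equals the mean) and a residual-deviance goodness-of-fit test. The data below are simulated and hypothetical, as are the helper names pearson_dispersion and deviance_gof_p; with the real data one would apply the same helpers to pois.reg. A ratio far above 1 points to overdispersion, which would make the Poisson standard errors too small.

```r
# Hypothetical simulated data (not the hurricane data): the same design with
# an overdispersed outcome and, for contrast, a genuinely Poisson outcome.
set.seed(2)
sim <- data.frame(Gender_MF = factor(rep(0:1, length.out = 92)))
sim$y.over <- rnbinom(92, size = 0.5, mu = 7500)
sim$y.pois <- rpois(92, lambda = 7500)

# Pearson statistic divided by residual df: near 1 when Var(Y) = mean.
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Residual-deviance goodness-of-fit test (chi-squared approximation).
deviance_gof_p <- function(fit) {
  pchisq(deviance(fit), df = df.residual(fit), lower.tail = FALSE)
}

fit.over <- glm(y.over ~ Gender_MF, family = poisson, data = sim)
fit.pois <- glm(y.pois ~ Gender_MF, family = poisson, data = sim)

checks <- rbind(
  overdispersed = c(dispersion = pearson_dispersion(fit.over),
                    deviance.p = deviance_gof_p(fit.over)),
  poisson       = c(dispersion = pearson_dispersion(fit.pois),
                    deviance.p = deviance_gof_p(fit.pois))
)
print(checks)
```

These are rough screens rather than a full assessment; residual plots and a comparison with a negative binomial fit are also worth a look.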

2. How can I interpret the coefficients?

I don't know which genders 0 and 1 represent. But the output means that

Gender_MF = 0 has an expected NDAM of exp(8.936) ≈ 7600

Gender_MF = 1 has an expected NDAM of exp(8.936 - 0.068) ≈ 7100

So Gender_MF = 1 is associated with a 500 unit decrease in NDAM relative to Gender_MF = 0.
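The arithmetic above can be reproduced directly from the two estimates. Because the Poisson model uses a log link, exponentiating the Gender_MF coefficient also gives a multiplicative effect: exp(-0.068) ≈ 0.934, so the expected NDAM for Gender_MF = 1 is about 6.6% lower than for Gender_MF = 0, which matches the roughly 500-unit drop from about 7600. The variable names below are illustrative.

```r
# Coefficients as used in the answer (log scale).
b0 <- 8.936   # intercept: log expected NDAM when Gender_MF = 0
b1 <- -0.068  # Gender_MF = 1 coefficient: difference in log expected NDAM

mu0   <- exp(b0)       # expected NDAM for Gender_MF = 0, about 7600
mu1   <- exp(b0 + b1)  # expected NDAM for Gender_MF = 1, about 7100
ratio <- exp(b1)       # multiplicative effect, mu1 / mu0, about 0.934

round(c(mu0 = mu0, mu1 = mu1, diff = mu1 - mu0, ratio = ratio), 3)
```

The ratio is usually the more natural summary for a log-link model, because it does not depend on the baseline level the way the additive difference does.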

It could be worth your time to read "How to interpret coefficients in a Poisson regression?"

3. Can I say Gender_MF explains the hurricane damage?

I would say instead that Gender_MF is associated with hurricane damage in this dataset, conditional on a set of assumptions that we cannot evaluate from the model output alone. 'Explains' is a bit ambiguous but hints at a causal claim, and I would be very wary of making causal claims based on this alone.
