Starting from Poisson distribution and cluster analysis
I am trying to find a statistical or empirical method to test whether my Index of Dispersion (https://en.wikipedia.org/wiki/Index_of_dispersion) is significantly different from 1. I am not a statistician, so please forgive any mistakes in my calculations.
Here is my data:
df = read.table(text = 'Year Count
1975 10
1976 12
1977 9
1978 14
1979 14
1980 11
1981 8
1982 7
1983 10
1984 8
1985 12
1986 9
1987 10
1988 9
1989 10
1990 9
1991 11
1992 12
1993 9
1994 10
1995 8
1996 12
1997 11
1998 13
1999 7
2000 13
2001 10
2002 9
2003 8
2004 13
2005 15
2006 11
2007 10
2008 11
2009 9
2010 10
2011 8
2012 11
2013 10
2014 6, header = TRUE)
Therefore, my Index of Dispersion phi will be equal to:
phi = var(df$Count) / mean(df$Count)
> print(phi)
[1] 0.4137045
So my data show underdispersion because 0 < phi < 1
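For reference, the calculation above can be written out by hand. This is only a restatement of what var() and mean() compute on the 40 yearly counts (var() in R uses the n - 1 denominator); the symbols are my own notation:

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{409}{40} = 10.225,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{164.975}{39} \approx 4.2301,
\]
\[
\phi = \frac{s^2}{\bar{x}} \approx \frac{4.2301}{10.225} \approx 0.4137 .
\]
```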
How to test the significance of phi?
I couldn't find any specific test, so I tried a simulation of 10,000 random vectors drawn from a uniform distribution (i.e. each year has the same probability of receiving an observation).
Here is my 'test' code:
# create a list of 10,000 simulated data frames
# 409 is the number of observations in each vector and is equal to sum(df$Count)
sim_list = lapply(1:10000, function(x) data.frame(round(runif(409, 1975, 2014))))
# count how many observations fall in each year
list_tbl = lapply(sim_list, function(y) data.frame(table(y$round.runif.409..1975..2014..)))
# calculate the index of dispersion for each vector
list_phi = lapply(list_tbl, function(z) var(z$Freq) / mean(z$Freq))
# unlist in order to have all the indexes in one numeric vector
sim_phi = unlist(list_phi)
# histogram of the simulated indexes
hist(sim_phi)
#print mean, standard deviation and variance
> mean(sim_phi)
[1] 1.129969
> sd(sim_phi)
[1] 0.2422462
> var(sim_phi)
[1] 0.05868323
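A possible refinement of this simulation, sketched under assumptions of my own rather than as an established method: round(runif(409, 1975, 2014)) gives the two end years (1975 and 2014) only half the probability of the other years, and table() drops any year with zero observations, both of which would push the simulated mean above 1. The sketch below instead draws the 409 events with rmultinom() so that all 40 years are equally likely, and then compares the observed phi with the whole simulated distribution. The names counts, phi_obs and p_lower, and the seed, are illustrative choices.

```r
# Observed yearly counts for 1975-2014, copied from the data above
counts <- c(10, 12, 9, 14, 14, 11, 8, 7, 10, 8, 12, 9, 10, 9, 10, 9, 11, 12, 9, 10,
            8, 12, 11, 13, 7, 13, 10, 9, 8, 13, 15, 11, 10, 11, 9, 10, 8, 11, 10, 6)
n_years  <- length(counts)   # 40 years
n_events <- sum(counts)      # 409 events in total
phi_obs  <- var(counts) / mean(counts)

set.seed(1)                  # arbitrary seed, only for reproducibility
n_sim <- 10000

# Each column is one simulated series: 409 events spread over 40 years,
# every year equally likely; years with no events are kept as zeros
sim_counts <- rmultinom(n_sim, size = n_events, prob = rep(1 / n_years, n_years))

# Index of dispersion of each simulated series
sim_phi <- apply(sim_counts, 2, function(x) var(x) / mean(x))

# Empirical lower-tail p-value: share of simulated indexes at or below the
# observed one (the +1 terms avoid reporting a p-value of exactly zero)
p_lower <- (sum(sim_phi <= phi_obs) + 1) / (n_sim + 1)
p_lower
```

The point of the comparison is the position of phi_obs within the simulated distribution, not the simulated mean on its own: a small p_lower would suggest the observed counts are more regular than equal-probability random allocation tends to produce.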
Can I conclude that my phi = 0.4137045 is not significant because the simulation gave mean(sim_phi) = 1.129969, which would indicate overdispersion?
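As an analytic cross-check, one commonly used option is the chi-square dispersion test. It rests on an assumption that is not verified here: that the yearly counts are independent Poisson draws with a common mean. Under that assumption, (n - 1) * phi approximately follows a chi-square distribution with n - 1 degrees of freedom. A minimal sketch, with variable names of my own choosing:

```r
# Observed yearly counts for 1975-2014, copied from the data above
counts <- c(10, 12, 9, 14, 14, 11, 8, 7, 10, 8, 12, 9, 10, 9, 10, 9, 11, 12, 9, 10,
            8, 12, 11, 13, 7, 13, 10, 9, 8, 13, 15, 11, 10, 11, 9, 10, 8, 11, 10, 6)
n       <- length(counts)
phi_obs <- var(counts) / mean(counts)

# Dispersion statistic: equals sum((x - mean(x))^2) / mean(x)
stat <- (n - 1) * phi_obs

# Lower tail tests for underdispersion, upper tail for overdispersion
p_under <- pchisq(stat, df = n - 1)
p_over  <- pchisq(stat, df = n - 1, lower.tail = FALSE)

c(statistic = stat, p_under = p_under, p_over = p_over)
```

A small p_under would point to underdispersion relative to a Poisson model; whether that model is appropriate for these data is a separate question.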