
I am building a Poisson regression (model name `fit_4`) with a composite count (score) as the dependent variable and several other variables as independent variables. The mean (6.780) and variance (6.686) of the count variable are almost the same. The mean and variance are also conditionally equal across all disaggregations.

But whenever I try to compute the p-value for the goodness-of-fit test of this regression, I get exactly 1. The R code I used for the goodness-of-fit p-value is `pchisq(fit_4$deviance, df=fit_4$df.residual, lower.tail=FALSE)`. Is it really possible to get a p-value of exactly 1? Any lead on how to examine goodness of fit for this Poisson model would be a great help. The summary of the Poisson regression is presented below:

    Call:
    glm(formula = ci_score ~ r1_gender + r2_merginalised + r8_LogUMPCE +
        r6_tenure + r8_LogUMPCE * r6_tenure + r4_city_size + r5_settlement,
        family = poisson(), data = hh18_u_r)
    
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -3.9098  -0.5988   0.1202   0.6221   2.6329
    
    Coefficients:
                                     Estimate Std. Error z value Pr(>|z|)
    (Intercept)                     -0.743868   0.054038 -13.766  < 2e-16 ***
    r1_genderMale                    0.090131   0.005507  16.368  < 2e-16 ***
    r2_merginalisedOthers            0.051838   0.003915  13.240  < 2e-16 ***
    r8_LogUMPCE                      0.265806   0.006497  40.910  < 2e-16 ***
    r6_tenureOwned                   0.257248   0.062617   4.108 3.99e-05 ***
    r4_city_sizeMillion Plus cities -0.006902   0.003790  -1.821   0.0686 .
    r5_settlementOthers              0.248393   0.008909  27.880  < 2e-16 ***
    r8_LogUMPCE:r6_tenureOwned      -0.001872   0.007586  -0.247   0.8051
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for poisson family taken to be 1)
    
        Null deviance: 51231  on 43096  degrees of freedom
    Residual deviance: 40518  on 43089  degrees of freedom
    AIC: 198160
    
    Number of Fisher Scoring iterations: 4
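
A minimal sketch of the deviance goodness-of-fit computation from the question, evaluated in both tails. It plugs in the rounded residual deviance (40518) and residual degrees of freedom (43089) from the printed summary rather than `fit_4` itself, so the digits will differ slightly from those computed on the fitted object; the point is only to show why the upper tail prints as 1.

```r
# Rounded values read off the printed summary of fit_4
dev    <- 40518   # residual deviance
df_res <- 43089   # residual degrees of freedom

# Upper tail: the usual goodness-of-fit p-value
p_upper <- pchisq(dev, df = df_res, lower.tail = FALSE)

# Lower tail: probability of a deviance this small or smaller
p_lower <- pchisq(dev, df = df_res)

print(p_upper)  # prints 1: 1 - p_lower cannot be distinguished from 1 in double precision
print(p_lower)  # a tiny positive number, far below machine epsilon
```

A deviance well below its degrees of freedom is what pushes the upper-tail p-value towards 1; the lower tail is the one that carries the information in that situation.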

Also, is there an easy way to export publication-quality results of a Poisson regression from R to MS Word? Kindly help me.
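
One possible route, sketched under assumptions: because `hh18_u_r` is not available, the data frame `d` and the variables `x1`, `g` and `y` below are hypothetical stand-ins (substitute `fit_4` for `fit`). The coefficient table is built with base R as incidence rate ratios with Wald intervals, and the `flextable` package, which must be installed separately, writes it to a .docx file.

```r
# Hypothetical stand-in data; replace with your own model (e.g. fit_4)
set.seed(42)
d <- data.frame(
  x1 = rnorm(500),
  g  = factor(sample(c("A", "B"), 500, replace = TRUE))
)
d$y <- rpois(500, exp(0.5 + 0.3 * d$x1 + 0.2 * (d$g == "B")))
fit <- glm(y ~ x1 + g, family = poisson(), data = d)

# Coefficient table: incidence rate ratios with Wald 95% intervals
est <- coef(fit)
ci  <- confint.default(fit)
tab <- data.frame(
  Term    = names(est),
  IRR     = round(exp(est), 3),
  CI_low  = round(exp(ci[, 1]), 3),
  CI_high = round(exp(ci[, 2]), 3),
  p_value = signif(summary(fit)$coefficients[, "Pr(>|z|)"], 3),
  row.names = NULL
)

# Write the table to a Word document (requires the flextable package)
ft <- flextable::flextable(tab)
out_path <- file.path(tempdir(), "poisson_table.docx")
flextable::save_as_docx(ft, path = out_path)
```

The Wald intervals from `confint.default()` correspond to the z-tests shown by `summary()`; `confint()` would instead give profile-likelihood intervals.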

kjetil b halvorsen
  • You have some underdispersion and, given the very large sample size, getting a $p$-value close to 1 when testing for overdispersion is not surprising at all. The $p$-value is only numerically equal to one, and this is because of finite floating-point precision. If you instead compute the lower-tail probability, you'll see that it is `2.01e-19`, so the p-value is `1 - 2.01e-19`. – Jarle Tufto Oct 09 '20 at 08:11
  • @JarleTufto, thank you very much for this clarification. – Biswajit Kar Oct 09 '20 at 08:50
  • How do you compute this composite count? Why would it be Poisson distributed? (For instance, are all counts independent? Or could it, for instance, be binomially distributed?) – Sextus Empiricus Oct 09 '20 at 16:15
  • Also, the variance is already smaller than the mean. I am actually puzzled why you have for the null model a deviance which is so large relative to the null model. – Sextus Empiricus Oct 09 '20 at 16:25
  • @SextusEmpiricus, the dependent count is the total number of housing amenities available in a house, out of 10 selected amenities. The distribution is negatively skewed (-0.639). Since the availability of amenities in one house does not directly influence the availability in another house, I can say the counts are independent, though there is a chance of locational dependency. For example, a government piped-water service in a locality could ensure water connections for all households. Because these are count data and the mean and variance are almost equal, I selected Poisson regression. – Biswajit Kar Oct 10 '20 at 17:33
  • @SextusEmpiricus, how could it be binomially distributed? The composite count is the total number of successes, i.e. compliance with 10 selected parameters. It ultimately becomes count data, similar to a number of awards. Could you please enlighten me about the fit of this Poisson model? – Biswajit Kar Oct 10 '20 at 17:38
  • With the question about independence, I did not mean independence between different houses, but whether you have [independent increments](https://en.wikipedia.org/wiki/Independent_increments) in your counting process. If your counts are limited to 10, then you do not have a [Poisson distributed](https://en.wikipedia.org/wiki/Poisson_distribution) variable. – Sextus Empiricus Oct 10 '20 at 18:19
  • The mean and variance are not almost equal if you consider the large number (>40k) of measurements. For a truly Poisson distributed variable you would have values that are even closer. `set.seed(1);r 6.78/6.68)` gives your ratio or larger in 1.3% of the cases. You could say that this is still fine because it is not an extreme (one-sided) p-value, but this is for a Poisson distributed variable with a *constant* parameter. In your case you consider a marginal distribution. – Sextus Empiricus Oct 10 '20 at 18:21
  • It is not about whether the distribution of $y$ has equal mean and variance; it is about the conditional distribution. Related questions (about normality instead of Poisson) are here: https://stats.stackexchange.com/questions/12262 https://stats.stackexchange.com/questions/342759/ – Sextus Empiricus Oct 10 '20 at 18:26
  • @SextusEmpiricus, thank you. My counts are limited to 10, so I understand that I must avoid using Poisson regression. In that case, can I use an ordered logit by grouping the counts into three or two categories? Or should I build a composite index using some variant of PCA instead of this type of scoring and apply OLS? Which model would be a good fit? Kindly suggest. – Biswajit Kar Oct 10 '20 at 18:51
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/113962/discussion-between-biswajit-kar-and-sextus-empiricus). – Biswajit Kar Oct 10 '20 at 18:56
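
The R snippet in the comment about the mean/variance ratio is truncated, so the following is only a hypothetical reconstruction of the kind of simulation it describes, not the commenter's code. It assumes n = 43097 observations (the null-deviance degrees of freedom plus one), a constant Poisson rate equal to the reported mean, and the ratio 6.78/6.68 used in that comment; the number of replicates (1000) is an arbitrary choice.

```r
set.seed(1)
n         <- 43097        # observations: null-deviance df (43096) + 1
lambda    <- 6.78         # constant Poisson rate, set to the reported mean
ratio_obs <- 6.78 / 6.68  # mean/variance ratio used in the comment

# Mean/variance ratio of truly Poisson samples with a constant rate
r <- replicate(1000, {
  x <- rpois(n, lambda)
  mean(x) / var(x)
})

# Share of simulated ratios at least as large as the observed one
mean(r >= ratio_obs)
```

By the delta method, the standard deviation of this ratio for Poisson data is roughly $\sqrt{2/n} \approx 0.0068$, so a ratio of $6.78/6.68 \approx 1.015$ lies about 2.2 standard deviations above 1, consistent with a one-sided tail share of the order quoted in the comment.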

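Because the score is described as the number of amenities present out of 10, one option raised in the comments is to treat it as binomial rather than Poisson. A minimal sketch, assuming each house's score counts successes out of exactly 10 trials; the data frame `d` and the predictors `x1` and `g` are hypothetical stand-ins for `hh18_u_r` and its variables, and this is not a recommendation from the thread. It also computes the Pearson dispersion statistic (sum of squared Pearson residuals divided by the residual degrees of freedom), which should be near 1 when the assumed variance function is adequate and below 1 under underdispersion.

```r
# Hypothetical stand-in data: a score out of 10, like ci_score
set.seed(7)
n <- 2000
d <- data.frame(
  x1 = rnorm(n),
  g  = factor(sample(c("A", "B"), n, replace = TRUE))
)
eta <- 0.4 + 0.5 * d$x1 + 0.3 * (d$g == "B")
d$score <- rbinom(n, size = 10, prob = plogis(eta))

# Poisson fit ignores the upper bound of 10
fit_pois <- glm(score ~ x1 + g, family = poisson(), data = d)

# Binomial fit: successes and failures out of 10 per house
fit_binom <- glm(cbind(score, 10 - score) ~ x1 + g,
                 family = binomial(), data = d)

# Pearson dispersion: about 1 if the variance function is adequate
pearson_disp <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}
pearson_disp(fit_pois)   # well below 1: a count out of 10 has variance smaller than its mean
pearson_disp(fit_binom)  # close to 1
```

For a count bounded by 10 the binomial variance is $10p(1-p)$, which is smaller than the mean $10p$, so a Poisson fit to such data will tend to look underdispersed.
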
0 Answers