
I have a dataset of counts of four different metadata factors associated with a gene and two experimental groups, FGT and free, with 52 and 40 unique genes respectively. The first 100 rows can be found here: https://pastebin.com/PAG5pCDh (I can provide more)

I fitted a Poisson GLM to the count data and found that the variable origin is a significant predictor, since the originfree coefficient is significant (I think I am understanding that correctly?). How do I determine whether originfree is associated with a higher or lower count?

A truncated output of coefficients for the glm looks like this:

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -7.100e-01  5.827e-01  -1.218 0.223062    
originfree           -2.921e-01  8.830e-02  -3.308 0.000939 ***
variableDuplication   1.427e-01  1.116e-01   1.279 0.201013    
variableKnown_target -1.609e+00  2.000e-01  -8.047 8.47e-16 ***
variablePhylogeny     1.310e-01  1.119e-01   1.171 0.241491    
geneGrpE              1.792e+00  6.236e-01   2.873 0.004063 ** 
genePGK              -4.455e-15  8.165e-01   0.000 1.000000    
geneRibosomal_S14     6.931e-01  7.071e-01   0.980 0.326959    
geneSHMT              2.079e+00  6.124e-01   3.396 0.000684 ***
geneTIGR00009         9.758e-15  8.165e-01   0.000 1.000000    
geneTIGR00057         6.931e-01  7.071e-01   0.980 0.326959    
geneTIGR00069        -6.149e-15  8.165e-01   0.000 1.000000    
geneTIGR00079         1.386e+00  6.455e-01   2.148 0.031743 *  
geneTIGR00105         1.386e+00  6.455e-01   2.148 0.031743 * 

I see that originfree is significant, which I understand to mean that whether or not an observation is originfree significantly affects the model's ability to predict count (please correct me if I am wrong).

Now how do I find out if originfree is associated with an increase or decrease in the count of the four metadata factors? Would I have to run separate glms on subset dataframe for each metadata factor in order to work this out?

My alternative hypothesis is that it would lead to a decrease.

RMM
  • Read the value of the estimated coefficient. – whuber Jul 09 '20 at 13:50
  • I believe your question is answered [here](https://stats.stackexchange.com/questions/11096/how-to-interpret-coefficients-in-a-poisson-regression). – Stephen G Jul 09 '20 at 13:56
  • Thank you @StephenG That is very helpful! – RMM Jul 09 '20 at 14:11
  • @whuber So would that, for example, mean `originfree` results in a -2.921 decrease with every 1 unit increase of the intercept? And then how do I identify what the intercept is made of? Is it one of each `origin`, `variable` and `gene`? – RMM Jul 29 '20 at 14:56
  • I cannot connect that comment with the question, but permit me to remark that (1) the coefficient of `originfree` is $-0.292$ and (2) because it's negative, higher values of `originfree` are associated with lower values of the response variable. – whuber Jul 29 '20 at 15:10
  • @whuber I now understand how they work. I am still a bit confused about the p value. Is it telling me the estimate is significantly different to the intercept estimate? – RMM Jul 30 '20 at 08:06
  • The p-value is for the test of the null hypothesis that the true value of this coefficient is zero, compared to the alternative that the true value is nonzero (either positive or negative). Your p-value is small enough that many people would take this as evidence that the `originfree` coefficient is nonzero (and therefore negative, because its estimate is negative). – whuber Jul 30 '20 at 13:34
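The test described in the last comment can be reproduced by hand. The following is a minimal sketch in R that uses only the rounded `originfree` estimate and standard error printed in the question; the variable names are illustrative and not taken from the original analysis. Because the inputs are rounded, the results should match the table only approximately.

```r
# Estimate and standard error for originfree, copied (rounded) from the
# coefficient table in the question
est <- -0.2921
se  <- 0.0883

# Wald z statistic: the estimate divided by its standard error
z <- est / se

# Two-sided p-value for H0: coefficient = 0 (what summary() reports as Pr(>|z|))
p_two_sided <- 2 * pnorm(-abs(z))

# One-sided p-value for the directional alternative "coefficient < 0"
p_one_sided <- pnorm(z)

print(c(z = z, p_two_sided = p_two_sided, p_one_sided = p_one_sided))
```

The one-sided version is included only because the question states a directional alternative (a decrease); whether a one-sided test is appropriate depends on that hypothesis having been fixed before looking at the data.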

1 Answer


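Exponentiate the coefficient estimate: this gives the multiplicative factor by which the expected count changes for a one-unit increase in that predictor (for a factor level such as originfree, the change relative to the reference level of that factor).

In R this can be done with:

exp(estimate)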

This was highlighted by @whuber in the comments, and @Stephen G linked to an answer that explains it; another very good answer can be found here
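Applied to the output in the question, a minimal sketch might look like the following. It uses the rounded `originfree` estimate and standard error from the coefficient table rather than the fitted model, so the variable names are illustrative, and the interval is an approximate Wald interval rather than anything reported in the original output.

```r
# Estimate and standard error for originfree, copied (rounded) from the
# coefficient table in the question
est <- -0.2921
se  <- 0.0883

# Multiplicative effect on the expected count (rate ratio)
rate_ratio <- exp(est)

# Approximate 95% Wald interval, computed on the log scale and then exponentiated
ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

# Percentage change in the expected count relative to the reference level of origin
pct_change <- 100 * (rate_ratio - 1)

print(round(c(rate_ratio = rate_ratio, lower = ci[1], upper = ci[2]), 3))
print(round(pct_change, 1))

# With the fitted model object available (here given the hypothetical name `fit`),
# all coefficients can be exponentiated at once:
# exp(coef(fit))
```

The rate ratio comes out at about 0.75, so, holding `variable` and `gene` fixed, the expected count for `originfree` is roughly 25% lower than for the reference level of `origin` (presumably FGT, since it is the level without its own coefficient). A factor below 1 means a decrease and a factor above 1 means an increase.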

RMM