I am currently trying to analyse the effect of an illness (0 = no infection, 1 = infection) on 9 different genotypes in plants. My data frame consists of 2 columns, Genotyp and Infection (the infection column appears as `Befall` in the code below). It has 459 rows, one for each observed plant. Here is an excerpt: https://i.stack.imgur.com/NVZRA.png

Being new to R, I followed a YouTube tutorial. I just don't know whether the output I get is correct or whether I should do something else: when I run the code, `(Intercept)` appears where I would expect the first genotype, "aacc", to be. I tried researching the term "intercept" but couldn't make sense of it. I would really appreciate any help or other info. Thanks in advance!

```
model1 <- glm(Befall ~ Genotyp, data = befall_12, family = binomial)
summary(model1)
```


I get:


```
Call:
glm(formula = Befall ~ Genotyp, family = binomial, data = befall_12)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5829  -0.9236   0.8203   0.9587   1.7941  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.3863     0.3727  -3.720 0.000199 ***
GenotypaaCc   1.9871     0.4796   4.144 3.42e-05 ***
GenotypaaCC   2.0307     0.4599   4.415 1.01e-05 ***
GenotypAacc   1.9253     0.5020   3.835 0.000125 ***
GenotypAaCc   1.5466     0.4684   3.302 0.000960 ***
GenotypAaCC   0.6242     0.4936   1.264 0.206062    
GenotypAAcc   0.7550     0.4474   1.688 0.091491 .  
GenotypAACc   1.5404     0.4650   3.312 0.000925 ***
GenotypAACC   2.3026     0.4888   4.711 2.46e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 636.20  on 458  degrees of freedom
Residual deviance: 583.43  on 450  degrees of freedom
AIC: 601.43

Number of Fisher Scoring iterations: 4
```


amj_ris
  • This is normal. It happens because fitting a regression model with a factor variable requires some kind of constraint. Someone will surely chime in with a longer answer to explain why, but if you want to work on this yourself, look for an explanation of contrasts in regression on factor variables, and also of R's default set-to-zero (treatment) contrast. – Wesley Sep 27 '21 at 17:31
  • I just read up on it a bit. So the genotype "aacc" is used as a contrast?! I'll research some more. Thank you for the quick answer :) – amj_ris Sep 27 '21 at 18:04
  • Does this answer your question? [Intercept term in logistic regression](https://stats.stackexchange.com/questions/92903/intercept-term-in-logistic-regression) Or perhaps [this answer](https://stats.stackexchange.com/a/443964/28500)? Other coefficients represent _differences from_ the log-odds for the reference category of "aacc". – EdM Sep 27 '21 at 18:18
  • @amj_ris, yes, the `aacc` genotype is used as the *base case*, with all the other coefficients giving the difference from that base case. While @EdM is correct on this, the answers that he linked aren't about the same issue. – Wesley Sep 27 '21 at 18:35
  • Thank you @Wesley – amj_ris Sep 28 '21 at 12:12
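
To make the comments above concrete, here is a minimal sketch of how the intercept and the other coefficients can be read. It runs on made-up toy data with only three genotypes, not on the `befall_12` data from the question. The counts and the object names `toy`, `fit` and `fit_ref` are assumptions for illustration. The sketch shows that the intercept is the log-odds of infection for the reference level, that each other coefficient is a difference from it, and that `relevel()` can be used to pick another reference.

```r
# Toy data, made up for illustration only (NOT the befall_12 data).
# Levels are set explicitly so that "aacc" is the first (reference) level.
genotypes <- c("aacc", "aaCc", "AACC")
toy <- data.frame(
  Genotyp = factor(rep(genotypes, each = 10), levels = genotypes),
  Befall  = c(rep(c(1, 0), times = c(2, 8)),  # aacc: 2 of 10 infected
              rep(c(1, 0), times = c(6, 4)),  # aaCc: 6 of 10 infected
              rep(c(1, 0), times = c(7, 3)))  # AACC: 7 of 10 infected
)

fit <- glm(Befall ~ Genotyp, data = toy, family = binomial)
b <- coef(fit)

# The intercept is the log-odds of infection for the reference genotype "aacc".
p_ref <- plogis(b[["(Intercept)"]])   # back-transform to a probability

# Every other coefficient is a difference in log-odds relative to "aacc".
p_aaCc  <- plogis(b[["(Intercept)"]] + b[["GenotypaaCc"]])
or_AACC <- exp(b[["GenotypAACC"]])    # odds ratio, AACC vs. aacc

# Choosing another reference level changes what the intercept stands for,
# not the fitted probabilities.
toy$Genotyp_ref <- relevel(toy$Genotyp, ref = "AACC")
fit_ref <- glm(Befall ~ Genotyp_ref, data = toy, family = binomial)
p_AACC <- plogis(coef(fit_ref)[["(Intercept)"]])

print(round(c(aacc = p_ref, aaCc = p_aaCc, AACC = p_AACC), 3))
print(round(or_AACC, 2))
```

Applied to the output in the question, the same arithmetic gives `plogis(-1.3863)`, roughly 0.20, as the estimated infection probability for "aacc", and `exp(2.3026)`, roughly 10, as the odds ratio of "AACC" relative to "aacc". The p-values in the coefficient table therefore test each genotype against "aacc" only, not against the other genotypes.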
