
Here is my model and output:

plants_lm <- lm(weight ~ group, data = plants)

summary(plants_lm)

Call:
lm(formula = weight ~ group, data = plants)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.0710 -0.4180 -0.0060  0.2627  1.3690 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        5.0320     0.1971  25.527   <2e-16 ***
groupFertlizer_A  -0.3710     0.2788  -1.331   0.1944    
groupFertlizer_B   0.4940     0.2788   1.772   0.0877 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6234 on 27 degrees of freedom
Multiple R-squared:  0.2641,    Adjusted R-squared:  0.2096 
F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591

I don't understand how the predictors (the levels of "group") can both be insignificant while the overall model is significant. I found this post: Why is it possible to get significant F statistic (p<.001) but non-significant regressor t-tests?

But none of it seems to apply here. The answers on that post say it can happen if the predictors are correlated (in my case they can't be, since they are separate treatments) or if two or more predictors are close to significant. That doesn't seem to be the case here either, as Fertilizer A is not close. Or is it? What counts as "close"? Any insight appreciated.
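
For reference, here is a minimal sketch of how one might pull out the overall test directly in R (it assumes the same `plants` data frame as above; `null_lm` is just an illustrative name, and the model comparison reproduces the F-statistic reported at the bottom of summary(plants_lm)):

# Each t-test in summary() compares one fertilizer to the reference level
# (the control). The F-statistic at the bottom instead tests whether the
# group factor as a whole explains anything, i.e. it compares the fitted
# model to an intercept-only model:
null_lm <- lm(weight ~ 1, data = plants)
anova(null_lm, plants_lm)

# Equivalently, since group is the only predictor:
anova(plants_lm)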

  • Remember what the tests on the individual coefficients are testing as opposed to the overall test. If that does not help try changing the reference category and refit the model. – mdewey Oct 05 '21 at 14:17
  • But my reference is the control. Surely I want that to be the reference? – Elasso Oct 05 '21 at 14:18
  • And can you elaborate on your statement? Sorry, I am a stats beginner and trying to learn R at the same time and it's a bit overwhelming. What is it that the individual coefficients are testing compared to the overall test? If I am trying to determine if Fertilizer A or B have an impact on plant growth rates, which is more important? – Elasso Oct 05 '21 at 14:19
  • This answer (of mine) shows that it is possible that neither t-test is significant but the joint test is when regressors are uncorrelated (have you checked they really are, in terms of sample correlation?): https://stats.stackexchange.com/questions/151403/significance-of-individual-coefficients-vs-significance-of-both/151410#151410 – Christoph Hanck Oct 05 '21 at 14:20
  • What is the reference category? – mdewey Oct 05 '21 at 14:20
  • I'm not exactly sure how to answer that but the reference is the control i.e. no treatment. – Elasso Oct 05 '21 at 14:24
  • If all you want to know is whether A is different from control and B is different from control then you do not need the overall test. – mdewey Oct 05 '21 at 14:25
  • @ChristophHanck but what does this actually mean and how would you describe the results? I haven't actually checked if they are uncorrelated, I just don't see how they could possibly be if they are two separate treatments. Is there a preferred way of testing for correlation? – Elasso Oct 05 '21 at 14:25
  • @mdewey so in that case should I be using a different test? The instructions were to test the hypothesis that fertilizer changes the yield and whether fertilizer types differed. – Elasso Oct 05 '21 at 14:27
  • That is a little hard to answer without seeing your data structure, but basically I just suggested running `cor` on your two predictors. – Christoph Hanck Oct 05 '21 at 14:44
  • I still think you would have learned something if you changed the reference category, say to A, and re-ran the model. – mdewey Oct 05 '21 at 15:03
  • 1
    Similar questions: https://stats.stackexchange.com/questions/155459/logistic-regression-the-categorical-variable-is-significant-but-each-of-its-sub, https://stats.stackexchange.com/questions/94441/how-to-interpret-insignificant-categorical-variables-for-logistic-regression (see answer by @gung:), https://stats.stackexchange.com/questions/549815/if-i-have-one-non-significant-factor-level-in-a-glm-is-that-entire-variable-now and many more ... – kjetil b halvorsen Nov 11 '21 at 11:57
  • 1
    To the original question, lots of small "insignificant" effects can add up to a bigger effect. – Frank Harrell Nov 11 '21 at 12:49
  • In this particular situation there is a simple resolution: the reference category has the *middle* value of the estimated means. Relative to it, neither of the other two categories appears much different. However, the other two categories differ enough between themselves to make the results significant. If you like, this is a statistical response to one of Zeno's paradoxes: the accumulation of small differences indistinguishable from zero can result in a larger difference that *is* clearly nonzero. – whuber Nov 11 '21 at 14:45
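
As a concrete illustration of the suggestions in these comments (changing the reference category, per mdewey, and checking the sample correlation of the dummy regressors, per Christoph Hanck), here is a minimal sketch; it assumes `group` is a factor with the level names shown in the output above, and names such as `group_refA` are purely illustrative:

# Refit with Fertlizer_A as the reference level (mdewey's suggestion).
# The coefficients then compare the control and Fertlizer_B to Fertlizer_A
# rather than comparing each fertilizer to the control.
plants$group_refA <- relevel(factor(plants$group), ref = "Fertlizer_A")
plants_lm_refA <- lm(weight ~ group_refA, data = plants)
summary(plants_lm_refA)

# Sample correlation of the two dummy regressors that lm() actually uses
# (Christoph Hanck's suggestion). Dummies built from one factor are not
# independent even when the treatments are; in a balanced design their
# correlation is -0.5.
X <- model.matrix(~ group, data = plants)[, -1]  # drop the intercept column
cor(X)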

0 Answers