
I'm moving from JMP to R for my statistical analysis and have noticed that the two-way ANOVA function aov() does not report a whole-model effect. This is standard in JMP, and to me it's essential for hypothesis testing: one shouldn't interpret a main effect if there is no overall model effect.

  1. My first question: is there a reason why model effects aren't evaluated as part of hypothesis testing within the R community?
  2. My second question: if model effects aren't built into the aov() function, how can I use R to access the F table and obtain the model effect and Prob > F?

## Calculate the model effect with an F table lookup

Data.X <- data.frame(mtcars)
fit.ANOVA <- aov(mpg ~ cyl * gear, data = Data.X)
tests <- as.data.frame(summary(fit.ANOVA)[[1]])
# Model df and SS: sum over all terms except the final Residuals row
Model.Effect.AOV.DF.M <- sum(tests$Df[1:(nrow(tests) - 1)])
Model.Effect.AOV.DF.error <- tests$Df[nrow(tests)]
Model.Effect.AOV.SS <- sum(tests$`Sum Sq`[1:(nrow(tests) - 1)])
Model.Effect.AOV.MS <- Model.Effect.AOV.SS / Model.Effect.AOV.DF.M
Model.Effect.AOV.FRatio <- Model.Effect.AOV.MS / tests$`Mean Sq`[nrow(tests)]
# Critical value: numerator df is the model df, denominator df is the error df
Model.Effect.AOV.Fcrit.0.05 <- qf(1 - 0.05, df1 = Model.Effect.AOV.DF.M, df2 = Model.Effect.AOV.DF.error)
# Reject the null if the F ratio (F value) is at least as large as the F critical value.
if(Model.Effect.AOV.Fcrit.0.05 <= Model.Effect.AOV.FRatio){
  print('Data Contains Significantly Different Populations')
} else {
  print('Data is the same Population')
}
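Since the question also asks for Prob > F, here is a minimal sketch (using the same mtcars example; variable names are illustrative) that computes the whole-model p-value directly with pf(), rather than comparing against a tabled critical value:

```r
# Sketch: whole-model F ratio and Prob > F taken directly from the aov() table.
fit <- aov(mpg ~ cyl * gear, data = mtcars)
tab <- as.data.frame(summary(fit)[[1]])

k        <- nrow(tab)                  # the last row holds the residuals
df.model <- sum(tab$Df[-k])            # model degrees of freedom
df.error <- tab$Df[k]                  # error degrees of freedom
ms.model <- sum(tab$`Sum Sq`[-k]) / df.model
f.ratio  <- ms.model / tab$`Mean Sq`[k]

# Prob > F: upper-tail probability of the model F ratio
p.value <- pf(f.ratio, df.model, df.error, lower.tail = FALSE)
```

Because the sequential sums of squares add up to the model SS, this reproduces the overall F-statistic that summary(lm(mpg ~ cyl * gear, data = mtcars)) reports.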
JamesC
  • Note that `cyl` and `gear` are treated as numeric variables in the `mtcars` data set, so the example isn't a traditional two-way anova. I think it makes sense to treat these variables as factor variables. This doesn't affect the import of the question. – Sal Mangiafico Sep 16 '19 at 21:26
  • You are completely correct Sal Mangiafico. This should be a Factorial ANOVA. I'll edit the example to reflect this. – JamesC Sep 16 '19 at 21:46
  • I wouldn't mess with the example. I think it perfectly illustrates your question. – Sal Mangiafico Sep 16 '19 at 21:48
  • Okay, thank you. – JamesC Sep 16 '19 at 21:50
  • If this question is off topic for this site, then where should it go? – JamesC Sep 17 '19 at 16:13
  • Because it's really about R programming, and not statistics per se, it really belongs on Stack Overflow. There is some way to migrate it there, but I'm not sure if a moderator needs to do that. In any case, don't re-post the question elsewhere. – Sal Mangiafico Sep 17 '19 at 16:17

1 Answer


If you use the lm function for general linear models, the summary will give you a p-value for the whole model, which is what you are looking for. In general, I recommend fitting with lm and then using Anova from the car package to produce the anova table. Note that the Anova function uses type II sums of squares by default, while aov and anova use type I sums of squares. You could use anova(model) if you want type I SS. It appears that JMP uses type III sums of squares by default. The Anova function can also produce type III SS, but you need to specify the contrasts in the model, as explained on R Bloggers.

if(!require(car)){install.packages("car")}

model = lm(mpg ~ cyl * gear, data = mtcars)

summary(model)

   ### Residual standard error: 3.198 on 28 degrees of freedom
   ### Multiple R-squared:  0.7457, Adjusted R-squared:  0.7185 
   ### F-statistic: 27.38 on 3 and 28 DF,  p-value: 1.786e-08

f = summary(model)$fstatistic

f

   ###    value    numdf    dendf 
   ### 27.37524  3.00000 28.00000 

p <- pf(f[1],f[2],f[3],lower.tail=F)

p

   ###        value 
   ### 1.785687e-08 

library(car)

Anova(model)

   ### Anova Table (Type II tests)
   ###
   ###       Sum Sq Df F value   Pr(>F)    
   ### cyl       563.40  1 55.0992 4.39e-08 ***
   ### gear        5.43  1  0.5312   0.4722    
   ### cyl:gear   16.60  1  1.6234   0.2131    
   ### Residuals 286.30 28 
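Following up on the R Bloggers point above, a minimal sketch of requesting JMP-style type III SS; the key assumption is sum-to-zero contrasts, without which type III results are not meaningful:

```r
library(car)

# Type III SS require sum-to-zero contrasts for factors
options(contrasts = c("contr.sum", "contr.poly"))

model3 <- lm(mpg ~ cyl * gear, data = mtcars)
Anova(model3, type = 3)   # includes an (Intercept) row in addition to the terms
```

Note that with numeric predictors, as in this mtcars example, the contrasts setting changes nothing; it matters once the predictors are coded as factors.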
Sal Mangiafico
  • +1 particularly for differentiating the types of sums of squares, the statistical issue here. See [this page](https://stats.stackexchange.com/q/20452/28500) for a very useful introduction to that issue. – EdM Sep 16 '19 at 19:47
  • Thanks @EdM. I added this aspect to my answer. I believe the type I vs. type II distinction won't matter for the "model effect" that the OP is looking for. – Sal Mangiafico Sep 16 '19 at 19:48
  • The distinction might, however, matter for others who come upon this page later. – EdM Sep 16 '19 at 19:50
  • Yes, it definitely matters in the anova table, and when trying to replicate results from other packages! – Sal Mangiafico Sep 16 '19 at 19:53
  • Thank you for the tip Sal Mangiafico, and yes, the Type III information will help me in my future use with R programming. – JamesC Sep 16 '19 at 20:13
  • @JamesR you could also consider the `anova` function provided by Harrell's [`rms` package](https://CRAN.R-project.org/package=rms). It "tests most meaningful hypotheses in a design" via Wald tests that combine a predictor with its interactions to evaluate both its overall significance and the significance of its interactions, produce joint tests of all interactions in a model, etc. For that you have to use the `rms` versions of standard functions (e.g., `ols` versus `lm`) and there is a learning curve, but if you will be doing a lot of modeling it is well worth the effort. – EdM Sep 16 '19 at 20:57