
How do you analyze a randomized complete block design (6 plots within 4 blocks, data collected over three years) when the assumptions of normality and homoscedasticity are violated?

Is this correct? Use a non-parametric Kruskal-Wallis test when the data violate the assumption of normality but not homoscedasticity. Use Welch’s analysis of variance (ANOVA) when the data are heteroscedastic. And when both normality and homoscedasticity are violated, do the data need to be transformed prior to statistical analysis by ANOVA?

  • My first thought would be to choose a more suitable parametric model (before collecting the data). What's the response measuring? My second thought might be to consider a bootstrapping approach to estimating quantities of interest from some suitable pivotal or pivot-like quantity (which again depends on what you're looking at) – Glen_b Aug 27 '21 at 03:48
  • My third thought would be to look at transformation, but the same considerations would apply as before (considering the nature of the variable and not choosing on the basis of the specific observations you want to use to fit your model) – Glen_b Aug 27 '21 at 04:17
  • If another distribution is theoretically a good fit (e.g., binomial, inverse gaussian, ...) an alternative might be to use GL(M)M. – KrisBae Aug 27 '21 at 08:00
  • @Glen_b we are measuring the response of fertilizer (5 treatments and 1 control) on crop yield (data collected once annually; three times total) and water quality (collected during storm events). – user333304 Aug 27 '21 at 13:54
  • Depending on the crop, yields might be either right-skew or left-skew or fairly symmetric. Weather events might even lead to zero-inflated mixtures. How did you assess normality, given that the distribution will be different within blocks and years? – Glen_b Aug 27 '21 at 17:49
  • @Glen_b Shapiro-Wilk test and Levene test – user333304 Aug 30 '21 at 15:42
  • What values did you supply to the Shapiro-Wilk test? Did your Levene test account for both blocks and years? – Glen_b Aug 30 '21 at 16:11
  • @Glen_b For example, I supplied the yields to the Shapiro-Wilk test in R: shapiro.test(Yield$Mg_ha). I also used leveneTest(Mg_ha ~ Trt, Yield) without including block and year, and then separated the data by year and re-ran the Shapiro-Wilk and Levene tests for each year. – user333304 Aug 30 '21 at 17:53
  • Thanks. You can't use raw responses aggregated across one or both factors, since the assumptions apply within both. Neither is it practical to check within each factor combination. Instead you would look at residuals from the full model. However, *testing* these assumptions is not helpful on multiple grounds. See for example https://stats.stackexchange.com/q/2492/805 – Glen_b Aug 31 '21 at 00:00
  • E.g. Harvey Motulsky's answer, though there's a great deal more that could be said. Similar issues apply for checking heteroskedasticity. – Glen_b Aug 31 '21 at 00:06
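
The suggestion in the comments to look at residuals from the full model, rather than testing the raw responses, might be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data are simulated, `Mg_ha` and `Trt` are the names used in the comments, and the column names `Block` and `Year` are hypothetical. Treating year as a fixed factor is also a simplification, since the same plots are measured each year and a mixed model may suit the real data better.

```r
# Simulated stand-in for the Yield data frame; every value here is made up.
# Mg_ha and Trt appear in the comments; Block and Year are assumed names.
set.seed(1)
Yield <- expand.grid(
  Trt   = factor(c("Control", paste0("F", 1:5))),  # 5 treatments + 1 control
  Block = factor(1:4),                             # 4 blocks
  Year  = factor(1:3)                              # 3 years
)
Yield$Mg_ha <- 8 + 0.3 * as.integer(Yield$Trt) + rnorm(nrow(Yield), sd = 0.7)

# Full model: treatment plus both blocking factors (block and year).
fit <- lm(Mg_ha ~ Trt + Block + Year, data = Yield)
res <- residuals(fit)

# Graphical checks on the residuals instead of formal tests.
par(mfrow = c(1, 2))
plot(fitted(fit), res, xlab = "Fitted", ylab = "Residual")  # spread vs. fitted
abline(h = 0, lty = 2)
qqnorm(res)                                                 # residual shape
qqline(res)
```

A fan shape in the first panel would hint at non-constant variance, and strong curvature in the Q-Q plot would hint at skewness or heavy tails; either would point back to the earlier suggestions about choosing a model or transformation from the nature of the response.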

0 Answers