
My dependent variable is continuous and non-normal (left-skewed; a Shapiro-Wilk test rejects normality). I have two independent variables (treatment group by colour, and food type), each with 3 levels. The numbers of observations across groups are not equal.

I have looked at non-parametric tests such as Friedman's test and the Scheirer-Ray-Hare test, neither of which seems suitable (because of the unequal numbers of observations).

Are there alternative tests that anyone could suggest? I am using SAS.

mbee

2 Answers


What question are you trying to answer?

If you want an overall test of whether anything is going on (the null being that both main effects and the interaction are all zero), then you can replace the data points with their ranks and run a regular ANOVA, comparing against an intercept-only (grand-mean) model. This is essentially how many non-parametric tests work: ranking transforms the data to a uniform distribution under the null, and treating it as normal gives a good approximation (the Central Limit Theorem kicks in for the uniform at sample sizes above about 5 or 6).
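A minimal sketch of this rank-then-ANOVA idea (using R's built-in `warpbreaks` data as a stand-in for the OP's colour/food design, since the original data aren't available):

```r
# Rank-then-ANOVA sketch: replace the response with its ranks, then
# compare the full two-way model against an intercept-only model.
# warpbreaks is a built-in R dataset standing in for the OP's data.
d <- transform(warpbreaks, r = rank(breaks))
fit.null <- aov(r ~ 1, data = d)               # grand-mean-only model
fit.full <- aov(r ~ wool * tension, data = d)  # both factors + interaction
anova(fit.null, fit.full)  # overall F test: is anything going on at all?
```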

For other questions you could use permutation tests. If you want to test one main effect together with the interaction (while allowing the other main effect to be non-zero), then you can permute the predictor being tested. If you want to test the interaction alone, allowing both main effects to be non-zero, then fit the reduced, main-effects-only model and compute its fitted values and residuals; randomly permute the residuals, add them back to the fitted values, and fit the full ANOVA model including the interaction. Repeat this many times to build the null distribution of the interaction effect's size, and compare it with the interaction effect from the original data.

There may be existing SAS code for doing things like this; I have seen some basic tutorials on using SAS for bootstrap and permutation tests (the quickest route seems to be using the DATA step to create all the permuted datasets in one big table, then using BY processing to run the analyses). Personally I use R for this type of thing, so I can't be of more help with SAS.


Edit

Here is an example using R code:

> fit1 <- aov(breaks ~ wool*tension, data=warpbreaks)
> summary(fit1)
             Df Sum Sq Mean Sq F value   Pr(>F)    
wool          1    451   450.7   3.765 0.058213 .  
tension       2   2034  1017.1   8.498 0.000693 ***
wool:tension  2   1003   501.4   4.189 0.021044 *  
Residuals    48   5745   119.7                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> fit2 <- aov(breaks ~ wool + tension, data=warpbreaks)
> 
> tmpfun <- function() {
+   new.df <- data.frame(breaks = fitted(fit2) + sample(resid(fit2)),
+                        wool = warpbreaks$wool,
+                        tension = warpbreaks$tension)
+   fitnew <- aov(breaks ~ wool*tension, data=new.df)
+   fitnew2 <- update(fitnew, .~ wool + tension)
+   c(coef(fitnew), F=anova(fitnew2,fitnew)[2,5])
+ }
> 
> out <- replicate(10000, tmpfun())
> 
> # based on only the interaction coefficients
> mean(out[5,] >= coef(fit1)[5])
[1] 0.002
> mean(out[6,] >= coef(fit1)[6])
[1] 0.0796
> 
> # based on F statistic from full-reduced model
> mean(out[7,] >= anova(fit2,fit1)[2,5])
[1] 0.022
Greg Snow
    +1. Tukey's [median polish](http://stat.ethz.ch/R-manual/R-patched/library/stats/html/medpolish.html) works well for fitting the model when that's all that's needed. It would be interesting (but perhaps a little computationally intensive) to couple that with a permutation test. – whuber Oct 25 '12 at 21:07
  • Thanks for the responses. This is more of an exploratory study - just to examine whether there are differences in the dependent variable based on both independent variables. – mbee Oct 25 '12 at 22:39
  • @GregSnow, which package do you use in R for this? I am using the `ezPerm` function of `ez` because it allows me to do `dv ~ iv1 * iv2 | subj`. It gives me a p-value, but the author advises against its interpretation (I know you are explaining how to perform this manually, but although I have a strong programming background, my knowledge of stats terms is just starting to improve). – toto_tico Nov 13 '15 at 23:28
  • @toto_tico, I don't use any add-on packages. Permutation tests are easy enough to do with just regular R code. I have included an example above using a built-in dataset. – Greg Snow Nov 16 '15 at 17:30
  • @GregSnow, why do you sample the residuals, `sample(resid(fit2))`? I thought the sampling was performed on the dv (`warpbreaks$breaks`). My intuition tells me that what I read on permutation tests was for one factor, and things get more complicated for multiple factors. – toto_tico Nov 16 '15 at 20:04
  • @toto_tico, the `sample` function in the way I have used it here does the permutation (taking a sample of the same size without replacement is just the permutation). Since this is testing an interaction beyond the effect of the main effects, there is not an original variable that can be permuted, so we permute the residuals and add them back to the fitted values. – Greg Snow Nov 16 '15 at 20:55
  • @GregSnow, the problem is that permuting the residuals and adding them to the fitted values is not the same as just permuting the original values. I feel like doing `breaks = sample(warpbreaks$breaks)` makes more sense than `breaks = fitted(fit2) + sample(resid(fit2))`, because the fitted values bias the subsequent permutation tests toward the original results (in other words, we would just be varying the variance). The formula that I suggest outputs significant differences in the same cases as yours does, but the p-values are different. – toto_tico Nov 16 '15 at 22:28
  • @toto_tico, your method tests a different null hypothesis (which is fine if that is the null hypothesis you are interested in), but your method would show significance in a case where the interaction is truly 0 but at least one of the main effects is not. If the `fit2` model fits the data, then the theoretical residuals are exchangeable; the observed residuals approximate the theoretical residuals, so we permute them. – Greg Snow Nov 16 '15 at 22:43
  • @GregSnow, does this mean that if I want to test main effects and interactions, I should actually test them separately in different permutation tests? The first based on `wool + tension`, and the second based on `wool * tension` (sampling the residuals)? All of this brings me to another question: what does `anova(fitnew2, fitnew)` do (i.e., how do I interpret it)? I have the feeling that we are measuring a difference between the two models, i.e., whether adding the interaction (ultimately another component of the equation) actually changes/improves the model. – toto_tico Nov 16 '15 at 23:09
  • @toto_tico, this is really getting too long for comments. What did `?anova` tell you? For permutation strategies there are multiple papers that go into much more detail; one is here: http://avesbiodiv.mncn.csic.es/estadistica/permut1.pdf. You can find others using Google Scholar or other searches. – Greg Snow Nov 17 '15 at 17:25

+1 to @Greg Snow. In keeping with his non-parametric, rank-based strategy, you could use ordinal logistic regression. This will allow you to fit a model with multiple factors and interactions between them. The relevant SAS documentation is here. There is a tutorial on how to do this in SAS at UCLA's excellent stats help website here, and another tutorial from Indiana University here.
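A rough R sketch of the ordinal-logistic idea (an assumption on my part, since the OP is in SAS, where PROC LOGISTIC plays the same role; `MASS::polr` fits the proportional-odds model, and the response is binned into quartiles here only to keep the example small, at the cost of some ordinal information):

```r
library(MASS)  # polr() fits a proportional-odds ordinal logistic model

# Bin the continuous response into ordered quartiles (a coarse stand-in
# for treating it as ordinal); warpbreaks substitutes for the OP's data.
d <- warpbreaks
d$breaks.ord <- cut(d$breaks,
                    breaks = quantile(d$breaks, probs = 0:4 / 4),
                    include.lowest = TRUE, ordered_result = TRUE)

fit.main <- polr(breaks.ord ~ wool + tension, data = d, Hess = TRUE)
fit.int  <- polr(breaks.ord ~ wool * tension, data = d, Hess = TRUE)
anova(fit.main, fit.int)  # likelihood-ratio test for the interaction
```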

gung - Reinstate Monica
  • Isn't this approach answering a different question, though? In the ANOVA setting, the continuous variable is the outcome, and we are testing the sums of squares within/between factors. In an ordinal logistic regression, you are trying to predict factor membership according to the values of that continuous variable. While there are certainly situations where these will give you similar inference, it isn't clear to me that these are equivalent procedures in GENERAL. In the same way that the regression of Y=BX is not always going to be the same as X=BY. – Ryan Simmons Oct 29 '15 at 15:37
  • Essentially, I think the questions "Are these groups different according to X?" and "How can X be used to classify individuals among these groups?" will only be the same in very specific circumstances, and that you can't necessarily substitute ANOVA for logistic regression and get comparable results. – Ryan Simmons Oct 29 '15 at 15:40
  • @RyanSimmons, you needn't necessarily classify anything. The test of the OLR model is a test if some groups are associated with typically larger Y values than other groups. You are simply using only the ordinal component of the Y values. Tests like Kruskal-Wallis, Mann-Whitney, etc. are special cases of OLR. – gung - Reinstate Monica Oct 29 '15 at 15:58
  • @gung, what about the within design part of the problem? Is it possible with an ordinal logistic regression? – toto_tico Nov 13 '15 at 23:18
  • @toto_tico, I just answered you on the other thread. You need to use `clmm()` in the `ordinal` package. – gung - Reinstate Monica Nov 13 '15 at 23:21