
I am trying to interpret the results of some experiments I have run, involving showing participants videos of animated synthesised speech.

I ran a mean opinion score test. I made 36 movies in total, 9 for each of the following exaggeration factors: 1.0 (A), 1.1 (B), 1.2 (C), 1.3 (D).

11 participants were shown the movies in a randomised order, and asked to score each movie between 1 and 5, based on their opinion of the quality of animation.

The mean opinion scores were: A - 2.41, B - 3.06, C - 3.3, D - 2.99, making C (1.2) the favourite.

A Kruskal-Wallis test (null hypothesis: the samples come from the same distribution) gives a $p$-value of 1.8595e-08, so data like these would be extremely unlikely if the samples came from the same distribution. We therefore reject the null hypothesis at the 0.01 level.

So then I ran Dunn's test to try and ascertain which of the treatments is significantly preferred. Using the Matlab function Dunn, I get the following results, but I don't know what they are telling me.

STEPDOWN DUNN TEST FOR NON PARAMETRIC MULTIPLE COMPARISONS

Group     N            Sum of ranks         Mean rank
 1        99             13893.00              140.33
 2        99             20879.50              210.90
 3        99             23569.50              238.08
 4        99             20264.00              204.69

Ties factor: 6882

Test        Q-value        Critical Q         Comment
3 vs 1      6.0084         2.6310             Reject Ho
3 vs 4      2.0525         2.6310             Fail to reject Ho
3 vs 2      No comparison made                Accept Ho
2 vs 1      4.3381         2.6310             Reject Ho
2 vs 4      0.3822         2.6310             Fail to reject Ho
4 vs 1      3.9559         2.6310             Reject Ho

Resuming...
     0     1     1     1
     0     0     0     0
     0     0     0     0
     0     0     0     0
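
As a reading aid, the Q-values in this table can be reproduced from the printed rank sums. The sketch below assumes the usual tie-corrected Dunn statistic (absolute difference in mean ranks divided by its standard error) and a Šidák-adjusted two-sided normal critical value at $\alpha = 0.05$. Both are inferred from the printed numbers rather than taken from the function's source, and the names `dunn_q` and `sidak_critical` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the printed output.
RANK_SUMS = {1: 13893.0, 2: 20879.5, 3: 23569.5, 4: 20264.0}
N_PER_GROUP = 99
TIES_FACTOR = 6882.0


def dunn_q(i, j):
    """Absolute mean-rank difference over its tie-corrected standard error."""
    n_total = N_PER_GROUP * len(RANK_SUMS)
    variance = n_total * (n_total + 1) / 12 - TIES_FACTOR / (12 * (n_total - 1))
    std_err = sqrt(variance * (1 / N_PER_GROUP + 1 / N_PER_GROUP))
    mean_rank_diff = abs(RANK_SUMS[i] - RANK_SUMS[j]) / N_PER_GROUP
    return mean_rank_diff / std_err


def sidak_critical(alpha=0.05):
    """Two-sided normal quantile after a Sidak adjustment (assumed, not confirmed)."""
    k = len(RANK_SUMS)
    n_comparisons = k * (k - 1) // 2
    per_test_alpha = 1 - (1 - alpha) ** (1 / n_comparisons)
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


critical = sidak_critical()
for i, j in [(3, 1), (3, 4), (2, 1), (2, 4), (4, 1)]:
    q = dunn_q(i, j)
    verdict = "Reject Ho" if q > critical else "Fail to reject Ho"
    print(f"{i} vs {j}: Q = {q:.4f}, critical Q = {critical:.4f}, {verdict}")
```

Under these assumptions, a comparison is rejected when its Q-value exceeds the critical Q. In the table, group 1 (factor 1) ranks significantly below groups 2, 3 and 4, while no significant difference is detected among groups 2, 3 and 4.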

Any help greatly appreciated.

Friedman's ANOVA Table
Source      SS          df     MS          Chi-sq     Prob>Chi-sq
Columns     27.2841     1      27.2841     55.5501    9.1106e-14
Error       167.2159    395    0.42333
Total       194.5       791
Test for column effects after row effects are removed

1 Answer


Three points:

  1. The null hypothesis of the Kruskal-Wallis test is not what you have written, but rather stochastic equality, $H_{0}\colon P\left(X_{i} > X_{j}\right) = 0.5$ for all $i,j \in \{1,\dots,k\}$ for $k$ groups (assuming the CDFs of any two groups do not cross), so you are testing for stochastic dominance. Under the more stringent assumptions that each treatment has an identically shaped distribution and that differences are entirely in location shift, you can interpret the null hypothesis as equality of medians, and the test as a test for median difference; and

  2. Your data do not sound appropriate for the Kruskal-Wallis test, because you have a blocked study design in which the same individuals are measured repeatedly. You are therefore looking for a repeated measures test for 'treatment'. Given your scoring variable, repeated measures ANOVA is perhaps not a good candidate; however, the nonparametric Friedman test may well suit your needs; plus

  3. Tests like Kruskal-Wallis and Friedman assume that the data (your scores) are continuously measured. There are often 'corrections for ties' in nonparametric tests, but you should make sure that your statistical software applies them, and bear in mind that lots of ties (as might happen when there are only five possible scores) may distort your results.
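
To illustrate point 2, here is a minimal sketch of the Friedman statistic for a blocked layout, with one row per participant and one column per treatment. The scores are made-up illustrative values (not the data from the question), the function name `friedman_statistic` is hypothetical, and the sketch assumes no tied scores within a participant, so it omits the tie correction.

```python
def friedman_statistic(scores):
    """Return (rank sum per treatment, Friedman chi-square statistic).

    scores: one row per participant (block), one column per treatment.
    Assumes no tied scores within a row, so no tie correction is applied.
    """
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0] * k
    for row in scores:
        if len(set(row)) != k:
            raise ValueError("tied scores within a block are not handled")
        # Rank treatments within this participant only: 1 = lowest score.
        order = sorted(range(k), key=lambda col: row[col])
        for rank, col in enumerate(order, start=1):
            rank_sums[col] += rank
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return rank_sums, stat


# Hypothetical scores: 4 participants x 4 treatments (A, B, C, D).
example_scores = [
    [2.1, 3.0, 3.4, 2.9],
    [2.5, 3.2, 3.1, 3.3],
    [2.0, 2.8, 3.6, 3.0],
    [2.6, 3.1, 3.5, 2.7],
]
rank_sums, stat = friedman_statistic(example_scores)
print(rank_sums, round(stat, 3))
```

The statistic is compared with a chi-square distribution on $k - 1$ degrees of freedom. Ranking within each participant is what removes between-participant differences, which Kruskal-Wallis ignores by ranking all scores together. If each participant rates several movies per treatment, one option (an assumption here, not something stated in the question) is to average those ratings first so that each participant contributes one score per treatment.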

Alexis
  • Thanks for your comments. I have tried the Matlab friedman function and got the following information. – shaw2thefloor Aug 07 '14 at 15:19
  • p value = 9.1106e-14 – shaw2thefloor Aug 07 '14 at 15:20
  • Friedman's ANOVA Table Source SS df MS Chi-sq Prob>Chi-sq Columns 27.2841 1 27.2841 55.5501 9.1106e-14 Error 167.2159 395 0.42333 Total 194.5 791 – shaw2thefloor Aug 07 '14 at 15:22
  • The comment forum won't seem to allow me to format this table, so I'll post it in my question at the top! – shaw2thefloor Aug 07 '14 at 15:23
  • So this is telling me there is a difference, but again doesn't tell me which is preferred? – shaw2thefloor Aug 07 '14 at 15:24
  • That's generally the way omnibus tests work, yes. You will need to find an appropriate *post hoc* test to identify which 'treatments' are different from which. – Alexis Aug 07 '14 at 15:31
  • Actually they are continuously measured, or as continuously measured as is reasonable to talk about on a computer. The test program gave the users a slider to set between 1-5, and the slider measured up to 3 decimal places. – shaw2thefloor Aug 07 '14 at 16:19
  • Do you have any suggestions for a post-hoc test? I really know very little about stats and don't really know where to start looking. – shaw2thefloor Aug 07 '14 at 16:20
  • The linked Wikipedia article gives references to folks who have developed *post hoc* tests. I'd start there. – Alexis Aug 07 '14 at 17:03
  • In the end I settled for ANOVA and Tukey's range test. – shaw2thefloor Aug 19 '14 at 12:42
  • @shaw2thefloor ANOVA does not correctly capture the contracted variance implied by a blocked design (unless you mean repeated measures ANOVA?). – Alexis Aug 19 '14 at 14:14