I am trying to interpret the results of some experiments I have run, involving showing participants videos of animated synthesised speech.
I ran a mean opinion score test. I made 36 movies in total, 9 at each of the following exaggeration factors: 1 (A), 1.1 (B), 1.2 (C), 1.3 (D).
11 participants were shown the movies in a randomised order and asked to score each movie on a scale from 1 to 5, based on their opinion of the quality of the animation. That gives 99 scores per exaggeration factor (396 in total).
The mean opinion scores were: A - 2.41, B - 3.06, C - 3.3, D - 2.99, making C (1.2) the favourite.
A Kruskal-Wallis test (null hypothesis: the samples come from the same distribution) gives a $p$-value of 1.8595e-08, so scores this different would be extremely unlikely if all four groups came from the same distribution. I therefore reject the null hypothesis; the test is significant at the 0.01 level.
I then ran Dunn's test to find out which of the treatments differ significantly from one another. Using the Matlab function Dunn, I get the following results, but I don't know what they are telling me:
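As a sanity check, the Kruskal-Wallis statistic can be rebuilt from the rank sums printed in the Dunn output further down in this post. The sketch below is only that, a sketch: it assumes the reported "Ties factor" of 6882 is the usual tie term (the sum of t^3 - t over tied groups), which the output does not state, and it uses the closed-form chi-square tail for 3 degrees of freedom instead of a statistics library. If those assumptions hold, the result should land near the reported $p$-value.

```python
import math

# Rank sums and group sizes copied from the Dunn output in this post.
rank_sums = [13893.0, 20879.5, 23569.5, 20264.0]
sizes = [99, 99, 99, 99]

# ASSUMPTION: the printed "Ties factor" is sum(t**3 - t) over tied groups.
ties_factor = 6882


def kruskal_wallis_h(rank_sums, sizes, ties=0.0):
    """Kruskal-Wallis H from rank sums, divided by the tie correction."""
    n = sum(sizes)
    h = 12.0 / (n * (n + 1)) * sum(r * r / m for r, m in zip(rank_sums, sizes))
    h -= 3 * (n + 1)
    return h / (1.0 - ties / (n ** 3 - n))


def chi2_sf_3df(x):
    """Upper tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


h = kruskal_wallis_h(rank_sums, sizes, ties_factor)
p = chi2_sf_3df(h)
print(f"H = {h:.3f}, p = {p:.3e}")
```

With four groups the reference distribution has 4 - 1 = 3 degrees of freedom, which is why the 3-df tail formula is used here.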
STEPDOWN DUNN TEST FOR NON PARAMETRIC MULTIPLE COMPARISONS

Group    N     Sum of ranks    Mean rank
1        99    13893.00        140.33
2        99    20879.50        210.90
3        99    23569.50        238.08
4        99    20264.00        204.69

Ties factor: 6882

Test      Q-value               Critical Q    Comment
3 vs 1    6.0084                2.6310        Reject Ho
3 vs 4    2.0525                2.6310        Fail to reject Ho
3 vs 2    No comparison made                  Accept Ho
2 vs 1    4.3381                2.6310        Reject Ho
2 vs 4    0.3822                2.6310        Fail to reject Ho
4 vs 1    3.9559                2.6310        Reject Ho

Resuming...
0 1 1 1
0 0 0 0
0 0 0 0
0 0 0 0
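To see where the Q-values in this output come from, here is a minimal sketch that recomputes them from the printed rank sums. Several things in it are assumptions rather than facts taken from the function's documentation: that the "Ties factor" enters the variance as T / (12 (N - 1)), that the overall significance level is 0.05, and that the critical value uses a Šidák correction over the 6 possible pairs. Under those assumptions the numbers agree with the table to the printed precision.

```python
import math
from statistics import NormalDist

# Values copied from the Dunn output above.
rank_sums = {1: 13893.0, 2: 20879.5, 3: 23569.5, 4: 20264.0}
n_per_group = 99
n_total = n_per_group * len(rank_sums)
ties_factor = 6882  # ASSUMPTION: sum(t**3 - t) over tied groups

mean_rank = {g: s / n_per_group for g, s in rank_sums.items()}


def dunn_q(i, j):
    """Dunn's statistic: mean-rank difference divided by its standard error."""
    variance = n_total * (n_total + 1) / 12 - ties_factor / (12 * (n_total - 1))
    se = math.sqrt(variance * (1 / n_per_group + 1 / n_per_group))
    return abs(mean_rank[i] - mean_rank[j]) / se


# ASSUMPTION: alpha = 0.05 with a Sidak correction for k(k-1)/2 comparisons.
alpha = 0.05
k = len(rank_sums)
n_pairs = k * (k - 1) // 2
alpha_sidak = 1 - (1 - alpha) ** (1 / n_pairs)
q_crit = NormalDist().inv_cdf(1 - alpha_sidak / 2)

print(f"critical Q = {q_crit:.4f}")
for i, j in [(3, 1), (3, 4), (2, 1), (2, 4), (4, 1)]:
    q = dunn_q(i, j)
    verdict = "Reject Ho" if q > q_crit else "Fail to reject Ho"
    print(f"{i} vs {j}: Q = {q:.4f} -> {verdict}")
```

Read this way, only the comparisons against group 1 (factor 1, A) exceed the critical value, which matches the first row of the final 0/1 matrix. The "3 vs 2" pair is presumably skipped because, in a step-down procedure, group 2's mean rank lies between those of groups 4 and 3, and that wider 3 vs 4 comparison was already not rejected.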
Any help greatly appreciated.
Friedman's ANOVA Table
Source     SS          df     MS         Chi-sq     Prob>Chi-sq
Columns    27.2841     1      27.2841    55.5501    9.1106e-14
Error      167.2159    395    0.42333
Total      194.5       791
Test for column effects after row effects are removed
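The probability in the Friedman table can be checked against its Chi-sq value. This is a small sketch that assumes the statistic is referred to a chi-square distribution with the 1 degree of freedom shown in the "Columns" row; it says nothing about whether that design matches the four-group experiment described above.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


chi_sq = 55.5501  # copied from the "Columns" row of the table
p_friedman = chi2_sf_1df(chi_sq)
print(f"p = {p_friedman:.4e}")
```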