I have about 10 pairs of scores, $(x_1, y_1), \ldots, (x_{10}, y_{10})$, where every $x_i$ and $y_i$ lies between 0.0 and 1.0. I'd like to test whether the mean of the paired differences is significantly different from zero. Because the scores are bounded to $[0, 1]$, the differences cannot be exactly normally distributed, so the assumptions of the $t$-test are not strictly satisfied.
Question 1: In my case, the scores are all far from the boundaries 0 and 1 (measured in multiples of the typical within-pair difference). It seems that this should make the assumption of normally distributed differences approximately valid, in which case a $t$-test could be used. Is this intuition correct? If so, are there any bounds one can use to still make rigorous claims?
Question 2: Regardless, the usual suggestion seems to be the Wilcoxon signed-rank test. I want to know whether I can instead use a simple permutation test. In the unpaired version, we'd simply run a Monte Carlo simulation: repeatedly exchange samples between the two groups, build an empirical distribution of the difference between means, and finally look at where our observed difference falls in that distribution to judge whether it's statistically significant. So my question is whether the following algorithm is valid for the paired-difference case:
- Visit each pair and flip the order with 50% chance.
- Compute the mean of $x_i - y_i$ over the (possibly flipped) pairs; this is one sample for our empirical distribution.
- Repeat above steps enough times to get a fairly smooth empirical distribution.
- Use the empirical distribution to conclude whether our original observed mean of $x_i - y_i$ is statistically significant.
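To make the steps above concrete, here is a minimal Python sketch of what I have in mind. The function name `paired_sign_flip_test` and the parameters `n_resamples` and `seed` are just illustrative choices. The two-sided comparison and the add-one correction in the p-value are assumptions on my part, not something dictated by the steps above. It relies on the observation that flipping a pair only negates its difference $d_i = x_i - y_i$.

```python
import random
from statistics import fmean


def paired_sign_flip_test(x, y, n_resamples=10_000, seed=0):
    """Monte Carlo paired permutation (sign-flip) test, two-sided.

    Returns the observed mean difference and an estimated p-value.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = random.Random(seed)
    diffs = [xi - yi for xi, yi in zip(x, y)]
    observed = fmean(diffs)

    # Swapping x_i and y_i within a pair just negates d_i = x_i - y_i,
    # so each resample flips the sign of every difference with probability 1/2.
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        resampled = fmean(d if rng.random() < 0.5 else -d for d in diffs)
        # Small tolerance so floating-point ties count as "as extreme".
        if abs(resampled) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction (an assumption of this sketch): counts the observed
    # labelling itself, so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value
```

With only 10 pairs there are just $2^{10} = 1024$ possible sign patterns, so one could also enumerate them all (for example with `itertools.product`) instead of sampling at random.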
If the answer to either question is yes, could you please point me to a reference that discusses this rigorously?