Context
I'm comparing 7 classification algorithms using corrected resampled t-tests on 3-times-repeated 10-fold CV. I have educated guesses as to how their performance will line up. (For example, a transductive semi-supervised algorithm will probably outperform an inductive supervised one that can use only the small labeled part of the same training data, etc.)
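For reference, by "corrected resampled t-test" I mean the Nadeau and Bengio (2003) variance correction for resampled/repeated CV. A minimal sketch of the statistic as I compute it (the function name and interface are my own, not from any library):

```python
import numpy as np
from scipy import stats

def corrected_resampled_ttest(diffs, test_train_ratio=1/9):
    """Corrected resampled t-test (Nadeau & Bengio, 2003).

    diffs: per-fold score differences between two algorithms
           (3 x 10-fold CV gives 30 values).
    test_train_ratio: n_test / n_train, i.e. 1/9 for 10-fold CV.
    Returns the t statistic and the two-sided p-value.
    """
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.size
    # Inflate the variance to account for overlapping training sets
    # across CV folds, which makes the naive paired t-test too liberal.
    t = diffs.mean() / np.sqrt((1/n + test_train_ratio) * diffs.var(ddof=1))
    p_two_sided = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p_two_sided
```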
Problem
Before looking at any classification performance data, I created a ranking of expected performance and wrote down my reasons for believing in it. Based on this ranking, I could announce only 6 hypotheses:
H1: algorithm A performs better than algorithm B
H2: algorithm B performs better than algorithm C
...
H6: algorithm F performs better than algorithm G
Alternatively, I could perform all 7(7-1)/2 = 21 pairwise comparisons, irrespective of my intuitions. Since I need a multiple-comparison correction such as Bonferroni (or the more powerful Holm step-down procedure), it would be advantageous to keep the number of hypotheses to a minimum.
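To make the stakes concrete, here is a quick sketch (nothing algorithm-specific, just the textbook thresholds at alpha = 0.05) of what 6 versus 21 hypotheses costs per test:

```python
alpha = 0.05
for m in (6, 21):
    # Bonferroni: every p-value must pass alpha / m.
    # Holm: the i-th smallest p-value is compared to alpha / (m - i + 1),
    # so only the smallest p-value faces the full Bonferroni threshold.
    holm_steps = [alpha / (m - i) for i in range(m)]
    print(f"m={m:2d}: Bonferroni={alpha/m:.4f}, "
          f"Holm steps {holm_steps[0]:.4f} ... {holm_steps[-1]:.4f}")
```

With 6 hypotheses the strictest threshold is 0.05/6 ≈ 0.0083 rather than 0.05/21 ≈ 0.0024.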
Questions
Is it legitimate to reduce the number of pairwise comparisons based on such a priori expectations?
Do I see this correctly that H1 and H2 combined imply that algorithm A performs better than algorithm C, without requiring a dedicated hypothesis to test this? If the null hypotheses of both H1 and H2 are rejected, the answer is obviously yes, but what if this is not the case?
Also, I would use two-sided tests despite my directional intuitions. That way, nobody can object that I might have chosen the direction of a one-sided test after seeing the data. Doesn't this conservative choice contradict the other decision, namely to reduce the number of hypotheses?
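For what it's worth, the cost of going two-sided is easy to quantify: when the observed effect lies in the predicted direction, the two-sided p-value is exactly twice the one-sided one, whereas dropping from 21 to 6 hypotheses relaxes the Bonferroni threshold by a factor of 3.5. A toy illustration (the t value is made up):

```python
from scipy import stats

t, df = 2.2, 29          # hypothetical statistic from a 3 x 10-fold comparison
p_one = stats.t.sf(t, df)            # one-sided, effect in predicted direction
p_two = 2 * stats.t.sf(abs(t), df)   # two-sided; exactly twice p_one
print(f"one-sided p={p_one:.4f}, two-sided p={p_two:.4f}")
```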