Difference of means -- Testing two algorithms

Question

I have to perform 2 tests of the difference of means for the results obtained by 2 learning algorithms.

TEST 1

I have 2 algorithms
I take a data set (with N examples)
I execute both algorithms 10 times
Each time, each algorithm is trained on a set of 9/10*N examples and tested on the remaining N/10 examples (i.e. 10-fold cross-validation is done), and the result is the percentage (XX%) of correctly classified examples in that test set.

Here I use the t-test. I know that the t-test assumes normality. I use it because the data set is quite big (N > 100), so I can assume the data are normally distributed (and if the data are normally distributed, the means for a single algorithm are normally distributed, and so the differences between the means of the two algorithms are normally distributed). Or should I consider only N/10 as the sample size from which the means are drawn?

TEST 2

I have two algorithms
I take 10 instances of a combinatorial optimization problem (a Travelling Salesman Problem)
I run each algorithm on the 10 instances
For each run, I write down the result value (which is at some distance from the optimal value for that instance)

How can I perform a test on the differences of means? Which means can I consider? Which data should I check for normality?

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

For TEST 1, I gather the response variable is either correct or incorrect; that is, these are Bernoulli trials. For the Bernoulli / binomial distribution, the standard deviation is a function of the mean, so you aren't estimating the standard deviation from your data in the same sense as in a t-test. In this case, you can use a z-test for the difference of two proportions (see the example in this table). I don't know what software you're using, but if you were to test this in R, for example, you would use ?prop.test.
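If you are not working in R, the z-test for two proportions is simple enough to sketch by hand. The Python below is only an illustration: the function name and the counts are hypothetical, it uses the pooled standard error with no continuity correction (so it will not reproduce prop.test's default output exactly), and it assumes the two sets of classifications can be treated as independent.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for the difference of two proportions (pooled SE)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Under the null hypothesis both algorithms share one accuracy,
    # estimated by pooling the two samples.
    p_pool = (correct_a + correct_b) / (n_a + n_b)
    # The standard error follows from the pooled proportion alone;
    # no separate variance estimate is taken from the data.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: correctly classified test examples, summed over folds.
z, p_value = two_proportion_z_test(870, 1000, 840, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The counts to plug in would be the total number of correctly classified test examples and the total number of test examples for each algorithm.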

For TEST 2, I think it's highly unlikely that the population underlying your data would be normally distributed, and with only 10 data points, you wouldn't have enough for the central limit theorem to cover for you. (Note that my reasoning here is a priori; I think it usually doesn't make much sense to test for normality, especially in this case with so few data points, see: Is normality testing essentially useless?) Instead, I would recommend the Mann-Whitney U-test, which is a non-parametric analog of the t-test. It is appropriate when your data are not normal and your sample size is too small to rely on the central limit theorem; it can also be more powerful than the t-test under these conditions. In R, you would use ?wilcox.test.
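To show what the Mann-Whitney U-test actually computes, here is a rough Python sketch. The function name and the data are made up for illustration, and the p-value uses the normal approximation without tie or continuity corrections, so treat it as approximate; R's wilcox.test computes an exact p-value by default for small samples without ties. Like the test itself, the sketch treats the two sets of results as independent samples.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation (no tie correction)."""
    n_x, n_y = len(x), len(y)
    # Count the pairs in which an x value exceeds a y value; ties count half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    # Report the smaller of the two possible U statistics.
    u = min(u_x, n_x * n_y - u_x)
    # Mean and standard deviation of U under the null hypothesis.
    mean_u = n_x * n_y / 2
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Hypothetical data: % distance from the optimum on 10 instances per algorithm.
gap_a = [2.1, 3.4, 1.8, 4.0, 2.9, 3.1, 2.5, 3.8, 2.2, 3.0]
gap_b = [3.9, 4.8, 3.3, 5.1, 4.2, 4.6, 3.6, 5.4, 3.5, 4.4]
u_stat, p_value = mann_whitney_u(gap_a, gap_b)
print(f"U = {u_stat}, approximate p = {p_value:.4f}")
```

Only the ordering of the values enters the statistic, which is why no normality assumption is needed.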
