
I'm doing research on cross-ISP traffic of P2P applications. I have collected a significant amount of real-world test data, both with the optimization applied and without it. However, the two datasets differ in length (55000 vs. 65000 observations). The two means are 5.0 and 8.55, and the two variances are 7.996 and 42.85. I thought of applying a Welch two-sample t-test in R, but the p-value is extremely low (< 2.2e-16).
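
As a rough check, the Welch statistic can be recomputed from the summary figures alone. The sketch below is plain Python (not R's `t.test`), and it assumes the figures pair up in the order listed: mean 5.0, variance 7.996, n = 55000 for the optimized data, and mean 8.55, variance 42.85, n = 65000 without. Python's standard library has no t-distribution CDF, so the sketch substitutes the normal approximation, which is very close at these degrees of freedom. The t statistic comes out around -125, so the tail probability underflows to 0.0 in double precision; R prints any p-value below machine epsilon (about 2.2e-16) as `< 2.2e-16`. A p-value this small says the difference in means is very unlikely to be due to chance; it does not by itself indicate a problem with the test.

```python
import math

def welch_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom,
    computed from sample means, sample variances and sample sizes."""
    v1 = var1 / n1  # squared standard error of mean1
    v2 = var2 / n2  # squared standard error of mean2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal distribution, used here as an
    approximation to the t distribution (very close when df is large)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Assumed pairing of the question's figures: optimized group first.
t, df = welch_from_summary(5.0, 7.996, 55000, 8.55, 42.85, 65000)
p = two_sided_p_normal(t)
print(f"t = {t:.1f}, df = {df:.0f}, p ~ {p:.3g}")
```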

Is there a good way to show that the optimization works well? (Comparing the means suggests it does, because 5.0 hops is a good improvement over 8.55.)
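
With tens of thousands of observations per group, almost any non-zero difference will be statistically significant, so the p-value alone says little about how large the improvement is. Reporting the size of the effect is usually more informative. The sketch below is plain Python, uses the same assumed pairing of the question's figures, and computes the difference in mean hops with an approximate 95% confidence interval (Welch standard error, normal critical value 1.96), the change relative to the unoptimized mean, and Cohen's d with a pooled standard deviation. The function name `effect_summary` is hypothetical.

```python
import math

def effect_summary(mean1, var1, n1, mean2, var2, n2, z=1.96):
    """Effect-size summary from sample means, sample variances and sizes.

    Returns the difference in means (group 1 minus group 2), an approximate
    95% CI for it (Welch standard error, normal critical value z), Cohen's d
    with a pooled standard deviation, and the change relative to mean2.
    """
    diff = mean1 - mean2
    se = math.sqrt(var1 / n1 + var2 / n2)
    ci = (diff - z * se, diff + z * se)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    d = diff / math.sqrt(pooled_var)
    rel = diff / mean2
    return diff, ci, d, rel

# Assumed pairing: (5.0, 7.996, 55000) with the optimization, (8.55, 42.85, 65000) without.
diff, ci, d, rel = effect_summary(5.0, 7.996, 55000, 8.55, 42.85, 65000)
print(f"difference = {diff:.2f} hops, 95% CI ~ ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"Cohen's d ~ {d:.2f}, relative change = {rel:.1%}")
```

With the question's figures this gives a difference of -3.55 hops, an interval of roughly -3.61 to -3.49, a reduction of about 41.5% relative to 8.55, and d of about -0.68. The pooled SD is only one convention; with variances as different as 7.996 and 42.85, standardizing by the unoptimized group's SD (Glass's Δ) is a common alternative. None of this addresses outliers, which affect the means themselves.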

tha4
  • It's always a good idea to graph the data, especially when the two variances are that different. I would start with a parallel box plot, and then maybe a quantile-quantile plot to see what is going on. – Peter Flom Oct 20 '12 at 12:49
  • @PeterFlom Plotting the box plot proved to be a very good idea. One dataset has a number of outliers compared to the other, and I think this is the reason behind the very low p-value. – tha4 Oct 20 '12 at 16:06
  • @Procrastinator thanks for the helpful links. The Wilcoxon test also turns out to give the same p-value. – tha4 Oct 20 '12 at 16:08
  • If there are a few outliers, then no test of central tendency is likely to be what you want. Although it is true that tests of the median are not affected by outliers (unless there is a huge number of them), the real conclusion is not that "this median is higher than that" nor that "this mean is higher than that" but that "this group has outliers and that group has not". – Peter Flom Oct 20 '12 at 16:51
  • @PeterFlom Thumbs up for the nice explanation. – tha4 Oct 20 '12 at 17:22
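
The numbers behind the suggested box plot and quantile-quantile plot can also be computed directly, which makes an outlier difference between the groups easy to report. The sketch below uses only Python's `statistics` module (`statistics.quantiles` needs Python 3.8+). The `"inclusive"` quantile method and the 1.5 × IQR fences (the usual box-plot whisker rule) are choices made here, and R's default quantile definitions may give slightly different values. The two samples are invented toy hop counts, not the data from the question.

```python
import statistics

def box_summary(sample):
    """Numbers a box plot draws: quartiles plus the count of points outside
    the conventional Tukey fences (1.5 * IQR beyond the quartiles)."""
    q1, med, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in sample if x < low_fence or x > high_fence]
    return {
        "min": min(sample), "q1": q1, "median": med, "q3": q3,
        "max": max(sample), "mean": statistics.mean(sample),
        "n_outliers": len(outliers),
    }

def paired_quantiles(a, b, n=10):
    """Matching quantiles (deciles by default) of two samples, i.e. the points
    a two-sample QQ plot draws. Sample sizes need not be equal."""
    qa = statistics.quantiles(a, n=n, method="inclusive")
    qb = statistics.quantiles(b, n=n, method="inclusive")
    return list(zip(qa, qb))

# Made-up toy hop counts for illustration only; NOT the data from the question.
with_opt = [4, 5, 5, 6, 4, 5, 6, 5]
without_opt = [7, 8, 8, 9, 7, 8, 30, 45]

for name, sample in [("with", with_opt), ("without", without_opt)]:
    print(name, box_summary(sample))
for qa, qb in paired_quantiles(with_opt, without_opt):
    print(f"{qa:6.2f}  {qb:6.2f}")
```

In the toy data the group without the optimization has two points beyond the upper fence, and they pull its mean (15.25) well above its median (8.0). That is the kind of pattern described in the comments, where the more accurate summary may be that one group has outliers and the other does not. The same functions accept the real samples even though their lengths differ.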
