
I would like to compare the distributions of task time to completion (measured in minutes) across two groups. The distributions are highly skewed (the means are 300 to 400 times larger than the medians) and multimodal. I have a few thousand observations per group.

In particular, I am interested in testing whether the test group is completing the task faster.

Examining the distributions visually and comparing their quantiles, the test group appears to show faster task completion times.

I've tried the Mann-Whitney U test and the Kolmogorov-Smirnov test (as shown below), running each as a one-sided test of whether the test group's completion times are faster. The KS test gives a smaller p-value.

Which test is more appropriate in this use case? I've read that the KS test has more power when testing continuous variables. Is this correct? And would that explain why it's giving smaller p-values?

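wilcox.test(test, control, alternative = "less")  # H1: test times tend to be smaller than control times
ks.test(test, control, alternative = "greater")   # H1: CDF of test lies above CDF of control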
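The two calls above use opposite direction conventions, which is easy to trip over: in R's `ks.test`, `alternative = "greater"` refers to the CDF of the first sample lying above that of the second, which corresponds to the first sample being stochastically smaller. A minimal sketch on simulated data may make this concrete. The log-normal parameters and sample sizes are assumptions chosen for illustration, not the data from this question.

```r
set.seed(1)

# Hypothetical completion times in minutes (illustrative only):
# the test group is shifted toward shorter times on the log scale.
control <- rlnorm(2000, meanlog = log(90), sdlog = 1)
test    <- rlnorm(2000, meanlog = log(70), sdlog = 1)

# Mann-Whitney: "less" means values in `test` tend to be smaller than in `control`.
mw <- wilcox.test(test, control, alternative = "less")

# KS: "greater" means the CDF of `test` lies above the CDF of `control`,
# i.e. `test` is stochastically smaller (faster).
ks <- ks.test(test, control, alternative = "greater")

# The opposite KS direction, for contrast: this asks whether the CDF of `test`
# lies below that of `control`, i.e. whether `test` is slower.
ks_wrong <- ks.test(test, control, alternative = "less")

print(c(mann_whitney = mw$p.value,
        ks_greater   = ks$p.value,
        ks_less      = ks_wrong$p.value))
```

In this simulation both correctly oriented tests reject, while the reversed KS direction does not. The relative size of the two p-values depends on how the distributions differ, so it says little about which test suits a given question.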
Harry M
  • Impossible to say just from the information provided. Multi-modality and skewness could have various important impacts on how the two tests perform. What do you mean by one group finishing faster? On average? Fastest are faster? Fewer take longer than specific benchmark? // How many in each group? Enough that the answer is just obvious looking at histograms? – BruceET Feb 07 '21 at 04:55
  • I have a lot of trouble believing that KS is remotely appropriate for your task. KS will be sensitive to much more than a shift in mean or median. It will, for instance, catch that $N(0,1)$ and $N(0,7)$ are different, but it will not give insight into the way that they are different; those distributions do not differ in the way that seems like it would be most important to you. // I am with Bruce that you should refine what you mean about one group being faster. – Dave Feb 07 '21 at 05:55
  • With your sample sizes, detailed descriptive statistics could be revealing. Try relative distribution methods, see https://stats.stackexchange.com/questions/28431/what-are-good-data-visualization-techniques-to-compare-distributions/274058#274058 Share some visualizations with us. Can you share (a link to) your data, or some mockup? – kjetil b halvorsen Feb 07 '21 at 15:14
  • Mann Whitney test provides an answer to the question “what is the probability that a (randomly selected) person in the test group completes the task faster than a person in the control group?” That sounds to me like the answer to your question. But, I agree with the other comments that you should decide more clearly what question you want to answer. – John L Feb 07 '21 at 18:31
  • Ideally, I would like to test whether the distribution of task completion times shifted toward faster times across the entire distribution (not just at the median or a given quantile). – Harry M Feb 07 '21 at 23:58
  • Based on the comments it seems like Mann Whitney will be safer because it will simply test for a randomly selected person being faster in the test group, whereas KS could also indicate a difference if there's simply a difference in the distributions that is not related to whether test group participants are faster? I assumed that specifying `alternative = "g"` in the KS test would let me just test for whether the test group is faster overall, but sounds like this may not be correct? – Harry M Feb 08 '21 at 00:02
  • The multimodality in my distributions comes from the fact that most people complete the task within a few to several hours (highest density at 1-2 hours), then a smaller subset returns to the task on day 2 (which would count as 24-48 hours time to complete the task) and then an even smaller portion returns in days 3-7 to complete the task. I'm measuring the time to complete as total minutes elapsed from starting to completing. Plotting the densities on a log scale, the distributions look like progressively smaller bumps – Harry M Feb 08 '21 at 00:06
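
The probability interpretation raised in the comments can be read directly off the Mann-Whitney statistic, and quantile differences give a descriptive view across the whole distribution. The sketch below uses a hypothetical generator, `sim_times`, that loosely mimics the bumps described above (same day, day 2, days 3-7). The mixture weights, medians, and sample sizes are assumptions made for illustration only.

```r
set.seed(42)

# Hypothetical generator for multimodal completion times in minutes:
# most people finish the same day, some return on day 2, a few on days 3-7.
sim_times <- function(n, same_day_median) {
  group    <- sample(c(0, 1, 2), n, replace = TRUE, prob = c(0.80, 0.15, 0.05))
  same_day <- rlnorm(n, meanlog = log(same_day_median), sdlog = 0.6)
  day2     <- runif(n, 24 * 60, 48 * 60)
  later    <- runif(n, 48 * 60, 7 * 24 * 60)
  ifelse(group == 0, same_day, ifelse(group == 1, day2, later))
}

control <- sim_times(3000, same_day_median = 90)
test    <- sim_times(3000, same_day_median = 75)

# wilcox.test reports W = number of (test, control) pairs with test > control
# (ties, absent here, would count one half each).
w <- unname(wilcox.test(test, control, alternative = "less")$statistic)

# Estimated probability that a random test time beats a random control time.
p_test_faster <- 1 - w / (length(test) * length(control))

# Differences in selected quantiles (test minus control), in minutes.
probs  <- c(0.10, 0.25, 0.50, 0.75, 0.90)
q_diff <- quantile(test, probs) - quantile(control, probs)

print(p_test_faster)
print(round(q_diff, 1))
```

A value of `p_test_faster` above 0.5 indicates that test-group times tend to be shorter. The quantile differences show where in the distribution the change sits, which a single test statistic does not.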
