Actual difference between the statistic results from scipy.stats.ranksums and scipy.stats.mannwhitneyu

Question

So, I have been trying to test whether two independent samples come from the same distribution, i.e. whether one tends to be greater or less than the other. Eventually I found out that the Mann-Whitney U test is the appropriate test for me.

I came across two similar SciPy functions, scipy.stats.ranksums and scipy.stats.mannwhitneyu, which rest on the same underlying theory for comparing independent samples. However, I have searched all over and could not find why the heck these functions provide distinct results, and unfortunately I was not able to work out how to convert one into the other.

I would be very pleased if someone could enlighten me with any answer.

Example code:

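import numpy as np
from scipy.stats import ranksums, mannwhitneyu

# Two independent normal samples whose means differ by 1
rng = np.random.default_rng(seed=42)
sample1 = rng.normal(0, 1, 100)
sample2 = rng.normal(1, 1, 100)

# One-sided tests: is sample1 stochastically less than sample2?
print(ranksums(sample1, sample2, alternative='less'))
print(mannwhitneyu(sample1, sample2, alternative='less', use_continuity=False))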
-----------------------------------------------------------------------------
RanksumsResult(statistic=-7.122478605972594, pvalue=5.300160604890462e-13)
MannwhitneyuResult(statistic=2085.0, pvalue=5.300160604890462e-13)

code for ranksums: https://github.com/scipy/scipy/blob/v1.7.1/scipy/stats/stats.py#L7713-L7787

code for mannwhitneyu: https://github.com/scipy/scipy/blob/v1.7.1/scipy/stats/_mannwhitneyu.py#L181-L424

EDIT: I am interested in using the statistic from the Mann-Whitney test as a measure of AUC-ROC for a machine learning project I have been working on. Only mannwhitneyu() gives me the desired result; it looks like ranksums() outputs a sort of z-score (as the results above suggest). Still, it would be nice to know why that is.
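To make the AUC connection concrete, here is a minimal sketch, assuming SciPy 1.7 or later (where the reported statistic is U for the first sample) and reusing the samples from the example above. It checks that U equals the number of (sample1, sample2) pairs in which the sample1 value is larger, so dividing by n1 * n2 estimates the probability that a sample1 value exceeds a sample2 value. The names `pair_count` and `auc` are illustrative only.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(seed=42)
sample1 = rng.normal(0, 1, 100)
sample2 = rng.normal(1, 1, 100)
n1, n2 = len(sample1), len(sample2)

# Assumption: SciPy >= 1.7, where `statistic` is U for the first sample.
u1 = mannwhitneyu(sample1, sample2).statistic

# Count pairs (x, y) with x > y directly; ties count as one half.
diff = sample1[:, None] - sample2[None, :]
pair_count = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)

# U scaled by the number of pairs estimates P(sample1 value > sample2 value).
auc = u1 / (n1 * n2)
print(u1, pair_count, auc)
```

Whether this value or its complement (1 - auc) is the AUC of interest depends on which sample is treated as the positive class.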

What are the "distinct results" to which you refer?? Both tests report identical p-values, to more significant figures than I can count! (That is because their statistics are determinate functions of one another.) — whuber, Dec 31 '21 at 02:09
The statistics are different but equivalent (they should always give the same p-value when you test the same thing as they do here). A number of posts on site discuss various versions of the Wilcoxon rank sum statistic (in this case it looks like you're getting a z-score, but check the help on the function), and their relationships to the Mann-Whitney statistic. — Glen_b, Dec 31 '21 at 04:19
Right, Glen. In my case I am also interested in the statistic itself. It is a machine learning project I am developing, and I was assessing the relationship between this test and the Area Under the ROC Curve. It turns out that only mannwhitneyu() gives me the desired result. Most likely this is because the other function outputs a z-score as its statistic. — Lucas Thimoteo, Dec 31 '21 at 12:14
You can convert that back to an actual rank sum by multiplying back by the standard deviation of the sum of the ranks and adding back the mean (under the null in each case; which mean and standard deviation you use depends on the form of the statistic you want); the relationship to the Mann-Whitney is straightforward. There are questions on site that discuss the different forms of statistic and their relationships (and last I saw the relevant Wikipedia page, it had details). — Glen_b, Jan 01 '22 at 23:17
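The conversion described in this comment can be sketched as follows. This is a hedged illustration, not SciPy's own code: it assumes the z-score from ranksums is standardized with the no-ties null mean n1(n1 + n2 + 1)/2 and standard deviation sqrt(n1 n2 (n1 + n2 + 1)/12) of the first sample's rank sum, and that mannwhitneyu (SciPy 1.7 or later) reports U for the first sample. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import ranksums, mannwhitneyu

rng = np.random.default_rng(seed=42)
sample1 = rng.normal(0, 1, 100)
sample2 = rng.normal(1, 1, 100)
n1, n2 = len(sample1), len(sample2)

# z-score reported by ranksums (no tie correction assumed).
z = ranksums(sample1, sample2).statistic

# Null mean and standard deviation of the rank sum of sample1.
mean_w = n1 * (n1 + n2 + 1) / 2
sd_w = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

# Undo the standardization to recover the rank sum W of sample1.
w = z * sd_w + mean_w

# Mann-Whitney U for sample1 is W minus its smallest possible value.
u_from_z = w - n1 * (n1 + 1) / 2

# Assumption: SciPy >= 1.7, where `statistic` is U for the first sample.
u = mannwhitneyu(sample1, sample2).statistic
print(w, u_from_z, u)
```

With the numbers in the question (z ≈ -7.1225, n1 = n2 = 100), the null mean is 10050 and the standard deviation is about 409.27, so W ≈ 7135 and U ≈ 7135 - 5050 = 2085, which matches the mannwhitneyu output. The p-values agree there because use_continuity=False was passed and the data have no ties.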
