46

I have two samples, and I want to test (using Python) whether they are drawn from the same distribution. To do that, I use the statistical function ks_2samp from scipy.stats. It returns two values, and I am having difficulty interpreting them. Help please!

meri

2 Answers

33

As Stijn pointed out, the k-s test returns a D statistic and a p-value corresponding to that D statistic. The D statistic is the maximum absolute distance (supremum) between the empirical CDFs of the two samples. The closer this number is to 0, the more consistent the data are with the two samples having been drawn from the same distribution. Check out the Wikipedia page for the k-s test. It provides a good explanation: https://en.m.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
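To make the definition of D concrete, here is a minimal sketch that computes the largest gap between the two empirical CDFs by hand and compares it with the statistic reported by ks_2samp. The helper name `ks_statistic` and the simulated normal samples are assumptions made for illustration only; they are not part of the question.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b.

    Hypothetical helper written for illustration; not a SciPy function.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The empirical CDFs only change at observed points, so it is enough
    # to evaluate both of them at every observation from either sample.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Simulated data (an assumption): two normal samples with different means.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y = rng.normal(0.5, 1.0, size=300)

d_manual = ks_statistic(x, y)
d_scipy = ks_2samp(x, y).statistic
print(d_manual, d_scipy)
```

The two printed numbers should agree up to floating-point rounding, since both measure the same maximum distance.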

The p-value returned by the k-s test has the same interpretation as other p-values. You reject the null hypothesis that the two samples were drawn from the same distribution if the p-value is less than your significance level. You can find tables online for the conversion of the D statistic into a p-value if you are interested in the procedure.
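As an alternative to lookup tables, the large-sample conversion from D to a p-value can be sketched with the limiting Kolmogorov distribution, which SciPy exposes as `scipy.stats.kstwobign`. This is only the asymptotic approximation, and the helper name `approx_p_value` is hypothetical. ks_2samp itself may use an exact calculation for small samples, so its p-value will not necessarily match this number.

```python
import numpy as np
from scipy.stats import kstwobign


def approx_p_value(d, n, m):
    """Asymptotic two-sided p-value for a two-sample KS statistic d.

    n and m are the two sample sizes. Hypothetical helper; the
    approximation is only reasonable for fairly large samples.
    """
    effective_n = n * m / (n + m)
    # Survival function of the limiting Kolmogorov distribution,
    # evaluated at the scaled statistic sqrt(effective_n) * d.
    return float(kstwobign.sf(np.sqrt(effective_n) * d))


n, m = 200, 300
for d in (0.05, 0.10, 0.20):
    print(f"D = {d:.2f}  approx. p-value = {approx_p_value(d, n, m):.4g}")
```

For fixed sample sizes, a larger D always gives a smaller p-value under this approximation, so you reject the null hypothesis once D is large enough for the p-value to fall below your chosen significance level.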

  • Thank you for your answer. In fact, I know the meaning of the 2 values D and P-value but I can't see the relation between them. How can I define the significance level? Can you give me a link for the conversion of the D statistic into a p-value? – meri May 02 '13 at 13:22
  • Sure, table for converting D stat to p-value: http://www.soest.hawaii.edu/wessel/courses/gg313/Critical_KS.pdf – CrossValidatedTrading May 02 '13 at 16:19
  • @CrossValidatedTrading: Your link to the D-stat-to-p-value table is now 404. – james.garriss Dec 04 '15 at 17:49
  • @CrossValidatedTrading Should there be a relationship between the p-values and the D-values from the 2-sided KS test? In some instances, I've seen a proportional relationship, where the D-statistic increases with the p-value. That seems like it would be the opposite: that two curves with a greater difference (larger D-statistic), would be more significantly different (low p-value)... – Thomas Matthew Nov 29 '16 at 00:47
  • If the p-value is > 0.05, you cannot reject the hypothesis that your two samples come from the same distribution; that alone does not prove they are identical. – user798719 Dec 17 '16 at 07:55
  • Yes, but how is the p-value obtained? – Ulf Aslak May 03 '19 at 09:31
  • What if my KS test statistic is very small or close to 0 but the p-value is also very close to zero? Are the two samples drawn from the same distribution? – GadaaDhaariGeek Mar 29 '20 at 05:02
  • @GadaaDhaariGeek if the p-value is smaller than your chosen significance level (e.g., 0.05) then the distributions are 'statistically different'. At the same time however, larger samples will more easily detect even the smallest differences between two distributions. So you could interpret smaller KS test statistics as 'less different' and larger ones as 'more different', roughly speaking. – Amonet May 07 '20 at 13:26
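The sample-size effect described in the last comment can be illustrated with a rough simulation. The distributions, the shift of 0.05, the sample sizes, and the seed are all assumptions chosen for the demonstration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
shift = 0.05  # a tiny difference in means (assumed for the demo)

# Small samples: a difference this small is usually hard to detect.
small_a = rng.normal(0.0, 1.0, size=100)
small_b = rng.normal(shift, 1.0, size=100)
d_small, p_small = ks_2samp(small_a, small_b)

# Very large samples: the same tiny shift becomes detectable.
large_a = rng.normal(0.0, 1.0, size=100_000)
large_b = rng.normal(shift, 1.0, size=100_000)
d_large, p_large = ks_2samp(large_a, large_b)

print(f"n = 100:     D = {d_small:.4f}, p = {p_small:.4g}")
print(f"n = 100000:  D = {d_large:.4f}, p = {p_large:.4g}")
```

With the large samples, D stays close to zero while the p-value becomes tiny. A small D together with a small p-value therefore indicates a difference that is statistically detectable but small in size.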
5

When doing a Google search for ks_2samp, the first hit is this website. On it, you can see the function specification:

This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution.

Parameters : 
  a, b : sequence of 1-D ndarrays
  two arrays of sample observations assumed to be drawn from a continuous distribution, sample sizes can be different

Returns :   
  D : float,  KS statistic
  p-value : float, two-tailed p-value
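A minimal usage sketch based on the specification above: ks_2samp returns the pair (D, p-value), which can be unpacked directly. The simulated samples and the significance level of 0.05 are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
same_a = rng.normal(0.0, 1.0, size=500)
same_b = rng.normal(0.0, 1.0, size=400)   # same distribution, different size
shifted = rng.normal(2.0, 1.0, size=500)  # clearly different mean

alpha = 0.05  # significance level, chosen before looking at the data

d_same, p_same = ks_2samp(same_a, same_b)
d_diff, p_diff = ks_2samp(same_a, shifted)

for label, d, p in [("same", d_same, p_same), ("shifted", d_diff, p_diff)]:
    verdict = "reject H0" if p < alpha else "cannot reject H0"
    print(f"{label:8s} D = {d:.3f}  p = {p:.3g}  -> {verdict}")
```

Here H0 is the null hypothesis that both samples come from the same continuous distribution. Failing to reject it does not prove the distributions are identical.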
Stijn