
As the heading says, I'm getting both a low D statistic and a low p-value from the ks_2samp test. More specifically:

Ks_2sampResult(statistic=0.049890046265079313, pvalue=0.0011365796735152277)

These two results seem contradictory to me. If the maximum absolute difference between the two empirical CDFs is only about 0.05, I would say the samples come from mostly the same distribution, so it is unintuitive and strange to see such a low p-value.

Both of my samples have more than 1500 observations, and both take values in the range [0, 1]. I have also found this post.

It seems that the p-value and the D statistic both decrease as the sample size increases. This makes me worried about using this method to test whether two distributions are the same. I would like to hear more opinions on this, as I am now fairly convinced that the test should not be trusted in my case. But if it is misleading here, why should I trust it in any case?
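
To make the sample-size effect concrete, here is a minimal sketch that holds D fixed at 0.05 and evaluates the large-sample (asymptotic) Kolmogorov approximation of the two-sample p-value for several sample sizes. The helper names `kolmogorov_sf` and `approx_ks_pvalue` are hypothetical and not part of SciPy, and the equal sample sizes are an assumption made for illustration. `scipy.stats.ks_2samp` may use an exact or corrected computation, so its p-values can differ somewhat from this approximation.

```python
import math


def kolmogorov_sf(x, terms=100):
    """Asymptotic Kolmogorov survival function:
    Q(x) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * x^2)."""
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    # Clamp to [0, 1] to guard against rounding in the alternating sum.
    return min(1.0, max(0.0, 2.0 * total))


def approx_ks_pvalue(d, n, m):
    """Approximate two-sided two-sample KS p-value for statistic d
    and sample sizes n and m (large-sample approximation only)."""
    effective_n = n * m / (n + m)
    return kolmogorov_sf(d * math.sqrt(effective_n))


D = 0.05  # same observed statistic in every row
sizes = [100, 500, 1500, 3000, 10000]  # assumed equal sizes, n = m
pvalues = [approx_ks_pvalue(D, n, n) for n in sizes]

for n, p in zip(sizes, pvalues):
    print(f"n = m = {n:>6}: D = {D}, approximate p-value = {p:.4g}")
```

Under this approximation the same D = 0.05 is unremarkable for small samples and highly significant for large ones, because the p-value depends on D multiplied by the square root of nm/(n + m) rather than on D alone.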

Sven Hohenstein
  • Would it be possible for you to add a code snippet to generate the given results? I am facing a similar issue and I'm trying to get to the bottom of it. – Luca Cappelletti Nov 24 '18 at 11:38
