
I just performed a KS 2 sample test on my distributions, and I obtained the following results:

  • CASE 1: statistic=0.06956521739130435, pvalue=0.9451291140844246;
  • CASE 2: statistic=0.07692307692307693, pvalue=0.9999007347628557;
  • CASE 3: statistic=0.060240963855421686, pvalue=0.9984401671284038.
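(Results in this format are what scipy's two-sample KS test prints; a minimal sketch with synthetic data standing in for the actual catalogues, which are not shown here:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two hypothetical samples drawn from the SAME parent distribution,
# standing in for two of the catalogued quantities being compared.
a = rng.normal(loc=0.0, scale=1.0, size=115)
b = rng.normal(loc=0.0, scale=1.0, size=130)

# ks_2samp returns the KS statistic D and the associated p-value.
res = stats.ks_2samp(a, b)
print(f"statistic={res.statistic}, pvalue={res.pvalue}")
```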

How can I interpret these results? Do you have some references? For instance, I read the following example: "For an identical distribution, we cannot reject the null hypothesis since the p-value is high, 41%: (0.41)". But who says that the p-value is high enough?

I really appreciate any help you can provide.

    It’s the same deal as when you look at p-values foe the tests that you do know, such as the t-test. – Dave Mar 17 '21 at 14:59
    In cases 2 and 3, the p-value is *too* high! That's usually worth further investigation: what is it about your data that would make it plausible for them to be so beautifully close to the hypothesized distribution? (One common answer: the test was misapplied by comparing the data to a distribution custom-fit to the data themselves, in which case the p-values tell us almost nothing and cannot be relied on.) – whuber Mar 17 '21 at 16:13
  • @whuber good point. But here is the 2 sample test. So I don’t think it can be your explanation in brackets – innisfree Mar 17 '21 at 16:22
  • OP, what do you mean your two distributions? You mean your two sets of samples (from two distributions)? – innisfree Mar 17 '21 at 16:23
  • Context: I performed this test on three different galaxy clusters. For each galaxy cluster, I have a photometric catalogue. For each photometric catalogue, I performed a SED fitting considering two different laws. Therefore, for each galaxy cluster, I have two distributions that I want to compare. So, CASE 1 refers to the first galaxy cluster, let's say, etc. A priori, I expect that the KS test returns me the following result: "ehi, the two distributions come from the same parent sample". My only concern is about CASE 1, where the p-value is 0.94, and I do not know if it is a problem or not. – x12red Mar 17 '21 at 16:37
  • I've also read online that we can use the KS statistic value to understand if two distributions come from the same parent sample. In that case, we estimate D(m, n, alpha), and if KS statistic > D(m, n, alpha), we can say that the two distributions do not come from the same parent sample and vice-versa. Where D(m, n, alpha) can be estimated with a formula we can find in Smirnov (1948) – x12red Mar 17 '21 at 16:42
  • That initial fitting is suspect: are you comparing two *datasets* or two *fitted distributions* (or something else)? These p-values remain suspiciously high. (cc @innisfree, whose initial comment is germane.) – whuber Mar 17 '21 at 16:54
  • Testing whether two distributions come from the parent sample?! We should be talking about whether two sets of samples come from the same parent distribution. Something is way off here – innisfree Mar 17 '21 at 17:00
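(The critical-value approach mentioned in the comments can be sketched as follows; the large-sample formula D(m, n, alpha) = c(alpha)·sqrt((m + n)/(m·n)) with c(alpha) = sqrt(−ln(alpha/2)/2) is the standard asymptotic approximation, and the sample sizes used below are hypothetical:)

```python
import math

def ks_critical_value(m, n, alpha=0.05):
    """Approximate large-sample critical value D(m, n, alpha) for the
    two-sample KS test: reject 'same parent distribution' when the
    observed KS statistic exceeds this value."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((m + n) / (m * n))

# Hypothetical sample sizes for illustration.
print(ks_critical_value(115, 130))
```

Comparing the observed statistic to this threshold at level alpha is equivalent to comparing the p-value to alpha, so the two procedures lead to the same decision.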

1 Answer


The two-sample Kolmogorov–Smirnov test compares the distributions of two different samples. It tests whether the samples come from the same distribution (note that this distribution does not have to be normal).

The p-value is, as pointed out in the comments, evidence against the null hypothesis. More precisely: you reject the null hypothesis that the two samples were drawn from the same distribution if the p-value is less than your significance level. The significance level is usually set at 0.05, but that threshold is a convention, not a rule.
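The decision rule applied to the three cases from the question looks like this (the 0.05 threshold is the conventional choice, not the only defensible one):

```python
alpha = 0.05  # conventional significance level, a choice rather than a law

# p-values copied from the question
pvalues = {
    "CASE 1": 0.9451291140844246,
    "CASE 2": 0.9999007347628557,
    "CASE 3": 0.9984401671284038,
}

for case, p in pvalues.items():
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(case, decision)
```

All three p-values are far above 0.05, so in every case you fail to reject the hypothesis that the two samples share a parent distribution.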

I would recommend simply checking the Wikipedia page on the KS test, and also the post Is normality testing 'essentially useless'?, which discusses testing for normality and why such tests become less useful in practice as the sample size grows: with enough data they flag even negligible deviations.