Test for difference between 2 bounded, paired, and integer valued discrete distributions

Question

I am comparing customer ratings at origination and at default. My data look like this:

Customer ID     Rating at Origination      Rating at Default
     1                    7                     6
     2                    4                    13
     3                    6                    12
...

I have 483 unique customers, and a rating is an integer between 1 and 15. I would like to know whether the ratings at origination and those at default are from the same distribution.

1) The K-S Test

I am not sure whether the ties will invalidate the result of the K-S test. Nevertheless, the result of the K-S test is

ks.test(data$ORIG_RATING,data$DEF_RATING)

Two-sample Kolmogorov-Smirnov test

data:  data$ORIG_RATING and data$DEF_RATING
D = 0.072464, p-value = 0.1582
alternative hypothesis: two-sided

Warning message: p-value will be approximate in the presence of ties

According to the accepted answer of this question, the K-S test can be used when the ties are not heavy. How can I tell whether my ties are heavy?
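One rough way to quantify the ties is to count distinct values in the pooled sample. With at most 15 distinct ratings across 2 x 483 = 966 observations, essentially every observation shares its value with many others. The sketch below uses simulated stand-in data; `orig` and `def` are hypothetical placeholders for my two columns, and `tied_share` is just an illustrative summary, not a standard statistic.

```r
# Hypothetical stand-in data; replace with data$ORIG_RATING and data$DEF_RATING.
set.seed(1)
orig <- sample(1:15, 483, replace = TRUE)
def  <- sample(1:15, 483, replace = TRUE)

pooled <- c(orig, def)

# Number of distinct values versus number of observations in the pooled sample.
n_distinct <- length(unique(pooled))
n_total    <- length(pooled)

# Share of observations whose value also occurs at least once more.
counts     <- table(pooled)
tied_share <- sum(counts[counts > 1]) / n_total

n_distinct
tied_share
```

A continuous variable would give `n_distinct` close to `n_total` and `tied_share` close to 0; ratings on a 15-point scale sit at the opposite extreme.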

2) The Chi Square Test

chisq.test(data$ORIG_RATING, data$DEF_RATING)

Pearson's Chi-squared test

data:  data$ORIG_RATING and data$DEF_RATING
X-squared = 2287, df = 99, p-value < 2.2e-16

Warning message: Chi-squared approximation may be incorrect
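As far as I understand, when `chisq.test` is given two vectors it cross-tabulates them and tests whether the two ratings are independent, which is a different question from whether their marginal distributions are equal. The df = 99 above is consistent with that reading, for example a 10 x 12 table. A minimal sketch on simulated stand-in data (`orig` and `def` are hypothetical placeholders for my columns):

```r
# Hypothetical stand-in data; replace with data$ORIG_RATING and data$DEF_RATING.
set.seed(1)
orig <- sample(1:15, 483, replace = TRUE)
def  <- sample(1:15, 483, replace = TRUE)

# chisq.test(x, y) with two vectors works on this cross-tabulation.
tab <- table(orig, def)
dim(tab)

# Degrees of freedom are (rows - 1) * (columns - 1), as in a test of independence.
res <- suppressWarnings(chisq.test(orig, def))
res$parameter
(nrow(tab) - 1) * (ncol(tab) - 1)
```

So a very small p-value here would mainly say that a customer's rating at default is related to the rating at origination, not that the two distributions differ.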

3) Permutation Tests

Following the accepted answer of this question, I ran compare.2.vectors from the afex package:

install.packages("afex")
library(afex)
compare.2.vectors(data$ORIG_RATING,data$DEF_RATING)

$parametric
   test test.statistic test.value  test.df          p
1     t              t  -2.042615 964.0000 0.04136216
2 Welch              t  -2.042615 957.5903 0.04136398

$nonparametric
             test test.statistic    test.value test.df          p
1 stats::Wilcoxon              W 108548.500000      NA 0.05839797
2     permutation              Z     -2.039266      NA 0.04277000
3  coin::Wilcoxon              Z     -1.892815      NA 0.05724000
4          median              Z     -1.029688      NA 0.33503000

These tests suggest that my result is borderline, but which one should I trust?

My Question: Can the fact that my observations are paired and bounded integers be used to make such a comparison more accurate? Are there any tests designed specifically for data like this?
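The tests above all treat the two columns as independent samples (the t-test df of 964 = 966 - 2 reflects this). For concreteness, the kind of paired analysis I have in mind could be sketched as follows in base R. This is only a sketch on simulated stand-in data: `orig` and `def` are hypothetical placeholders for my columns, and these tests address a shift in the within-customer differences rather than full equality of the two distributions.

```r
# Hypothetical stand-in data; replace with data$ORIG_RATING and data$DEF_RATING.
set.seed(1)
orig <- sample(1:15, 483, replace = TRUE)
def  <- pmin(pmax(orig + sample(-2:3, 483, replace = TRUE), 1), 15)

# Within-customer differences.
d <- def - orig

# Wilcoxon signed-rank test on the pairs
# (exact = FALSE because ties and zero differences rule out an exact p-value).
w <- wilcox.test(def, orig, paired = TRUE, exact = FALSE)

# Sign test: among non-zero differences, are increases as likely as decreases?
n_up   <- sum(d > 0)
n_down <- sum(d < 0)
s <- binom.test(n_up, n_up + n_down, p = 0.5)

# Sign-flip permutation test for the mean difference.
n_perm <- 2000
obs    <- mean(d)
perm   <- replicate(n_perm, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
p_perm <- (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)

c(wilcoxon = w$p.value, sign = s$p.value, permutation = p_perm)
```

The sign test uses only the direction of each change, so it does not depend on the spacing of the rating scale, while the signed-rank and sign-flip tests also use the size of the change.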
