I need to compare two samples, and I would like to know whether the distribution of the first stochastically dominates that of the second. I'm not sure whether the chi-square test can be used to verify this.

Neither the scipy nor the R documentation mentions this test being one-sided, whereas a previous question on this site says that it is always one-sided.

Could anybody clarify this, with respect to either of the two implementations linked above?

kjetil b halvorsen
Ricky Robinson
  • 1
    What do you mean by "distribution being larger"? – Tim Jan 19 '15 at 17:11
  • Roughly the same as stated here for kolmogorov smirnov: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html "In the one-sided test, the alternative is that the empirical cumulative distribution function of the random variable is “less” or “greater” than the cumulative distribution function F(x) of the hypothesis, G(x)<=F(x), resp. G(x)>=F(x)." – Ricky Robinson Jan 19 '15 at 17:17
  • 2
    The chi-square isn't a test for stochastic dominance. – Glen_b Jan 19 '15 at 17:42
  • 2
    The CRITICAL REGION for the chi-square statistic is one-sided, but the HYPOTHESES are two-sided or multi-sided, in the sense that the statistic looks to see if one set of frequencies is discrepant from another, without regard to what direction they differ. So the chi-square stat is not really very well suited for your stochastic-ordering alternative. There is literature on order-restricted inference for situations like this, but it gets pretty technical. +1 for the question from here, don't understand what somebody sees wrong with it. – Russ Lenth Jan 20 '15 at 04:02

1 Answer

A major reference on this topic is Tim Robertson, F. T. Wright, R. L. Dykstra (1988), Order Restricted Statistical Inference, published by Wiley. It is expensive and technical, but it is still probably the best place to look. An earlier reference is Richard E. Barlow et al. (1972), Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. You may find something easier to digest online by searching for "order-restricted inference" or "isotonic regression."

For the chi-square application you mention, the basic idea, as I understand it, is to compute the chi-square statistic with two twists. First, the "expected" counts are obtained under the stochastic-ordering requirement. This is done with isotonic-regression techniques, which involve pooling adjacent categories wherever the observed ordering is violated. Second, significance is assessed using the "chi-bar-square" distribution, which is a mixture of chi-square distributions with different d.f. The mixture is needed because, under the null hypothesis of no difference between the distributions, which categories end up pooled in the isotonic regression is itself random, and each pooling pattern corresponds to a different d.f.
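
To make the pooling step concrete, here is a minimal sketch of the pool-adjacent-violators idea in plain Python. It only shows the generic step of averaging neighbouring categories that break a non-decreasing order; it is not the full order-restricted estimate for two samples, and the function name `pava_nondecreasing` and the example numbers are made up for illustration.

```python
def pava_nondecreasing(values, weights=None):
    """Weighted least-squares fit of a non-decreasing sequence.

    Adjacent entries that violate the ordering are pooled (replaced by
    their weighted mean) until no violation remains.
    """
    if weights is None:
        weights = [1.0] * len(values)

    # Each block holds [weighted mean, total weight, number of pooled entries].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])

    # Expand the pooled blocks back to one fitted value per original entry.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical sequence: the 2nd and 3rd entries violate the ordering
# and are pooled to their average.
print(pava_nondecreasing([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

How many categories end up pooled depends on the data, which is why the reference distribution becomes a mixture over the possible d.f. rather than a single chi-square.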

This is not a complete answer to your question, but it does point to the basic ideas. Note also that it implies the usual chi-square test is not suitable for the stochastically ordered hypotheses you wish to test. Perhaps you can consult someone with the expertise to navigate the techniques needed to get this done with your data. Good luck!
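
A small sketch of why the usual statistic cannot address a directional hypothesis: the Pearson chi-square statistic for a two-sample table is the same whether the first sample is shifted toward the higher categories or toward the lower ones. The helper `pearson_chi2` and the counts below are hypothetical and written in plain Python for illustration only.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity of the rows.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts over three ordered categories (low, medium, high).
first_sample_higher = [[10, 20, 30],   # sample 1 concentrated on "high"
                       [30, 20, 10]]   # sample 2 concentrated on "low"
first_sample_lower = [[30, 20, 10],    # roles reversed
                      [10, 20, 30]]

print(pearson_chi2(first_sample_higher), pearson_chi2(first_sample_lower))  # 20.0 20.0
```

Both tables give the same statistic, so a large value says only that the two samples differ, not which one dominates.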

Russ Lenth