Paper on performing hypothesis tests based on outcome of another test

Question

It is well known that choosing a statistical test based on the outcome of another statistical test is problematic, because the resulting p-values are difficult or even impossible to interpret (see e.g. the question "Choosing a statistical test based on the outcome of another (e.g. normality)"). However, this is still standard practice in many applications, and applied papers usually do not seem to notice or discuss it. Looking through the literature, I failed to find a paper that actually discusses this phenomenon.

I would appreciate links to any publications relating to choosing a statistical test based on the outcome of another statistical test, especially any that are accessible to applied scientists.

Unrelated comment: In my search, I stumbled across a paper by RS Nickerson, '[Null hypothesis significance testing: a review of an old and continuing controversy](http://psych.colorado.edu/~willcutt/pdfs/Nickerson_2000.pdf)', which does not discuss this particular phenomenon but seems like a good paper to give to applied scientists. — Rob Hall, Oct 09 '13 at 12:54
Long ago I posted a detailed analysis of one such situation at http://www.quantdec.com/envstats/notes/class_12/ucl.htm: it studies the properties of a UCL that is determined by a procedure chosen conditional on the results of a preliminary hypothesis test (concerning the underlying distribution). — whuber, Oct 10 '13 at 16:50
[This one](http://www.ncbi.nlm.nih.gov/pubmed/15171807) might interest you (also see [this](http://andrewgelman.com/2013/06/09/heterogeneity-of-variance-in-experimental-studies-a-challenge-to-conventional-interpretations/)). Then there's [this](http://beheco.oxfordjournals.org/content/17/4/688.full). That relates to testing equality of variance and testing normality respectively before a two-sample t-test. — Glen_b, Oct 11 '13 at 05:19
It seems to me that to simply say that "It is well known that it is problematic" is to provide insufficient specificity because the problematic nature probably depends on the statistical framework within which one is working. Problems for frequentist interpretation may not be problems for methods that assess the evidential meaning of data. — Michael Lew, Dec 07 '13 at 06:47
Maybe a simple example where this is problematic would serve much the same purpose as a citation. — BKay, Oct 07 '14 at 15:40
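In the spirit of the simple example suggested in the comment above, here is a minimal simulation sketch using only the Python standard library. It assumes one specific two-stage procedure that is not taken from the thread or the linked papers: a preliminary F-test for equal variances decides whether a pooled or a Welch two-sample t-test is run. The sample sizes, standard deviations, seed and the helper names (`reg_inc_beta`, `t_two_sided_p`, `f_cdf`, `simulate`) are illustrative choices. Both groups have the same mean, so every rejection is a false positive.

```python
import math
import random


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction at step m.
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of a t statistic with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def f_cdf(f, d1, d2):
    """CDF of the F distribution with (d1, d2) degrees of freedom."""
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


def simulate(reps=4000, n1=10, n2=40, sd1=1.5, sd2=1.0, alpha=0.05, seed=1):
    """Estimate false-positive rates when both population means are equal."""
    rng = random.Random(seed)
    hits = {"pooled": 0, "welch": 0, "two_stage": 0}
    for _ in range(reps):
        m1, v1 = mean_var([rng.gauss(0.0, sd1) for _ in range(n1)])
        m2, v2 = mean_var([rng.gauss(0.0, sd2) for _ in range(n2)])
        diff = m1 - m2
        # Pooled-variance t-test (assumes equal variances).
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        p_pooled = t_two_sided_p(diff / math.sqrt(sp2 * (1 / n1 + 1 / n2)),
                                 n1 + n2 - 2)
        # Welch t-test with Satterthwaite degrees of freedom.
        se2 = v1 / n1 + v2 / n2
        df_w = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
        p_welch = t_two_sided_p(diff / math.sqrt(se2), df_w)
        # Preliminary two-sided F-test of equal variances picks the main test.
        cdf = f_cdf(v1 / v2, n1 - 1, n2 - 1)
        p_pre = 2.0 * min(cdf, 1.0 - cdf)
        p_two_stage = p_welch if p_pre < alpha else p_pooled
        hits["pooled"] += p_pooled < alpha
        hits["welch"] += p_welch < alpha
        hits["two_stage"] += p_two_stage < alpha
    return {name: count / reps for name, count in hits.items()}


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name:>9}: estimated type I error = {rate:.3f}")
```

Under these assumed settings, the two-stage procedure should reject a true null more often than the Welch test applied unconditionally, because the pooled test is used exactly in those samples where the smaller group's variance happens to be underestimated. The nominal 5% level therefore does not describe the procedure as a whole.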

score 1 · Answer 1 · answered Oct 26 '14 at 07:40

I think that the following research paper on chain procedures is relevant to and might be helpful in answering your question: http://www.multxpert.com/doc/md2011.pdf.

Aleksandr Blekh

Thank you for the interesting paper. If I understand it correctly, it looks at data-driven allocation of alpha to an ordered set of hypotheses. Such a procedure could then simply add all variants of a hypothesis to a list (e.g. the hypothesis given that a parametric test can be used and the hypothesis given that there is some evidence that a non-parametric test should be used). While this should be a sensible approach in a Neyman–Pearson framework, I am not sure this solves the problem of interpreting the p-values in the sense of Fisher. – Rob Hall Nov 26 '14 at 00:26

@RobHall: You're very welcome! Frankly, I browsed the paper without delving into the details, so at present I can't really make a worthwhile comment. But I hope to review this paper when I get a chance. By the way, here is another interesting paper that might be relevant to this, where the author argues that the two frameworks can be considered complementary: https://stat.duke.edu/courses/Spring07/sta215/Ref/Lehm1993.pdf. – Aleksandr Blekh Nov 26 '14 at 00:53
