
In a sample of size 100, we recorded the presence or absence of two attributes, A and B.

Our goal is to assess whether there is any association between these two attributes. The data look like the following:

                        A
                Present    Absent
  B  Present       x1        x2
     Absent        x3        x4
  --------------------------------
                     Total:   100

Since only the total sample size was fixed here, we conducted "Boschloo's exact test with a multinomial model".

Attribute A can be divided into two parts, pathogenic A and non-pathogenic A. Now, with the same sample of 100, we tested whether there is any association between attribute B and pathogenic A. Since the margin of attribute B is fixed here, we conducted "Boschloo's exact test with a binomial model".

Again, we assessed whether there is any association between attribute B and non-pathogenic A. Here we also used "Boschloo's exact test with a binomial model" as a test procedure.
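
For concreteness, here is a minimal sketch (not our actual analysis) of how such a test could be run with SciPy >= 1.7; scipy.stats.boschloo_exact conditions on fixed row totals, i.e. the binomial model used for the second and third tests, and the counts below are placeholders, not our data:

    # Sketch: Boschloo's exact test on a 2x2 table (placeholder counts).
    # scipy.stats.boschloo_exact treats the row margins as fixed, so it
    # corresponds to the "binomial model" tests described above.
    from scipy.stats import boschloo_exact

    # Rows: attribute B (Present, Absent); columns: pathogenic A (Present, Absent)
    table = [[12, 38],
             [ 5, 45]]

    res = boschloo_exact(table, alternative="two-sided")
    print(res.statistic, res.pvalue)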

My question:

In the same study, we are conducting three different inferences with the same sample of 100. Is it valid to perform several tests to draw conclusions for several inferences from the same sample (data)?

rolando2
user81411
  • Are you requiring that all the tests meet your inference threshold, or are you willing to have any fail? – ReneBt Jun 07 '18 at 03:38
  • 3
    Is this a retrospective study/analysis? So first data collection, then the research question formulation? – Michael M Jun 07 '18 at 17:58
  • @MichaelM It's a prospective study. We first formulated the principal research question and then collected data. But the other research questions arose after data collection. That is, while investigating the principal research question, we noticed some other features and tested them. – user81411 Jun 08 '18 at 07:52

2 Answers


This answer is still tentative; I'll add to it, or remove it, later.

In principle you can extract as many different conclusions from your data as you want. This includes hypotheses and also inferences. You will notice, however, that these conclusions might overlap or even contradict each other. You could argue that this is especially the case when the statistical power is insufficient to draw a certain conclusion conclusively.

But it would be a severe error if you were using the same data to train, test and/or validate some extraction or refining method. This might or might not be the case here. You have a notion that some feature might be present, and you test for this feature. This test can be implemented in many ways. The questions (i) "feature A is present" and (ii) "feature A is not present" are not the same; if you find that you have data to support (i), you still might not be able to reject (ii).

Barnard's test, including Boschloo's refinement of it, is the best way to do this testing, as far as I know.
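
For reference, here is a minimal sketch of running both tests in SciPy (>= 1.7); the table is invented and only illustrates the calls:

    # Illustrative comparison of Barnard's and Boschloo's exact tests.
    # Boschloo's test uses Fisher's exact p-value as its test statistic and is
    # uniformly more powerful than Fisher's exact test.
    from scipy.stats import barnard_exact, boschloo_exact

    table = [[12, 38],
             [ 5, 45]]  # made-up 2x2 counts

    print("Barnard:  ", barnard_exact(table).pvalue)
    print("Boschloo: ", boschloo_exact(table).pvalue)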

cherub
  • Could you please explain a bit why you said "But it would be a severe error if you were using the same data to train, test and/or validate some extraction or refining method."? – user81411 Jun 08 '18 at 07:57

First, in my understanding and in principle, testing a set of different predefined hypotheses on a given data set is a valid procedure.

However, it seems that your problem relates to a set of non-predefined hypotheses, and in my understanding the very nature of your question is about what you mean by "draw conclusions". As you mentioned in the comments, your hypotheses were not planned (or at least a part of them). Consequently, your analysis will be at best purely exploratory, and drawing definitive conclusions is out of scope. I suggest this question and its associated answers, which discuss why this is the case. A brief summary could be: there are too many degrees of freedom in a data set to draw conclusions from hypotheses generated after having seen the data.

Nevertheless, documenting and discussing the effect sizes of side observations is relevant and useful. Just be aware, and make your readers aware, that these are observations that still need to be tested properly (but that may still serve a reasoned discussion).

peuhp
  • If the post-defined hypotheses found the data in a state that we would have liked to collect had the hypotheses been predefined, then can't we test them with an inferential procedure? – user81411 Jun 08 '18 at 11:51
  • Could you please provide me an authoritative reference in support of your statement that *"in principle, testing a set of different predefined hypotheses on a given data set is a valid procedure."*? – user81411 Jun 08 '18 at 11:53
  • Regarding your first question, you can't draw strong conclusions if your hypotheses were not predefined (unless your p-values were extremely low, so that family-wise type I error may be substantially accounted for). Pragmatically, you can still compute some p-values if you want to, but their values will not be indicative of anything and so must be interpreted with caution. Hope it helps. – peuhp Jun 08 '18 at 12:13
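
To make the family-wise error point concrete, here is a minimal sketch of a Holm adjustment of three p-values with statsmodels; the p-values are invented placeholders, not results from the study:

    # Sketch: Holm adjustment of three hypothetical p-values.
    from statsmodels.stats.multitest import multipletests

    pvals = [0.012, 0.048, 0.20]  # hypothetical p-values from the three tests
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
    print(p_adj)   # adjusted p-values
    print(reject)  # which hypotheses would be rejected at the 5% level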