
Does it make sense to use differences in p-values to show a tendency or the 'importance' of the effect of a treatment? For example, I have treated a contaminated soil and I test the treatment against one control. I can see that the high-molecular-weight contaminants are more affected by the treatment, and that the p-values (testing treatment against control) for these contaminants are lower than those for the low-molecular-weight ones, which, if I'm right, would mean that the significance is higher.

This seems to agree with the graph, but could this difference in significance, as seen through the p-values, be used in itself to show the difference in the impact of the treatment, I mean in the discussion of the results?

Thank you in advance.

gips
  • p-values are uniformly distributed under the null hypothesis, therefore they should not be used as a measure of the effectiveness of a treatment. The quantity $P(X\ldots$ – Procrastinator Jul 24 '12 at 09:52
  • I believe the uniformity of the distribution of p-values is not relevant, @Procrastinator, although your conclusions are good ones. After all, we could monotonically re-express the effect size, which is arguably a measure of effectiveness, so that (under the null) it would have a uniform distribution. I think the key idea is that the p-value depends on phenomena ancillary to the effect size: specifically, on *variability* of estimated *errors* and on *sample size.* That precludes equating significance with importance. – whuber Jul 24 '12 at 13:49
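To illustrate the point made in these comments, here is a minimal simulation sketch (Python with NumPy/SciPy is assumed; the contaminant labels, means, noise levels, and sample sizes are invented purely for illustration). Two contaminants with exactly the same true reduction can produce very different p-values once variability and sample size differ:

```python
# Sketch: the same true reduction can yield very different p-values
# depending on noise and sample size. All numbers are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_reduction = 20.0        # identical true treatment effect in both cases
control_mean = 100.0

for label, sigma, n in [("high-MW contaminant", 5.0, 10),
                        ("low-MW contaminant", 25.0, 4)]:
    control = rng.normal(control_mean, sigma, n)
    treated = rng.normal(control_mean - true_reduction, sigma, n)
    t, p = stats.ttest_ind(treated, control, equal_var=False)  # Welch t-test
    print(f"{label}: p = {p:.4f}")
# Only the variability and the number of replicates differ between the two
# cases, so the difference in p-values says nothing about the size of the effect.
```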

3 Answers


Let me paraphrase your question: "I statistically tested a hypothesis and obtained some p-values. Can I use these p-values to evaluate a different hypothesis?" I don't think that is a good idea. You are asking two different scientific questions. Consequently, you'll want to use two different tests to evaluate the two different hypotheses.

The first question is: "Does the treatment affect any contaminants?" Here, you're comparing the treatments against the control. Most likely, you have calculated a slope of some sort and shown that it is different from zero. This is where you got your p-values. However, these p-values do not address the second question (see below), and I suggest you do not use them to answer it.

The second question is: "Is contaminant A (high MW) affected more strongly than contaminant B (low MW)?" For this, I suggest you test whether the effect of the treatment on the two contaminants differs significantly. Note that you should use a fair normalization, for example percent reduction rather than reduction in mass, so that the effects you measure (e.g. slopes) are comparable at all. Using the estimated slopes and their uncertainties will allow you to perform a test (e.g. a t-test) for a non-zero difference, as sketched below.
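A minimal sketch of such a comparison, assuming the two effect estimates are independent and approximately normal (a z-approximation rather than an exact t-test; the estimates and standard errors below are hypothetical placeholders):

```python
# Sketch: compare two estimated effects (e.g. % reduction slopes) using
# their standard errors. The numbers are placeholders; use your own estimates.
import numpy as np
from scipy import stats

beta_high, se_high = 0.62, 0.08   # effect on high-MW contaminant (hypothetical)
beta_low,  se_low  = 0.35, 0.10   # effect on low-MW contaminant (hypothetical)

diff = beta_high - beta_low
se_diff = np.sqrt(se_high**2 + se_low**2)   # assumes independent estimates
z = diff / se_diff
p = 2 * stats.norm.sf(abs(z))               # two-sided p-value (normal approx.)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
```

If the samples behind each estimate are small, a $t$ reference distribution with an appropriate degrees-of-freedom approximation (e.g. Welch-Satterthwaite) would be the more careful choice.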

Jonas
  • But it is not really about treatments A and B here; I actually did a multivariate analysis, and I see that the treatment has a larger effect on one part of my variables than on the other part. So I think the only way would be to estimate the differences, as @Procrastinator said, or? – gips Jul 24 '12 at 10:28
  • @gips: I have indeed misunderstood your question, and have fixed my answer. – Jonas Jul 24 '12 at 11:45
  • But I thought the main issue was using p-values for comparisons? I don't think Jonas is addressing that at all. – Michael R. Chernick Jul 24 '12 at 13:16
  • @MichaelChernick: Indeed I didn't. Now I do. – Jonas Jul 24 '12 at 13:27
  • @Jonas Okay, I see that now. The answer to the second question is nice. I think you created that question, not the OP. – Michael R. Chernick Jul 24 '12 at 14:12
  • @Jonas Or maybe it came out of one of the OP's comments. – Michael R. Chernick Jul 24 '12 at 14:17
  • @MichaelChernick: I should have introduced my answer with "In your research..." or something like that. The OP was asking one question, for which they got a p-value. Now they're asking a second question, and wonder whether they should re-use the p-values from q#1 for that. I think they shouldn't. – Jonas Jul 24 '12 at 14:25
  • @Jonas Yes, that was exactly what I meant. I didn't phrase the second question, but how MichaelChernick got it was right. Thanks to all. – gips Jul 25 '12 at 09:49

The usual way would be to measure the treatment effect as the difference of the two effects, $\theta \equiv \beta_{\text{treated}} - \beta_{\text{control}}$. In your case, you need to estimate $\theta$ and then test its significance.

The difference in p-values may give you some elements for discussion (or rather speculation?), but the discussion will be less rigorous, since the hypothesis test itself suffices. A direct measure of the differential treatment effect is given by the statistic behind $\theta$, or by $\hat\theta$ itself: e.g., if the hypothesis test is based on a statistic that follows a Student's $t$ distribution, $T_n \sim t_q$, then it is easy to argue that the statistic is proportional to the estimated difference $\hat\theta$.

It seems that you have treated two soils, and that you want to assess the relative effectiveness by comparing the two tests, one on each soil? If that is the case, a test of $H_0: \theta_1 - \theta_2 = 0$, or the estimate $\hat\theta_1 - \hat\theta_2$ itself, would provide you with an objective measure and a hypothesis test of the relative performance; a sketch follows below.
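One way to obtain both $\hat\theta_1 - \hat\theta_2$ and a test of $H_0: \theta_1 - \theta_2 = 0$ in a single step is a regression with a treatment-by-group interaction. Here is a sketch using pandas/statsmodels, with synthetic data invented purely for illustration:

```python
# Sketch: the interaction coefficient estimates theta_1 - theta_2 directly,
# together with a test of H0: theta_1 - theta_2 = 0. Data are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 8  # replicates per cell (hypothetical)
df = pd.DataFrame({
    "treated": np.repeat([0, 1, 0, 1], n),   # 1 = treated, 0 = control
    "high_mw": np.repeat([0, 0, 1, 1], n),   # 1 = high-MW contaminant
})
# hypothetical % reductions: stronger treatment effect on high-MW contaminants
df["reduction"] = (10 + 15 * df["treated"] + 25 * df["treated"] * df["high_mw"]
                   + rng.normal(0, 5, len(df)))

fit = smf.ols("reduction ~ treated * high_mw", data=df).fit()
print(fit.params["treated:high_mw"],     # estimate of theta_1 - theta_2
      fit.pvalues["treated:high_mw"])    # p-value for H0: theta_1 = theta_2
```

Fitting one model keeps the two effects on the same scale and gives the standard error of their difference directly, rather than combining two separate fits by hand.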

JDav
  • I have treated one soil, which contains different contaminants, and analyzed the remaining concentration of each of these contaminants; according to their weight, the contaminants show different behaviours. But thanks for your answer! I indeed want the relative effectiveness of the treatment, but on different groups of variables or contaminants. – gips Jul 24 '12 at 11:49
  • @JDav The treatment effect could be higher than the specified value, but detecting it still requires statistical power, which means a sufficient sample size. – Michael R. Chernick Jul 24 '12 at 14:16
  • @MichaelChernick Thanks, I agree! Nevertheless, I have to admit that I'm losing the point somewhere, as I can't see where in the question sample size seems to be an issue. gips allows for hypothesis testing, which implies that a certain statistic and an asymptotic or exact distribution are assumed, so this is not an issue... unless there is something I have not read. – JDav Jul 24 '12 at 22:53
  • @JDav It is not about whether the test statistic under the null hypothesis is right; it is the issue of power. Any sample size can give you a test statistic with the right type I error if it is an exact test. If it is asymptotic, then the significance level does depend a little on having the sample size large enough that the asymptotic null distribution is close to the true null. But power is the key point that you may be missing. If you get an insignificant p-value, it could be due to a type II error, and keeping the type II error down requires an adequately large sample size. – Michael R. Chernick Jul 24 '12 at 23:41
  • That is why we look at effect size and power to do sample size determination! – Michael R. Chernick Jul 24 '12 at 23:41

I think one of the key reasons why you usually can't do what you are suggesting is sample size versus variability. A p-value tells you whether the available data allow you to detect a difference; it does not directly tell you the magnitude of that difference. You can have correctly selected the hypothesis test, and be in a situation where the null hypothesis should be rejected, yet your sample size is too small to detect the actual effect size. This could mean that you need a very large sample because of large random error, or simply that the chosen sample size turned out to be too small. When you try to compare p-values from different studies, the differences in sample size muddy the comparison. Of course, I agree with Procrastinator's comments as well.
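If you want to check whether an insignificant p-value could simply reflect a lack of power, a quick power or sample-size calculation can help. Here is a sketch using statsmodels; the standardized effect size, alpha, power target, and sample size are hypothetical inputs:

```python
# Sketch of a power / sample-size check for a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# sample size per group needed to detect a standardized effect of 0.8
# with 5% type I error and 80% power (two-sided test)
n_per_group = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.8,
                                    alternative="two-sided")
print(n_per_group)
# achieved power for the sample size you actually have, e.g. n = 6 per group
print(analysis.power(effect_size=0.8, nobs1=6, alpha=0.05, ratio=1.0))
```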

Michael R. Chernick