
In my understanding, hypothesis tests are equivalent to confidence intervals. Because of this, I initially believed $p$-values should only be reported in a binary sense, i.e. whether $p<\alpha$ or $p\ge\alpha$, so that you either reject or fail to reject the null, respectively.

But what happens when you do your test at the 5% level and the result is not significant at the 5% level but is significant at the 10% level? That means we are 90% confident that the true $\mu$ lies outside the 90% confidence interval, which seems quite high.

Is it not valid to reject at the 10% level in that case? Maybe I was too stringent with the 5% significance level.

Also, since these cases only occur when the $p$-value lies between the two significance levels (at least in the case of symmetric tests), doesn't that mean the size of the $p$-value actually matters?
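
For concreteness, here is a minimal Python sketch of the situation described above, using hypothetical data (the sample, effect size, and seed are made up for illustration). For a two-sided one-sample t-test, $p < \alpha$ exactly when the hypothesised mean $\mu_0$ falls outside the $(1-\alpha)$ confidence interval, so a $p$-value between 0.05 and 0.10 corresponds to $\mu_0$ lying inside the 95% interval but outside the 90% one:

```python
# Minimal sketch with made-up data: a one-sample t-test compared against
# the 90% and 95% confidence intervals for the mean. Whatever p turns out
# to be, mu0 lies outside the (1 - alpha) CI exactly when p < alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.4, scale=1.0, size=25)   # hypothetical sample
mu0 = 0.0                                     # null-hypothesis mean

t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)
print(f"p = {p_value:.3f}")

se = stats.sem(x)                             # estimated standard error of the mean
for conf in (0.90, 0.95):
    lo, hi = stats.t.interval(conf, df=len(x) - 1, loc=x.mean(), scale=se)
    print(f"{conf:.0%} CI: ({lo:.3f}, {hi:.3f}), "
          f"contains mu0: {lo <= mu0 <= hi}, "
          f"reject at alpha={1 - conf:.2f}: {p_value < 1 - conf}")
```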

Darby Bond
  • Yes, the size of the p-value does matter. No, you should not change your $\alpha$, which you fixed before the experiment, after the fact just because the results are significant for a different $\alpha$. Also, your interpretation of the confidence interval is wrong. – user2974951 Jan 06 '22 at 09:39
  • What if both $\alpha$'s are acceptable to the scientist? There might not be much difference between 90% and 95% confidence for the application or test at hand. If I worked with 95% confidence before but can now reject at 90%, what's the harm? – Darby Bond Jan 06 '22 at 09:43
  • Your understanding of p-values, $\alpha$'s, and confidence intervals is lacking. This is not a simple difference of 5% in probability. Changing $\alpha$ from 0.05 to 0.1 is **not** about saying that you are 95% or 90% confident that the true value lies inside/outside this range. It is much more nuanced and, some would say (those whose name starts with B), non-intuitive. But to answer your question, what's the harm? The harm is that this procedure is completely arbitrary, subjective, and non-scientific. This is how you get non-reproducible results which are of no use to anyone. – user2974951 Jan 06 '22 at 10:09
  • I am not a p-value convert, and they continue to be poorly used, even misused and abused. IMHO, the worst mistake in using p-values is to classify them into arbitrary significance classes of 10%, 5% or 1%, with anything else being non-significant. The actual p-value should be reported every time so the reader can interpret the meaningfulness (not necessarily the significance) of the results. Having such transparency is why the size of the p-value matters. – Mari153 Jan 06 '22 at 10:11
  • @Mari153 how does the size of the p-value show you meaningfulness? Wouldn't you interpret that from a confidence interval? Can you give an example of how the size of the p-value might help establish meaningfulness? – Darby Bond Jan 06 '22 at 10:35
  • @user2974951 I don't understand why it isn't a difference of 5% in probability. Before the data is measured, a 90% CI has a 90% chance of containing the true parameter and a 95% CI has a 95% chance, respectively. Can you illustrate where my understanding is off here? (See the coverage sketch after these comments.) – Darby Bond Jan 06 '22 at 10:38
  • The p-value is just a measure of the effect size in terms of a probability. What you do with it is subjective. Depending on your situation you might create a more objective formulation for getting to a decision (e.g. optimise the expected profit of a certain decision). Say you have to gamble on some rejection (and it costs you money to reject): would it matter whether you have a 5% or a 10% probability of being wrong? – Sextus Empiricus Jan 06 '22 at 11:20
  • @Darby Bond. A simple thought experiment. Let's say we did a series of experiments on a new treatment, say a new weedkiller. The p-value comes back at between 0.11 and 0.14 for all 20 tests. Typically that would be deemed non-significant, and in some academic approaches such a result wouldn't even be reported or published! Yet, as all the experiments have a similar p-value, that would suggest there is some meaning in the treatment. Identifying that meaning would be the next step... – Mari153 Jan 07 '22 at 00:38
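
On the coverage point raised in the comments, here is a minimal simulation sketch (assuming normally distributed data with hypothetical parameters) of the pre-data statement that, over repeated samples, roughly 90% of 90% intervals and 95% of 95% intervals contain the true mean:

```python
# Simulation sketch with assumed normal data and made-up parameters:
# estimate how often 90% and 95% t-intervals cover the true mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mu, sigma, n, n_rep = 0.0, 1.0, 25, 10_000

covered = {0.90: 0, 0.95: 0}
for _ in range(n_rep):
    x = rng.normal(true_mu, sigma, size=n)
    se = stats.sem(x)
    for conf in covered:
        lo, hi = stats.t.interval(conf, df=n - 1, loc=x.mean(), scale=se)
        covered[conf] += (lo <= true_mu <= hi)   # count intervals covering true_mu

for conf, count in covered.items():
    print(f"{conf:.0%} CI coverage over {n_rep} replications: {count / n_rep:.3f}")
```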

1 Answer


This question, or questions very like it, comes up a lot. Your issue is that there are two distinct and incompatible ways of testing hypotheses using p-values, and many textbooks contain a confusing mix of the two. There is already an excellent explanation of the two approaches in this answer, but I'll summarise it (badly) here:

  • In Fisher's approach, you calculate your p-value, but you don't compare it to .05, .1, or any other threshold. Instead, you just report it as it is, and a smaller p-value means stronger evidence against the null hypothesis.

  • In the Neyman-Pearson approach, you have a clear set of decision rules (e.g. rejecting the null hypothesis if $p < \alpha$), which, if followed correctly, will limit the rate at which you make type 1 and type 2 errors.

So, you can either take Fisher's approach, and worry about the actual p-value, or Neyman and Pearson's, and worry about whether or not $p < \alpha$, but you can't do both.
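
As a rough illustration, here is a minimal Python sketch (with made-up data) of what the two reporting styles look like in practice for the same test result:

```python
# Minimal sketch with hypothetical data: the same t-test reported in the
# Fisher style (report p itself) and the Neyman-Pearson style (pre-set
# alpha, binary decision).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=30)   # made-up sample
_, p = stats.ttest_1samp(x, popmean=0.0)

# Fisher: report the p-value itself as a continuous measure of evidence.
print(f"Fisher-style report: p = {p:.3f}")

# Neyman-Pearson: fix alpha *before* seeing the data, then decide.
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"Neyman-Pearson-style report (alpha = {alpha}): {decision}")
```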

Eoin