4

I've been reading a number of articles and blog posts about issues with hypothesis testing. While these sources seem to raise legitimate problems with hypothesis testing and its interpretation, I'm not seeing much in terms of alternatives to it (beyond Bayesian hypothesis testing). What are some alternatives to hypothesis testing in R?

Problems with the hypothesis testing approach

So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

Alternatives to Statistical Hypothesis Testing

EDIT:

For example, let's consider the following situation. We have monthly data for one year, where conversions_a is the control and conversions_b holds the conversions observed when using a different headline.

df = data.frame(year_one_a = 1:12, conversions_a = rnorm(12), 
                year_one_b = 1:12, conversions_b = rnorm(12)+5)
amathew
  • 449
  • 6
  • 10
  • 3
    In many cases, intervals (which show effect sizes) are more meaningful than hypothesis tests. – Glen_b Feb 01 '14 at 13:10
  • 2
    I think the question needs some context to be answerable. As @Glen says, confidence intervals around a parameter estimate are often more useful than a test that the parameter's true value is exactly zero, but what kind of situations are you thinking of? – Scortchi - Reinstate Monica Feb 01 '14 at 13:20
  • I wasn't thinking of a specific situation, just looking to mimic the following question but for hypothesis testing. http://stats.stackexchange.com/questions/2234/alternatives-to-logistic-regression-in-r – amathew Feb 01 '14 at 13:26
  • 1
    I think it depends on the kind of experiment you're doing, i.e. observational v/s experimental, and the kind of questions you're asking. Is there any kind of particular setting you're interested in? – Stijn Feb 01 '14 at 13:29
  • possible duplicate of [Effect size as the hypothesis for significance testing](http://stats.stackexchange.com/questions/23617/effect-size-as-the-hypothesis-for-significance-testing) – gung - Reinstate Monica Feb 01 '14 at 14:33
  • I am not sure why several people think this is unclear; @amathew is asking for alternatives to hypothesis testing. This might be too broad, or opinion based, but I think it's perfectly clear. – Peter Flom Feb 01 '14 at 15:32
  • @Peter: I thought I voted to put on hold as too broad, but perhaps I ticked the wrong box. And it is very broad; after all there's not an R function `hyp.test` you can replace with something else. If his focus is on model selection (as suggested by the third link, which is about AIC & model averaging), then your answer isn't as relevant as it might be to another situation - say hypothesis tests that regression coefficients are equal to zero. – Scortchi - Reinstate Monica Feb 01 '14 at 17:49
  • @Peter I voted to close as unclear because I could think of way too many different interpretations of "alternative." Would it mean (a) unconventional procedures for conducting hypothesis testing in the N-P framework; (b) a different framework for evaluating evidence against a null; (c) a different logical and statistical framework for formulating testable hypotheses; (d) abandoning a testing framework altogether in favor of some other approach (like EDA); (e) other possible interpretations of results in any of (a)-(d); or something else? – whuber Feb 01 '14 at 20:42
  • @PeterFlom, I voted to close as a duplicate because based on what I believe is the correct interpretation of this question (although I think whuber has a valid point), the answer is Glen_b's comment, which is discussed in the potential duplicate I linked to. I think your & DLDahly's answers were worthwhile, even though they don't answer the Q as I understand it. I upvoted both of them & think they should stay; I don't think this Q should be merged w/ the other thread. – gung - Reinstate Monica Feb 02 '14 at 02:36
  • @gung Merging the two threads seems sensible; that way, all the good answers (in this thread and the other) stay, and they are in one place. I don't remember how to merge two questions though; do you? – Peter Flom Feb 02 '14 at 13:26
  • The "So-Called Bayesian" editorial is incorrect. It assumes that Bayesians are interested in point null hypotheses, which is seldom the case. Much more useful is the probability that an effect is greater than zero. – Frank Harrell Feb 02 '14 at 13:59
  • @PeterFlom, I don't think merging is ideal here, as the answers below wouldn't be a perfect fit under that question as it is stated. However, if you want to do it, merging is a task only moderators can perform; I don't know how it's done, you'd have to ask the other mods. – gung - Reinstate Monica Feb 02 '14 at 14:30

3 Answers

9

One alternative is to forgo p-values altogether and focus on what the results mean. In many situations, p-values answer a question we are not (or ought not be) interested in:

If, in the population from which this sample was drawn, there is no effect, how likely is it that, in a sample of this size, we would get a test statistic as big as or bigger than the one we got?

Instead, we ought to be interested in what the statistics add to an argument about what is going on. This idea is fully developed in the book "Statistics As Principled Argument" by Robert Abelson (link goes to my review) but, essentially, we ought to be asking these questions about the effects we find:

  1. How big are they?
  2. How precise are they?
  3. How widely do they apply?
  4. How interesting are they?
  5. How credible are they?

This can only be done if we are also substantive experts or if we work closely with experts. For example, sometimes small effects are quite interesting - so interesting that they are worth discussing. Indeed, sometimes effects are interesting because they are small - if the literature and theory suggest that they ought to be big.

Highly credible claims require less evidence than ones that are not credible, but credibility has non-statistical sources.
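
For instance, here is a minimal R sketch of addressing questions 1 and 2, how big the effect is and how precisely it is estimated, with an estimate and a confidence interval rather than a significance verdict. It reuses the simulated conversions data from the question's EDIT (the set.seed call and the simplified two-column frame are added here for reproducibility):

set.seed(1)  # make the simulated data reproducible
df = data.frame(conversions_a = rnorm(12),      # control
                conversions_b = rnorm(12) + 5)  # different headline

fit = t.test(df$conversions_b, df$conversions_a)
fit$estimate                       # group means: how big is each?
fit$estimate[1] - fit$estimate[2]  # estimated difference (the effect size)
fit$conf.int                       # 95% confidence interval: how precise is it?

The point is what gets reported and interpreted: the size of the difference and its interval, not a binary verdict derived from the p-value.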

Peter Flom
  • 94,055
  • 35
  • 143
  • 276
  • 1
    +1, this is the best statement of this point of view that I've seen from you on CV. – gung - Reinstate Monica Feb 01 '14 at 15:10
  • +1 Excellent suggestions! Of course, one need not forgo p-values in order to consider the answers to your questions. Instead one has to forgo the silly binary determination of 'significant' versus 'not significant'. – Michael Lew Feb 01 '14 at 21:14
  • Attempts to face this issue without making clear the distinction between hypothesis tests and significance tests lead to throwing out the baby with the bathwater (p-values are the baby, hypothesis tests are the bathwater). – Michael Lew Feb 01 '14 at 21:21
  • Then, @MichaelLew how do you propose testing hypotheses? I'd be eager to read it (would probably need to be a separate answer, not a comment). – Peter Flom Feb 01 '14 at 21:25
  • When would you ever want a statistical test to dictate your actions in a manner that excludes addressing the types of questions that you list? If you, like I, would answer nearly never then you nearly never need a hypothesis test. For people interested in evidence the p-value from a significance test can be interesting, but the dichotomous result of a hypothesis test is not. I've addressed these issues in an extensive form in this paper: http://www.ncbi.nlm.nih.gov/pubmed/22394284. – Michael Lew Feb 01 '14 at 23:21
  • I should also add a link to a paper of mine that details the relationship between significance test p-values and likelihood functions, as well as debunking many arguments about alleged flaws and inadequacies of p-values. (The paper has now been three times rejected but its arguments and evidence have not been refuted. Who would have thought that there would be unreasonable resistance to the notion that p-values are actually used because they are useful?) To P or not to P: on the evidential nature of P-values and their place in scientific inference: http://arxiv.org/abs/1311.0081 – Michael Lew Feb 01 '14 at 23:26
  • @PeterFlom Did you see my papers that you might have been "eager to read"? (I forgot to put your handle in the comments.) – Michael Lew Feb 03 '14 at 21:47
  • I saw your note. I've read quite a lot on p values. – Peter Flom Feb 03 '14 at 23:20
4

This is subjective, and doesn't directly address your question, but I hope it's helpful regardless.

I think people too often conflate a statistical hypothesis with a scientific one, which leads to problems. Regarding the former, the statistical hypothesis being tested is almost always whether the parameter being estimated is zero or not. But with any deeper thought on the matter, one realizes this is often trivial, for all the reasons Peter stated. But because it's called a hypothesis, and it is falsifiable, it seems to satisfy people that something scientific has been achieved.

It also tends to reduce a complex scientific question to the reporting of one key statistical hypothesis. This is further encouraged by how a lot of scientific research is published - thin-slice the bigger problem and report each slice/p-value in a different paper.

Where I am going with this, with respect to your question, is that I am not offended by statistical hypothesis tests per se, as long as they are applied thoughtfully. In other words, the solution is not just finding a better statistical approach, but applying a better scientific approach: clearly specifying a complete theory that can be tested, collecting the relevant data, and then ensuring that any statistical modelling or testing follows directly from these, rather than the other way around.

Michael Lew
  • 10,995
  • 2
  • 29
  • 47
D L Dahly
  • 3,663
  • 1
  • 24
  • 51
0

Following the ideas put forward here http://robjhyndman.com/working-papers/forecasting-without-significance-tests/, you can formulate two models, one with and one without the dummy variable of interest, and then compare them with AIC-like statistics (which come with their own problems and misunderstandings worth reading about).
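
A minimal sketch of that idea in R, assuming the question's simulated conversions data reshaped to long format (the headline factor below plays the role of the dummy variable):

set.seed(1)  # reproducible version of the question's simulated data
long = data.frame(month       = rep(1:12, 2),
                  conversions = c(rnorm(12), rnorm(12) + 5),
                  headline    = factor(rep(c("a", "b"), each = 12)))

m0 = lm(conversions ~ 1, data = long)         # model without the dummy
m1 = lm(conversions ~ headline, data = long)  # model with the dummy

AIC(m0, m1)  # the smaller AIC indicates the better-supported model

Model comparison of this kind replaces the binary reject/do-not-reject decision with a relative measure of support for each model.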

skulker
  • 1,268
  • 2
  • 9
  • 6