11

In the press you often come across studies that reach directionally opposite conclusions. They may concern the testing of a new prescription drug, the merit of a specific nutrient, or anything else for that matter.

When two such studies arrive at conflicting results, how can you tell which of the two is closer to the truth?

Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
Sympa
  • 6,862
  • 3
  • 30
  • 56
  • Maybe this should be CW? There will not be a unique answer to this question and multiple perspectives and approaches might emerge. – whuber Sep 26 '10 at 20:42
  • 2
    @whuber I would vote against CW because even if there are different perspectives there is likely to be one *best* approach. This is similar to how the same hypothesis can be tested using different frameworks/models but there is likely to be one best approach. –  Sep 26 '10 at 22:17
  • @Srikant: In any particular case I can imagine you could amass a strong defense to support your assertion. In general, though--which is the present situation--the best answer will depend on the context. As a simple (and incomplete) example, contemplate the differences between evaluating a pair of designed physical experiments (such as measuring the speed of light, where historically most of the confidence intervals have missed the truth!) and an observational study in the social sciences. – whuber Sep 27 '10 at 15:03
  • @whuber Perhaps, we should continue this conversation on meta. I admit that I am still fuzzy about when to use CW and when not to but to take up your point: the very best answer to this question would then be that the answer is context dependent and explain why via a few examples. In any case, I somehow feel that this question should not be CW but I am unable to articulate any more reasons beyond the ones I have outlined above. –  Sep 27 '10 at 15:55

3 Answers

8

The meta-analysis literature is relevant to your question. Using meta-analytic techniques you could generate an estimate of the effect of interest pooled across studies. Such techniques weight studies, typically in proportion to their sample size or, more precisely, to the precision of each study's estimate.
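To make the pooling concrete, here is a minimal sketch (with made-up numbers, not from any real studies) of a fixed-effect, inverse-variance-weighted pooled estimate for two conflicting results:

    import numpy as np

    # Hypothetical effect estimates and standard errors from two conflicting studies
    effects = np.array([0.30, -0.05])
    ses = np.array([0.10, 0.12])

    weights = 1.0 / ses**2                      # precision weights: larger/more precise study -> larger weight
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))

    print(f"pooled = {pooled:.3f}, "
          f"95% CI = [{pooled - 1.96 * pooled_se:.3f}, {pooled + 1.96 * pooled_se:.3f}]")

The pooled estimate is simply a compromise that trusts the more precise study more; it does not by itself tell you which study is "closer to the truth".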

Within the meta-analysis context researchers talk about fixed-effect and random-effects models (see Hunter and Schmidt, 2002). A fixed-effect model assumes that all studies are estimating the same population effect. A random-effects model assumes that studies differ in the population effect being estimated. A random-effects model is typically more appropriate, since studies usually differ in samples, measures, and procedures.
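A standard way to write the contrast (sketched here in general notation, not quoted from Hunter and Schmidt) is, for study $i$ with observed effect $\hat{\theta}_i$ and within-study sampling variance $v_i$:

$$\text{fixed effect:}\qquad \hat{\theta}_i = \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, v_i)$$

$$\text{random effects:}\qquad \hat{\theta}_i = \theta + u_i + \varepsilon_i, \qquad u_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, v_i)$$

The between-study variance $\tau^2$ is what distinguishes the two: if the estimate of $\tau^2$ is clearly above zero, the studies are not all estimating the same population effect, which is itself one explanation for "conflicting" results.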

As more studies of a particular relationship accumulate, more sophisticated approaches become possible. For example, you can code studies in terms of various properties, such as perceived quality, and then examine empirically whether the effect size varies with these study characteristics. Beyond quality there may be theoretically relevant differences between the studies which moderate the relationship (e.g., characteristics of the sample, dosage levels, etc.).
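As a rough sketch of such a moderator analysis (meta-regression) with hypothetical numbers, one can regress study effect sizes on a coded study characteristic, weighting each study by its precision:

    import numpy as np

    # Hypothetical effect sizes, standard errors, and a coded study characteristic (e.g., dosage)
    effects = np.array([0.10, 0.25, 0.40, 0.55])
    ses = np.array([0.12, 0.10, 0.08, 0.15])
    dosage = np.array([10.0, 20.0, 30.0, 40.0])

    W = np.diag(1.0 / ses**2)                        # precision weights
    X = np.column_stack([np.ones_like(dosage), dosage])

    # Weighted least squares: beta = (X' W X)^{-1} X' W y
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ effects)
    print(f"intercept = {beta[0]:.3f}, slope per unit of dosage = {beta[1]:.3f}")

A clear slope would suggest that the studies disagree partly because they examined different conditions, not because one of them is simply wrong.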

In general, I tend to trust studies with:

  • bigger sample sizes
  • greater methodological rigour
  • a confirmatory orientation (e.g., not a study where they tested for correlations between 100 different nutrients and 50 health outcomes)
  • absence of conflict of interest (e.g., not by a company with a commercial interest in showing a relationship; not by a researcher who has an incentive to find a significant result)

That said, you need to keep random sampling variability and theoretically meaningful differences between studies in mind as plausible explanations of conflicting findings.

Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
  • I particularly like the likelihood ratio as a means of aggregating evidence in meta-analysis; if you have sufficient data to compute them for each study, you simply compute the product across studies to represent the aggregate evidence for/against a hypothesis. – Mike Lawrence Sep 26 '10 at 14:36
  • I commented on the (ir)relevance of meta analysis after Cyrus's answer, but upvoted this response for everything else, especially the bullet points. – whuber Sep 26 '10 at 20:40
  • @whuber @Gaetan's question assumes that one study is closer to the truth. I try to take a step back and situate variations in results between studies within a meta-analytic framework, acknowledging the possibility that the studies may be of equal quality, but that random sampling or substantive differences may be the explanation. – Jeromy Anglim Sep 27 '10 at 02:43
  • @whuber Even with two-studies it would be possible to form a meta-analytic estimate of the effect of interest. Of course, the confidence interval of the estimate of effect may be large. But a high degree of uncertainty is to be expected if only two studies have been conducted and they are giving conflicting results. – Jeromy Anglim Sep 27 '10 at 02:44
5

I would hold off on considering meta-analysis until you've scrutinized sources of potential bias or variation in the target populations. If these are studies of treatment effects, was treatment randomly assigned? Were there deviations from the protocol? Was there noncompliance? Is there missing outcome data? Were the samples drawn from the same frame? Was there refusal to participate? Implementation errors? Were standard errors computed correctly, accounting for clustering and robust to various parametric assumptions? Only after you have answered these questions do I think meta-analysis issues start to enter the picture. It must be rare that meta-analysis is appropriate for just two studies, unless you are willing to make some heroic assumptions.
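To illustrate just one item on that list, the standard-error question, here is a small simulated sketch (assuming the statsmodels package; the data are made up) showing how ignoring within-cluster correlation can understate uncertainty:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n_clusters, per_cluster = 20, 10
    cluster = np.repeat(np.arange(n_clusters), per_cluster)

    # A regressor and an error term that both vary mostly at the cluster level
    x = rng.normal(size=n_clusters)[cluster] + 0.5 * rng.normal(size=n_clusters * per_cluster)
    e = rng.normal(size=n_clusters)[cluster] + 0.5 * rng.normal(size=n_clusters * per_cluster)
    y = 0.3 * x + e
    df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

    naive = smf.ols("y ~ x", data=df).fit()          # assumes independent errors
    robust = smf.ols("y ~ x", data=df).fit(cov_type="cluster",
                                           cov_kwds={"groups": df["cluster"]})
    # With clustered errors the cluster-robust SE is typically larger than the naive one
    print("naive SE:", round(float(naive.bse["x"]), 3),
          "cluster-robust SE:", round(float(robust.bse["x"]), 3))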

Cyrus
  • 51
  • 1
  • But aren't these steps already part of meta-analysis? – chl Sep 26 '10 at 17:14
  • 3
    @chl: True, but the point is that these steps get to the essence of the question. A meta-analysis would be helpful only when there are many studies (not just two) and their merits have already been carefully evaluated. The question before us is really asking how one goes about evaluating the quality of a study, or pair of conflicting studies, in the first place. Cyrus has pointed to some of the many aspects of this; a reasonable treatment usually requires one or two semesters of university-level study. In this light I think his use of the term "heroic" is somewhat understated! – whuber Sep 26 '10 at 20:38
  • 1
    @whuber Yes, I agree with you and @Cyrus. Of course, assessing the quality and trustworthiness of previous studies is a mandatory step (and it takes time to review every study, especially when we have to contact authors because information is missing from the manuscript); I just thought this was part of the meta-analysis, and the "statistical part" reduces to producing a quantitative summary of the trustworthy results. – chl Sep 26 '10 at 20:50
3

I think Jeromy's answer is sufficient if you are examining two experimental studies or an actual meta-analysis. But often we are faced with examining two non-experimental studies and are tasked with assessing the validity of those two disparate findings.

As Cyrus's grocery list of questions suggests, the topic itself is not amenable to a short response, and whole books are in essence devoted to addressing such questions. For anyone interested in conducting research on non-experimental data, I would highly suggest you read

Experimental and Quasi-Experimental Designs for Generalized Causal Inference by William R. Shadish, Thomas D. Cook, and Donald T. Campbell (also, I have heard that older editions of this text are just as good).

Several items Jeromy referred to (bigger sample sizes and greater methodological rigour), and everything that Cyrus mentions, fall under what Campbell and Cook refer to as "internal validity". These include aspects of the research design and of the statistical methods used to assess the relationship between X and Y. In particular, as critics we are concerned about aspects of either that could bias the results and diminish the reliability of the findings. As this is a forum devoted to statistical analysis, many of the answers are centered on statistical methods to ensure unbiased estimates of whatever relationship you are assessing. But there are other aspects of the research design, unrelated to the statistical analysis, that diminish the validity of the findings no matter what rigorous lengths one goes to in the statistical analysis (for example, several of the aspects of experimental fidelity that Cyrus mentions can be addressed, but not solved, with statistical methods, and if they occur they will always diminish the validity of a study's results).

There are many other aspects of internal validity, not mentioned here, that become crucial when comparing the results of non-experimental studies, as well as aspects of research design that distinguish more reliable from less reliable findings. I don't think it is appropriate to go into too much detail here, but I would often take the results of a quasi-experimental study (such as an interrupted time series or a matched case-control design, sketched briefly below) more seriously than those of a study that is not quasi-experimental, regardless of the other aspects Jeromy or Cyrus mentioned (within reason, of course).
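As a sketch of what one such quasi-experimental analysis can look like (simulated data, assuming the statsmodels package; not taken from any actual study), a simple interrupted time series is often fit as a segmented regression with a level-change and a slope-change term:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    t = np.arange(48)                              # e.g., 48 monthly observations
    post = (t >= 24).astype(float)                 # intervention introduced at month 24
    time_since = np.where(post == 1, t - 24, 0.0)  # months elapsed since the intervention

    # Simulated outcome: baseline trend, a drop in level, and a small change in slope
    y = 10 + 0.10 * t - 2.0 * post + 0.05 * time_since + rng.normal(scale=1.0, size=t.size)
    df = pd.DataFrame({"y": y, "t": t, "post": post, "time_since": time_since})

    fit = smf.ols("y ~ t + post + time_since", data=df).fit()
    print(fit.params)    # coefficients on post and time_since estimate the level and slope changes

Such a design leans on the pre-intervention trend as a counterfactual (and a real analysis would also check for autocorrelation in the residuals), which is exactly the kind of internal-validity judgement a reader has to make when comparing conflicting non-experimental studies.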

Campbell and Cook also refer to the "external validity" of studies. This aspect of research design is often much smaller in scope and does not deserve as much attention as internal validity. External validity essentially deals with the generalizability of the findings, and I would say laymen can often assess external validity reasonably well as long as they are familiar with the subject. Long story short: read Shadish, Cook, and Campbell's book.

Andy W
  • 15,245
  • 8
  • 69
  • 191