I've been following your post since it was posted in CogSci and have never had the time to give a full answer: even this will only be a quick push in the (hopefully) right direction.
First, as I always enjoy seminal works, check out Donald Campbell's work relating several research designs to internal and external validity (in the references). It has a great quote on the "balance between internal and external validity":
> If one is in a situation where either internal validity or representativeness [external validity] must be sacrificed, which should it be? The answer is clear. Internal validity is the prior and indispensable consideration. (Campbell, 1957, p. 310)
Thinking about why: I tend to teach internal validity as "Is my study telling me what I think it is?", while external validity is more along the lines of "Will this apply to other populations?" Using these working definitions, you can't really have "applications to other people" if your study isn't telling you what you think it is.
Being a bit more pragmatic: consider your research goals. Is your goal to apply your research in a wide variety of situations, or just one? Is there any evidence that the method is effective in any circumstance (or that there is a relationship between variables, etc.)? If so, do you have reason to believe it will transfer to a new population (i.e., generalize)? The answers to these can guide what your first focus should be, but these questions are strongly grounded in theory: without a theoretical backing, go back to the default of establishing internal validity first.
I think the above answers your second question, but I can't emphasize enough the importance of theory (including methodological theory). In the cited article you'll see that Campbell assigns the strength of internal and external validity to the methodology, not necessarily the data. Not all methods are created equal. This doesn't really have to do with a "pilot test": if you use control groups, random assignment, proper sampling schemes, etc., your study will have better internal and external validity. Again, this is by the nature of the method, not because it has been piloted.
It may help to see the two very broad methods (in social research) that we use to increase internal and external validity:
- To increase internal validity, we tend to use random assignment to treatment and control groups (discussed in Campbell's paper). The idea is that random assignment randomly distributes all sorts of confounding variables that you haven't accounted for among both groups, so they should be equally balanced (or, if they're not, it was by random chance).
- To increase external validity, we tend to use random sampling to collect our participants. The idea is that random sampling grabs a representative sample of the population you are sampling from, and that if the treatment "works" for the representative sample, it should work for the rest of the population.
Note that these can both occur at the same time, and that the practice of one does not preclude the other. Also note that research can still have internal and external validity without doing these, though you'd have to make a strong argument as to why.
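The two practices above are easy to see in a quick simulation. This is just an illustrative sketch, not a real study: the population size, the "age" variable, and all the numbers are hypothetical stand-ins for whatever confounder you haven't measured.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 people; "age" stands in for any
# confounding variable you haven't accounted for.
population = [random.gauss(40, 12) for _ in range(100_000)]

# Random sampling (external validity): a simple random sample should
# be representative of the population it was drawn from.
sample = random.sample(population, 1_000)

# Random assignment (internal validity): shuffling the sample and
# splitting it in half should balance the confounder across groups,
# even though we never measured or controlled for it directly.
random.shuffle(sample)
treatment, control = sample[:500], sample[500:]

print(f"population mean: {statistics.mean(population):.1f}")
print(f"sample mean:     {statistics.mean(sample):.1f}")
print(f"treatment/control difference: "
      f"{abs(statistics.mean(treatment) - statistics.mean(control)):.2f}")
```

The sample mean lands close to the population mean (sampling), and the treatment/control difference in the confounder is small (assignment), which is exactly the "equally balanced, or unbalanced only by chance" claim above.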
As a final note, let me add this: you don't ever "have" internal or external validity; you merely have evidence supporting internal/external validity. Most (if not all) types of validity are just a body of evidence in favor of the concept: for internal validity, a body of evidence that only your proposed treatment influenced your outcome variable; for external validity, a body of evidence that your proposed treatment would influence the outcome variable in other samples/populations. Anything you can do to demonstrate these is a contribution to the evidence for the internal/external validity of your study.
References
Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54(4), 297–312.