I am a physician and have never had any formal training in statistics or biostatistics. Yet every time I come across a journal article, particularly one reporting a retrospective study, I cannot help feeling a little suspicious that (a) the authors are trying to pull a fast one on me and/or (b) the authors, like me, don't really understand what they are doing either.
So I was wondering whether there are any retrospective studies, ideally in medicine, that are considered "classics", that is, stellar examples of logical thinking and sound application of statistical principles.
I know the knee-jerk response may be to tell me to read textbook A or B, but unfortunately I don't have that luxury at this point. Nor do I have quick access to a biostatistician at my current institution to ask these kinds of questions.
-- EDIT --
In response to some of the answers below, all of which are much appreciated, let me add the following:
I am a surgeon, so I am interested in diseases pertaining to the field of surgery. AdamO suggested that I look at the NEJM, JAMA, or The Lancet, which I did. One of the studies I found interesting but am somewhat skeptical of is "Risk Factors for Retained Instruments and Sponges after Surgery" by Atul A. Gawande et al., published in the NEJM in January 2003.
Now, Dr. Gawande is something of a rock star in general surgery, which is one of the reasons I chose to look at his paper after getting the suggestion from AdamO. His paper was published in the NEJM, so presumably he is doing something right when designing his studies.
He is trying to identify risk factors for surgeons accidentally leaving objects inside patients. This is (thankfully) a very rare occurrence: scouring a Massachusetts malpractice insurance company's records from 1985 to 2001, he found only 54 cases.
However, his design leaves me somewhat confused and unhappy. For instance, he needs to compare the 54 cases with a risk-adjusted cohort that did not have retained objects, which he treats as controls. He writes, "Given an estimated 60 cases available for review, we determined that four controls for each case would give the study sufficient power to detect a risk factor present in 30 percent of patients that produced a doubling of the likelihood that a foreign body would be left behind." Most likely some biostatistician told him that this would be good enough, but statistics-ignorant laypeople like me have no way of knowing whether this is indeed good practice.
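To try to make sense of that sentence, I attempted a back-of-the-envelope check of my own. If I read "a doubling of the likelihood" as an odds ratio of 2 and treat the design as a plain unmatched comparison of two proportions (which is my simplification, not necessarily what the authors or their biostatistician actually did), a rough power calculation in Python looks something like this:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# My assumed inputs, based on my reading of the paper's power statement:
# the risk factor is present in 30% of controls, and it doubles the odds of a retained object.
p_control = 0.30
odds_control = p_control / (1 - p_control)
odds_case = 2 * odds_control                 # "doubling of the likelihood" read as an odds ratio of 2
p_case = odds_case / (1 + odds_case)         # implied prevalence of the risk factor among cases

n_cases = 54                                 # cases actually found (the paper planned on ~60)
control_ratio = 4                            # four controls per case

# Cohen's h effect size for two proportions, then power for an unmatched two-sample comparison
effect_size = proportion_effectsize(p_case, p_control)
power = NormalIndPower().power(effect_size,
                               nobs1=n_cases,
                               alpha=0.05,
                               ratio=control_ratio)
print(f"approximate power: {power:.2f}")
```

I have no idea whether this is anywhere near the calculation that was actually performed, which is rather my point.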
In any case, he now has 54 patients to compare with a randomly chosen risk-adjusted cohort of 235 patients acting as controls (which, as far as I can tell, magically allows him to compare risks in the way he describes; more on this below). For the chart reviews, he had his residents go through the charts looking for a predefined set of variables and record how often each appeared. These variables include age, sex, body-mass index, whether counts of sponges and instruments were performed, duration of the operation, whether the operation was performed on an emergency basis, and so on.
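On the "magic" in the parentheses above: what I have pieced together (and please correct me if I have this wrong) is that a case-control sample like this cannot estimate absolute risks at all, but the odds ratio for an exposure survives sampling on the outcome, which I assume is what the risk comparison actually rests on. A toy simulation of that idea, with entirely made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up population: a rare outcome whose odds are doubled by a binary exposure
n = 1_000_000
exposed = rng.random(n) < 0.30                    # exposure present in 30% of patients
base_odds = 0.0005 / (1 - 0.0005)                 # very rare outcome among the unexposed
odds = np.where(exposed, 2 * base_odds, base_odds)
event = rng.random(n) < odds / (1 + odds)

def odds_ratio(event, exposed):
    a = np.sum(event & exposed)       # exposed cases
    b = np.sum(event & ~exposed)      # unexposed cases
    c = np.sum(~event & exposed)      # exposed controls
    d = np.sum(~event & ~exposed)     # unexposed controls
    return (a / b) / (c / d)

# Case-control sampling: keep every case, draw 4 controls per case at random
case_idx = np.flatnonzero(event)
control_idx = rng.choice(np.flatnonzero(~event), size=4 * len(case_idx), replace=False)
sample_idx = np.concatenate([case_idx, control_idx])

print("odds ratio in full population :", odds_ratio(event, exposed))
print("odds ratio in case-control set:", odds_ratio(event[sample_idx], exposed[sample_idx]))
# The raw "risk" of the outcome in the sample is meaningless (cases are hugely over-represented),
# but the odds ratio of roughly 2 is approximately recovered.
```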
He then performs logistic regression on each of the chart-review variables to see whether it is associated with an increased likelihood of retention of a foreign body, setting the level of statistical significance at P < 0.2 as the criterion for inclusion in further analysis. This is something I find somewhat disturbing. I have haunted these forums and looked up methods for variable selection, and almost always the answer is that using P-values for variable selection is improper or erroneous. Again, I could be wrong, and Dr. Gawande might be using it properly in this case for all I know, but I am skeptical. The only people who seem to support its use are those in medicine or epidemiology.
He then takes the 8 variables with P < 0.2 into a multiple logistic regression and finds that only 3 remain significant: operation performed on an emergency basis, unexpected change in the operation, and high body-mass index.
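For concreteness, my understanding of the two-step procedure he describes, univariable screening at P < 0.2 followed by a multivariable model, is roughly the following. The data and variable names here are synthetic stand-ins of my own; this is only my reconstruction of the general approach, not his actual analysis:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Synthetic stand-in data: 54 "cases" and 235 "controls" with made-up covariates
n = 289
df = pd.DataFrame({
    "retained_object": np.r_[np.ones(54), np.zeros(235)],
    "emergency": rng.integers(0, 2, n),
    "unexpected_change": rng.integers(0, 2, n),
    "bmi": rng.normal(27, 5, n),
    "duration_hours": rng.normal(2.5, 1.0, n),
    "count_performed": rng.integers(0, 2, n),
})

candidates = ["emergency", "unexpected_change", "bmi", "duration_hours", "count_performed"]

# Step 1: univariable logistic regression for each candidate, keeping those with P < 0.2
kept = []
for var in candidates:
    X = sm.add_constant(df[[var]])
    fit = sm.Logit(df["retained_object"], X).fit(disp=0)
    if fit.pvalues[var] < 0.2:
        kept.append(var)

# Step 2: multivariable logistic regression on the screened variables only
if kept:
    final_fit = sm.Logit(df["retained_object"], sm.add_constant(df[kept])).fit(disp=0)
    print("variables passing the P < 0.2 screen:", kept)
    print(final_fit.summary())
else:
    print("no variable passed the P < 0.2 screen in this synthetic data")
```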
As a clinician, I would love to think that Dr. Gawande's approach to a retrospective analysis is indeed a model of the form. Apart from the first part, where he decides that approximately 240 subjects are needed as controls to give the study sufficient power, everything else seems pretty straightforward. But is this true? Is this study really a model example of retrospective analysis, as AdamO implies? Or has Dr. Gawande's status within the medical community perhaps allowed a top-name journal to put its stamp of approval on what the statistics community would otherwise consider a subpar statistical analysis?