I am a physician and have never had any formal training in statistics or biostatistics. Yet every time I come across a journal article, particularly one reporting a retrospective study, I cannot help feeling a little suspicious that (a) the authors are trying to pull a fast one on me and/or (b) the authors, like me, don't really understand what they are doing either.
So I was wondering whether there are any retrospective studies, ideally in medicine, that are considered "classics", that is, stellar examples of logical thinking and sound application of statistical principles.
I know the knee-jerk response may be to tell me to read textbook A or B, but unfortunately I don't have that luxury at this point. Nor do I have quick access to a biostatistician at my current institution to ask these kinds of questions.
-- EDIT --
In response to some of the answers below, all of which are much appreciated, let me add the following:
I am a surgeon, so I am interested in diseases pertaining to the field of surgery. AdamO suggested that I look at the NEJM, JAMA, or The Lancet, which I did. One of the studies I found interesting but am somewhat skeptical of is "Risk Factors for Retained Instruments and Sponges after Surgery" by Atul A. Gawande et al., published in the NEJM in January 2003.
Now, Dr. Gawande is something of a rock star in general surgery, which is one of the reasons I chose to look at his paper after getting the suggestion from AdamO. His paper was published in the NEJM, so presumably he is doing something right when designing his studies.
He is trying to identify risk factors for surgeons accidentally leaving objects inside patients. This is (thankfully) a very rare occurrence: scouring a Massachusetts malpractice insurance company's records from 1985 to 2001, he found only 54 cases.
However, his design leaves me somewhat confused and unhappy. For instance, he needs to compare the 54 cases with a risk-adjusted cohort that did not have retained objects, which he treats as controls. He writes, "Given an estimated 60 cases available for review, we determined that four controls for each case would give the study sufficient power to detect a risk factor present in 30 percent of patients that produced a doubling of the likelihood that a foreign body would be left behind." Most likely some biostatistician told him that this would be good enough, but statistics-ignorant laypeople like me have no way of knowing whether this is indeed good practice.
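To try to make sense of that sentence, I attempted a back-of-the-envelope check of my own. If I read "a doubling of the likelihood" as an odds ratio of 2 and treat the design as a plain unmatched comparison of two proportions (which is my simplification, not necessarily what the authors or their biostatistician actually did), a rough power calculation in Python looks something like this:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# My assumed inputs, based on my reading of the paper's power statement:
# the risk factor is present in 30% of controls, and it doubles the odds of a retained object.
p_control = 0.30
odds_control = p_control / (1 - p_control)
odds_case = 2 * odds_control                 # "doubling of the likelihood" read as an odds ratio of 2
p_case = odds_case / (1 + odds_case)         # implied prevalence of the risk factor among cases

n_cases = 54                                 # cases actually found (the paper planned on ~60)
control_ratio = 4                            # four controls per case

# Cohen's h effect size for two proportions, then power for an unmatched two-sample comparison
effect_size = proportion_effectsize(p_case, p_control)
power = NormalIndPower().power(effect_size,
                               nobs1=n_cases,
                               alpha=0.05,
                               ratio=control_ratio)
print(f"approximate power: {power:.2f}")
```

I have no idea whether this is anywhere near the calculation that was actually performed, which is rather my point.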
In any case, he now has 54 patients to compare with a randomly chosen risk-adjusted cohort of 235 patients acting as controls (which, as far as I can tell, magically allows him to compare risks in the way he describes; more on this below). For the chart reviews, he had his residents go through the charts looking for a predefined set of variables and record how often each appeared. These variables include age, sex, body-mass index, whether counts of sponges and instruments were performed, duration of the operation, whether the operation was performed on an emergency basis, and so on.
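On the "magic" in the parentheses above: what I have pieced together (and please correct me if I have this wrong) is that a case-control sample like this cannot estimate absolute risks at all, but the odds ratio for an exposure survives sampling on the outcome, which I assume is what the risk comparison actually rests on. A toy simulation of that idea, with entirely made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up population: a rare outcome whose odds are doubled by a binary exposure
n = 1_000_000
exposed = rng.random(n) < 0.30                    # exposure present in 30% of patients
base_odds = 0.0005 / (1 - 0.0005)                 # very rare outcome among the unexposed
odds = np.where(exposed, 2 * base_odds, base_odds)
event = rng.random(n) < odds / (1 + odds)

def odds_ratio(event, exposed):
    a = np.sum(event & exposed)       # exposed cases
    b = np.sum(event & ~exposed)      # unexposed cases
    c = np.sum(~event & exposed)      # exposed controls
    d = np.sum(~event & ~exposed)     # unexposed controls
    return (a / b) / (c / d)

# Case-control sampling: keep every case, draw 4 controls per case at random
case_idx = np.flatnonzero(event)
control_idx = rng.choice(np.flatnonzero(~event), size=4 * len(case_idx), replace=False)
sample_idx = np.concatenate([case_idx, control_idx])

print("odds ratio in full population :", odds_ratio(event, exposed))
print("odds ratio in case-control set:", odds_ratio(event[sample_idx], exposed[sample_idx]))
# The raw "risk" of the outcome in the sample is meaningless (cases are hugely over-represented),
# but the odds ratio of roughly 2 is approximately recovered.
```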
He then performs logistic regression on each of the chart-review variables to see whether it is associated with an increased likelihood of retention of a foreign body, setting the level of statistical significance at P < 0.2 as the criterion for inclusion in further analysis. This is something I find somewhat disturbing. I have haunted these forums and looked up methods for variable selection, and almost always the answer is that using P-values for variable selection is improper or erroneous. Again, I could be wrong, and Dr. Gawande might be using it properly in this case for all I know, but I am skeptical. The only people who seem to support its use are those in medicine or epidemiology.
He then takes the 8 variables with P < 0.2 into a multiple logistic regression and finds that only 3 remain significant: operation performed on an emergency basis, unexpected change in the operation, and high body-mass index.
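For concreteness, my understanding of the two-step procedure he describes, univariable screening at P < 0.2 followed by a multivariable model, is roughly the following. The data and variable names here are synthetic stand-ins of my own; this is only my reconstruction of the general approach, not his actual analysis:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Synthetic stand-in data: 54 "cases" and 235 "controls" with made-up covariates
n = 289
df = pd.DataFrame({
    "retained_object": np.r_[np.ones(54), np.zeros(235)],
    "emergency": rng.integers(0, 2, n),
    "unexpected_change": rng.integers(0, 2, n),
    "bmi": rng.normal(27, 5, n),
    "duration_hours": rng.normal(2.5, 1.0, n),
    "count_performed": rng.integers(0, 2, n),
})

candidates = ["emergency", "unexpected_change", "bmi", "duration_hours", "count_performed"]

# Step 1: univariable logistic regression for each candidate, keeping those with P < 0.2
kept = []
for var in candidates:
    X = sm.add_constant(df[[var]])
    fit = sm.Logit(df["retained_object"], X).fit(disp=0)
    if fit.pvalues[var] < 0.2:
        kept.append(var)

# Step 2: multivariable logistic regression on the screened variables only
if kept:
    final_fit = sm.Logit(df["retained_object"], sm.add_constant(df[kept])).fit(disp=0)
    print("variables passing the P < 0.2 screen:", kept)
    print(final_fit.summary())
else:
    print("no variable passed the P < 0.2 screen in this synthetic data")
```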
As a clinician, I would love to think that Dr. Gawande's approach to a retrospective analysis is indeed a model of the form. Apart from the first part, where he decides that approximately 240 subjects are needed as controls to give the study sufficient power, everything else seems pretty straightforward. But is this true? Is this study really a model example of retrospective analysis, as AdamO implies? Or has Dr. Gawande's status within the medical community perhaps allowed a top-name journal to put its stamp of approval on what the statistics community would otherwise consider a subpar statistical analysis?