Say 2% of the population dies before a drug is introduced and 1% of the population dies after the drug is introduced. Then a non-mathematician will say yes, the drug was effective. What can we say instead that is better?
If you have the entire population, then you're good. You know for sure that the percentage of people dying was cut in half after the drug was introduced.
But, let's say you don't have the population. Let's say you have 1000 people with Condition X before a drug is introduced. Two percent of these unfortunately pass away. Then a drug is introduced, and you look at 1000 more people after this. Only 1% of this sample passes away.
You could analyze this, for example, with a logistic regression. Each person would be their own row in the data, with a variable indicating whether they were observed before or after the drug was introduced. Your regression would be `y ~ drug`, where `y` indicates whether they passed away (1) or not (0).
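To make this concrete, here is a minimal sketch using hypothetical counts that match the example (20 of 1000 deaths before the drug, 10 of 1000 after). With a single binary predictor, the logistic-regression MLE has a closed form, so no fitting library is needed: the intercept is the log-odds of death in the "before" group, and the `drug` coefficient is the log odds ratio between groups.

```python
from math import log

def logit(p):
    """Log-odds of a probability p."""
    return log(p / (1 - p))

# Hypothetical counts matching the example in the text:
# 1000 people each period, 20 deaths before the drug (2%), 10 after (1%).
deaths_before, n_before = 20, 1000
deaths_after, n_after = 10, 1000

p_before = deaths_before / n_before
p_after = deaths_after / n_after

# Closed-form MLE for y ~ drug with one binary predictor:
# intercept = log-odds of death before the drug,
# drug coefficient = log odds ratio (after vs. before).
intercept = logit(p_before)
drug_coef = logit(p_after) - logit(p_before)

print(f"intercept = {intercept:.3f}")        # log-odds of death, pre-drug
print(f"drug coefficient = {drug_coef:.3f}") # negative => lower odds after
```

A negative `drug` coefficient here corresponds to the drop from 2% to 1%, but as the next paragraph explains, that says nothing causal by itself.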
However, this would not be a causal interpretation. It could be, for example, that the bacterial strain causing Condition X grew weaker over time, so the drop wasn't due to the medicine at all.
This gets to the next question.
How should I design this same experiment in a better way so that I can say something statistically? My total population can be assumed to remain the same (say it is very large, so despite 2% dying it doesn't change much). But I cannot repeat this experiment, so I cannot design a test around it. I don't even know how I can construct a CI if I cannot repeat this experiment.
Again, if you have the entire population, then you're good. You don't need statistics, since statistics are just used to infer from a sample to a population.
But what I would suggest is taking a sample of people with Condition X, randomly assigning them to receive either the drug or a placebo, and then comparing the rates at which patients pass away, using the same regression model described above. Now, due to random assignment, you could make a causal claim.
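A rough simulation of such a trial, under assumed true death rates of 2% (placebo) and 1% (drug) chosen to mirror the numbers above, might look like this. It uses a two-proportion z-test, which with one binary predictor is asymptotically equivalent to the Wald test on the `drug` coefficient in `y ~ drug`:

```python
import random
from math import sqrt, erf

random.seed(0)

# Hypothetical randomized trial: 2000 patients with Condition X,
# randomly assigned half to the drug and half to a placebo. The true
# death rates (1% drug, 2% placebo) are assumptions for illustration.
n = 1000
deaths_drug = sum(random.random() < 0.01 for _ in range(n))
deaths_placebo = sum(random.random() < 0.02 for _ in range(n))

p_drug, p_placebo = deaths_drug / n, deaths_placebo / n

# Two-proportion z-test on the difference in death rates.
p_pool = (deaths_drug + deaths_placebo) / (2 * n)
se = sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_drug - p_placebo) / se
# Two-sided p-value via the normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(f"drug: {p_drug:.1%} died, placebo: {p_placebo:.1%} died, "
      f"two-sided p = {p_value:.3f}")
```

Because assignment is random, any systematic difference between the groups can be attributed to the drug rather than to, say, the strain weakening over time.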
On "repetition"
I want to address the idea of "cannot repeat this experiment." Frequentist statistics, which confidence intervals come from, pose a purely theoretical question when it comes to these intervals: "Imagine I ran this exact same experiment an arbitrarily large number of times. What would the death rate be in 95% of those experiments?" But we only have one sample. So we rely on assumptions in our models and methods that let us make these long-run, frequentist interpretations. So you don't need to repeat the experiment a large number of times. You can rely on statistics to get an idea of uncertainty, and you can decide whether you're willing to tolerate that level of uncertainty and call the 1% meaningfully different from the 2%.
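This "imagined repetitions" idea can be made concrete. The sketch below computes a 95% Wald interval for the drop in death rate from the single observed sample (again using the hypothetical 20/1000 vs. 10/1000 counts), then simulates the repetitions we never actually run to show that such intervals cover the assumed true difference about 95% of the time:

```python
import random
from math import sqrt

random.seed(1)

def wald_ci(d1, n1, d2, n2, z=1.96):
    """95% Wald confidence interval for the difference in rates p1 - p2."""
    p1, p2 = d1 / n1, d2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# The one sample we actually have: 20/1000 died before, 10/1000 after.
obs_lo, obs_hi = wald_ci(20, 1000, 10, 1000)
print(f"95% CI for the drop in death rate: ({obs_lo:.4f}, {obs_hi:.4f})")

# The frequentist guarantee is about hypothetical repetitions: if the
# true rates really were 2% and 1%, roughly 95% of intervals built this
# way would contain the true difference of 0.01.
true_diff, covered, reps = 0.01, 0, 2000
for _ in range(reps):
    d1 = sum(random.random() < 0.02 for _ in range(1000))
    d2 = sum(random.random() < 0.01 for _ in range(1000))
    lo, hi = wald_ci(d1, 1000, d2, 1000)
    covered += lo <= true_diff <= hi
print(f"coverage over {reps} simulated repetitions: {covered / reps:.1%}")
```

Notably, with these counts the observed interval just barely crosses zero, which is exactly the kind of uncertainty statement the frequentist machinery buys you without ever re-running the experiment.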