Statistical Foundation of t-test in experiment with random assignment

Question

Say we want to know whether treatment A is different from treatment B. The ideal situation would be: I randomly sample people from my target population (mathematically, I generate a set of i.i.d. random variables), then I randomly assign the subjects in my sample to treatment A or B so that each subject has probability 0.5 of being assigned to each treatment, then I collect the data and run a $t$-test.

The underlying statistical process (I believe) is the following: conceptually, you can think of your target population as having been duplicated into two identical copies, where everyone in one copy takes treatment A and everyone in the other takes treatment B. The people in your sample who take treatment A or B can be regarded as i.i.d. samples from the conceptual population A or B, respectively. Then we know that the standardized mean of an i.i.d. sample follows a $t$-distribution. So this sets the statistical foundation of the $t$-test in this case. Here the experiment has both internal validity (we can attribute the observed difference to the treatment) and external validity (since our samples are representative of their populations).

In practice, in many cases, you post an ad on a website and ask subjects to come to your experiment. In this case, your sample cannot represent the population, so external validity is gone. You can still randomize treatment within the sample. However, I question whether it still makes sense to do a $t$-test, because from the argument above, only a sample mean computed from independent draws from the same population (i.e., identically distributed draws) will have a $t$-distribution. Here your sample is in no way an independent, identically distributed draw from the distribution (say, the distribution of students, not to mention that of all humans). So, in the case I described, is it still valid to use a $t$-test to check statistical significance? And does this experiment at least still have internal validity?

score 3 · Accepted Answer · answered Dec 14 '18 at 10:45

To answer this question I first need to set up some notation. Any causal estimand/parameter must be defined in terms of potential outcomes. Consider a setting with two treatment groups, denoted 'treatment' and 'control' for convenience. Let $Y$ be the outcome variable and let $Y_i(0)$ be the potential outcome of individual $i$ if assigned to control, and $Y_i(1)$ be the potential outcome of individual $i$ if assigned to treatment.

The population average treatment effect (PATE) for an infinite population is defined as \begin{equation} PATE= E[Y(1)-Y(0)]. \end{equation} The sample average treatment effect (SATE) can be defined in a similar fashion. Let $n$ be the sample size; the SATE is defined as \begin{equation} SATE= \frac{1}{n}\sum_{i=1}^n \left(Y_i(1)-Y_i(0)\right). \end{equation} Note that sampling does not enter the definition of the SATE.
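To make the distinction concrete, here is a small sketch using only the Python standard library. It assumes a hypothetical finite population of six units whose two potential outcomes are both known (never possible in practice), and it uses the finite-population average effect as a stand-in for the PATE; the names `Y0`, `Y1` and `ate` are illustrative. The SATE changes with which units happen to be sampled, while its average over all simple random samples of size 3 equals the population effect.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of N = 6 units with BOTH potential outcomes known
# (never possible in practice; used here only to illustrate the definitions).
Y0 = [Fraction(v) for v in (1, 2, 3, 4, 5, 6)]  # Y_i(0)
Y1 = [Fraction(v) for v in (2, 2, 5, 4, 9, 6)]  # Y_i(1)

def ate(units):
    """Average of Y_i(1) - Y_i(0) over the given unit indices."""
    return sum(Y1[i] - Y0[i] for i in units) / len(units)

# Finite-population analogue of the PATE: the average over all N units.
pate = ate(range(len(Y0)))

# SATE of every possible sample of size n = 3 (no treatment assignment involved).
sates = [ate(s) for s in combinations(range(len(Y0)), 3)]

print(pate)                     # 7/6
print(min(sates), max(sates))   # 0 7/3: the SATE depends on which units were sampled
print(sum(sates) / len(sates))  # 7/6: averaging over simple random samples recovers it
```

Exact fractions are used so that the last comparison is an exact equality rather than a floating-point approximation.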

The crux of causal inference, regardless of which estimand you are interested in, is that $Y_i(1)$ and $Y_i(0)$ are never both observed for the same individual. However, it can be shown (e.g. Imbens and Rubin, 2015) that under completely randomized treatment assignment the difference in sample means \begin{equation} \hat{\tau}=\frac{1}{n_1}\sum_{i \in \text{treated}}y_i - \frac{1}{n_0}\sum_{i \in \text{control}}y_i, \end{equation} where $y_i$ is the observed outcome of individual $i$ and $n_1$ and $n_0$ are the sizes of the treatment and control groups ($n_0+n_1=n$), is an unbiased estimator of the SATE. Moreover, if in addition the sample is a random sample from the population (finite or infinite), $\hat{\tau}$ is an unbiased estimator of the PATE.
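The unbiasedness claim can be checked by brute force. The sketch below assumes hypothetical potential outcomes for a fixed sample of six units and a completely randomized design that treats exactly three of them; it enumerates every possible assignment and averages $\hat{\tau}$ over them. The function name `tau_hat` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes for a fixed sample of n = 6 units.
y0 = [Fraction(v) for v in (3, 1, 4, 1, 5, 9)]   # Y_i(0)
y1 = [Fraction(v) for v in (4, 3, 4, 6, 5, 12)]  # Y_i(1)
n, n1 = len(y0), 3  # completely randomized design: exactly n1 units are treated

sate = sum(a - b for a, b in zip(y1, y0)) / n

def tau_hat(treated):
    """Difference in observed means: treated units reveal Y(1), control units reveal Y(0)."""
    control = [i for i in range(n) if i not in treated]
    return (sum(y1[i] for i in treated) / len(treated)
            - sum(y0[i] for i in control) / len(control))

# Under complete randomization every set of n1 treated units is equally likely.
estimates = [tau_hat(set(t)) for t in combinations(range(n), n1)]
mean_estimate = sum(estimates) / len(estimates)

print(sate, mean_estimate)  # 11/6 11/6
```

No sampling from a larger population appears anywhere in this calculation: the only randomness is the assignment, which is exactly why $\hat{\tau}$ is unbiased for the SATE even when the sample is not representative.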

Now, in your question you correctly state that we can still make inference about the 'treatment effect' even without a random sample: we can make inference about the SATE. In fact, for the SATE, exact inference (Fisher, 1935) can be used, because the only source of randomness in the estimate is the treatment assignment mechanism, which we control (since we design and randomize the experiment). I think exact tests, together with newer design strategies such as rerandomization (Morgan & Rubin, 2012; 2015), should be favored in these cases because they rely on so few assumptions.
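A minimal sketch of Fisher's exact randomization test for the sharp null hypothesis $Y_i(1)=Y_i(0)$ for every $i$ is below. It assumes a completely randomized design with a fixed number of treated units and uses the absolute difference in means as the test statistic; both are illustrative choices. The function name `fisher_exact_pvalue` is hypothetical and is not SciPy's `fisher_exact`, which is a different test for 2×2 contingency tables.

```python
from fractions import Fraction
from itertools import combinations

def fisher_exact_pvalue(y, treated):
    """Two-sided randomization p-value under Fisher's sharp null Y_i(1) = Y_i(0) for all i.

    y       -- observed outcomes, one per unit
    treated -- set of indices of the units that actually received treatment
    Assumes complete randomization with len(treated) units treated.
    """
    n, n1 = len(y), len(treated)

    def abs_diff_in_means(t):
        c = [i for i in range(n) if i not in t]
        # Exact rational arithmetic avoids spurious ties or misses from float rounding.
        return abs(Fraction(sum(y[i] for i in t)) / len(t)
                   - Fraction(sum(y[i] for i in c)) / len(c))

    # Under the sharp null the observed outcomes would be the same under any assignment,
    # so the statistic can be recomputed for every assignment the design could produce.
    observed = abs_diff_in_means(treated)
    stats = [abs_diff_in_means(set(t)) for t in combinations(range(n), n1)]
    return Fraction(sum(s >= observed for s in stats), len(stats))

print(fisher_exact_pvalue([7, 8, 9, 1, 2, 3], {0, 1, 2}))  # 1/10
```

With six units and three treated there are only 20 possible assignments, and with equal group sizes an assignment and its mirror image give the same absolute statistic, so the smallest achievable two-sided p-value here is 2/20 = 1/10. That granularity is one practical limitation of exact tests in very small experiments.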

The answer to your question: the t-test is derived to account for variation from both random sampling and random treatment allocation. This implies that the t-test will always be conservative for inference about the SATE, since it includes variation that 'is not there' (the extra variance vanishes only when the treatment effect is identical for every unit). That is, the two-sample t-test that does not assume equal variances (also called Welch's test) gives valid, but conservative, inference about the SATE, provided that the other conditions for the t-test are fulfilled.
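The 'variation that is not there' can be quantified. The standard error used by Welch's test is based on $s_1^2/n_1 + s_0^2/n_0$. The sketch below, on hypothetical potential outcomes with a non-constant unit-level effect, enumerates all assignments of a completely randomized design and compares the average of that variance estimate with the exact randomization variance of $\hat{\tau}$. Following the classical Neyman result (see Imbens & Rubin, 2015), the gap should equal $S_\tau^2/n$, where $S_\tau^2$ is the variance of the unit-level effects $Y_i(1)-Y_i(0)$.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance, variance

# Hypothetical potential outcomes; the unit-level effect Y_i(1) - Y_i(0) varies across units.
y0 = [Fraction(v) for v in (3, 1, 4, 1, 5, 9)]
y1 = [Fraction(v) for v in (4, 3, 4, 6, 5, 12)]
n, n1 = len(y0), 3
n0 = n - n1

tau_hats, v_hats = [], []
for t in combinations(range(n), n1):
    treated = set(t)
    obs1 = [y1[i] for i in range(n) if i in treated]      # observed under treatment
    obs0 = [y0[i] for i in range(n) if i not in treated]  # observed under control
    tau_hats.append(mean(obs1) - mean(obs0))
    # Welch/Neyman variance estimate built from (n - 1)-divisor sample variances.
    v_hats.append(variance(obs1) / n1 + variance(obs0) / n0)

true_var = pvariance(tau_hats)  # exact variance over the equally likely assignments
expected_v_hat = mean(v_hats)   # average of the variance estimate over assignments
s_tau2 = variance([a - b for a, b in zip(y1, y0)])  # variance of unit-level effects

print(expected_v_hat >= true_var)               # True: the estimate is conservative
print(expected_v_hat - true_var == s_tau2 / n)  # True: the gap is S_tau^2 / n
```

If the unit-level effects were all equal, `s_tau2` would be zero and the variance estimate would be unbiased rather than conservative.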

For details and proofs, I really recommend Imbens & Rubin (2015).

References:

R. A. Fisher, The Design of Experiments, 1935

K. L. Morgan and D. B. Rubin, Rerandomization to Improve Covariate Balance in Experiments, 2012

K. L. Morgan and D. B. Rubin, Rerandomization to Balance Tiers of Covariates, 2015

G. W. Imbens and D. B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences, 2015

score 1 · Answer 2 · answered Aug 03 '16 at 17:32

The t-test, as opposed to a standard normal (z table) test, is used to deal with the extra uncertainty that comes from a small sample size. This means that there is an implicit assumption of normality (or near normality). If your sample is reasonably large, you can use z tables instead of t tables.

This does not in any way address sample bias, which is the problem you are asking about. Regardless of whether your sample is biased or not, if you can assume a roughly normal distribution, you should use the t-test for small sample sizes.

If your sample is biased, you will have to deal with finding a way to quantify the bias. Of course, finding a way to collect an unbiased sample would be better, if it is possible.
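The t-versus-z point above can be illustrated with Welch's two-sample $t$ statistic and its Welch-Satterthwaite degrees of freedom. This is a minimal standard-library sketch; the function name `welch_t` is hypothetical. With small groups the degrees of freedom are small, so the $t$ reference distribution has heavier tails than the normal; as the groups grow, the degrees of freedom grow and the $t$ and $z$ tables agree.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of mean(a)
    vb = variance(b) / nb  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

t, df = welch_t([5, 6, 7], [1, 2, 3])
print(round(t, 3), round(df, 3))  # 4.899 4.0
```

Here $t = 4/\sqrt{2/3}$ on 4 degrees of freedom; the p-value would then be read from a $t_4$ table (or a library such as SciPy) rather than from the normal distribution.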

John Yetter

Correct me if I am wrong: in the case I described, even if the external validity of the experiment is gone (i.e., my sample is terribly biased and cannot represent the population at all), randomization still preserves internal validity, i.e., I can still claim that treatment A is different from B for this specific sample. To run a t-test, I at least have to make sure that the distribution within group A and group B is normal; otherwise I can't use a t-test. Or, if I have a large sample, then by the CLT I can just use z tables. – KevinKim Aug 03 '16 at 17:39

You can make a statement about the sample, and about A and B being statistically different or not. Depending on how your sample is biased, this may or may not reflect reality. Certainly, if you are using a t-test, you are assuming a normal distribution. My main point is that the choice of a t-test versus other normal tests largely deals with uncertainty due to sample size, and in no way depends on sample bias, as far as I know. – John Yetter Aug 03 '16 at 17:43
Even if I only want to discuss internal validity, I still think it is problematic to run a t or normal test, even with a large sample. The reason is that each group often contains half of the sample, so, strictly speaking, it is a draw without replacement. Then, for the sample mean to have an asymptotic normal distribution, its variance must be adjusted by a factor called the FPC (see http://stats.stackexchange.com/questions/5158/explanation-of-finite-correction-factor). So a t-test or normal test without the FPC is incorrect in the experimental setting. – KevinKim Aug 14 '16 at 18:53
@KevinKim: That's yet another topic (samples from very small populations), where there are indeed issues with using standard methods that are intended for huge populations (if your population is all potential users of a webpage for the general public, then stop worrying about this). Otherwise, what you ask about is really not the main problem here, and/or you are mixing together somewhat unrelated problems and worrying about what is probably not the biggest problem in the use case you describe. – Björn Dec 14 '18 at 10:56
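The finite population correction raised in this comment thread can be verified by enumeration. The sketch below assumes a hypothetical population of eight values and draws half of them without replacement; it compares the exact variance of the sample mean over all $\binom{8}{4}=70$ samples with the i.i.d. formula $\sigma^2/n$ multiplied by $(N-n)/(N-1)$, where $\sigma^2$ is the population variance with divisor $N$.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance

population = [Fraction(v) for v in (2, 3, 5, 7, 11, 13, 17, 19)]  # hypothetical values
N, n = len(population), 4  # draw half the population without replacement

# Exact sampling distribution of the sample mean over all C(N, n) equally likely samples.
sample_means = [mean(s) for s in combinations(population, n)]
exact_var = pvariance(sample_means)

sigma2 = pvariance(population)  # population variance (divisor N)
iid_var = sigma2 / n            # what the with-replacement (i.i.d.) formula gives
fpc = Fraction(N - n, N - 1)    # finite population correction applied to the variance

print(exact_var == iid_var * fpc)  # True
print(fpc)                         # 4/7: the naive variance is too large by a factor of 7/4
```

In this half-and-half case the uncorrected variance overstates the truth, which errs on the conservative side.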
