4

What if, before you begin data collection for an experiment, you randomly divide your subject pool into two (or more) groups, and before implementing the experimental manipulation you notice that the groups clearly differ on one or more variables of potential import? For example, the groups have different proportions of subjects by gender, age, educational level, job experience, etc. What is a reasonable course of action in such a situation? What are the dangers of discarding the original random division of the subject pool and dividing the pool again? Are the inferential statistics calculated on the second set of groups in any way inappropriate because of the discarded first set? In particular, if we discard the first division of the subject pool, are we changing the sampling distribution that our statistical test is based on? If so, are we making it easier or harder to find statistical significance? And are the possible dangers of repeating the division of subjects greater than the obvious danger of confounding due to group differences in, say, educational level?

To make this question more concrete, assume for the sake of this discussion that the topic of the research is teaching method (we have two teaching methods) and that the difference noted between the two groups of subjects is level of formal education: one group contains proportionally more people whose highest educational attainment is high school or less, and the other contains more people with some college or a college degree. Assume that we are training military recruits in a job that does not exist in the civilian world, so everyone entering that specialty has to learn the job from scratch. Assume, further, that the between-group imbalance in previous educational attainment is statistically significant.
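To sketch what I am asking about the sampling distribution, here is a minimal Monte Carlo simulation in Python (the balance rule, effect sizes, and sample size are all made up for illustration). It simulates a true null effect of teaching method, with education as a prognostic covariate, and compares the t test's rejection rate under a single randomization versus re-randomizing until the groups look balanced:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def split(n):
    """One random 50/50 division of n subjects."""
    perm = rng.permutation(n)
    return perm[: n // 2], perm[n // 2 :]

def balanced_split(covariate, max_tries=100):
    """Keep re-drawing the division until the groups' covariate
    means differ by less than 0.1 (a made-up balance rule)."""
    for _ in range(max_tries):
        g1, g2 = split(len(covariate))
        if abs(covariate[g1].mean() - covariate[g2].mean()) < 0.1:
            break
    return g1, g2

def one_experiment(n=60, rerandomize=False):
    college = rng.integers(0, 2, n)          # 0 = high school, 1 = college
    y = 0.8 * college + rng.normal(size=n)   # outcome; NO true method effect
    g1, g2 = balanced_split(college) if rerandomize else split(n)
    return stats.ttest_ind(y[g1], y[g2]).pvalue

for flag in (False, True):
    pvals = np.array([one_experiment(rerandomize=flag) for _ in range(5000)])
    print(f"re-randomize={flag}: rejection rate at alpha=0.05 is "
          f"{(pvals < 0.05).mean():.3f}")
```

If the forced balance is on a covariate that affects the outcome, one would expect the ordinary t test to become conservative in such a simulation, since the test's error term still budgets for covariate imbalance that can no longer occur; a sketch like the one above is one way to check.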

Parenthetically, note that this question is similar to "What if your random sample is clearly not representative?". In a comment there, @stask perceptively noticed that I am a researcher, not a surveyor, and suggested that I might have gotten more relevant answers had I tagged my question differently, using "experiment-design" rather than "sampling" (it seems the sampling tag attracts people working with surveys rather than experiments). So the above is basically the same question, in an experimental context.

Joel W.
  • 3,096
  • 3
  • 31
  • 45

4 Answers

3

If you simply do a new randomization of the same type as the previous one (and allow yourself to keep randomizing until you like the balance), then it can be argued that the randomization is not really random.

However, if you are concerned about the lack of balance in the first randomization, then you probably should not be doing a completely randomized design in the first place; a randomized block or matched-pairs design would make more sense. First divide the subjects into similar groups (blocks) based on the variables you are most concerned about (prior education, in your example), then randomize within each block. You will need a matching analysis technique (a randomized block analysis or a mixed-effects model instead of a one-way ANOVA or t tests). If you cannot block on everything of interest, then adjust for the covariates that you do not block on using other techniques.
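To make the blocking recipe concrete, here is a minimal Python sketch (the recruit IDs, education labels, and arm names are all hypothetical) that randomizes to the two teaching methods separately within each prior-education block:

```python
import random
from collections import defaultdict

def blocked_assignment(subjects, block_key, arms=("method_A", "method_B"), seed=42):
    """Randomize to treatment arms separately within each block
    (here: prior-education stratum). Names are illustrative only."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_key(s)].append(s)
    assignment = {}
    for block, members in blocks.items():
        rng.shuffle(members)
        for i, s in enumerate(members):
            assignment[s] = arms[i % len(arms)]  # alternate arms within the block
    return assignment

subjects = ["r01", "r02", "r03", "r04", "r05", "r06"]
education = {"r01": "HS", "r02": "college", "r03": "HS",
             "r04": "college", "r05": "HS", "r06": "college"}
print(blocked_assignment(subjects, block_key=education.get))
```

The within-block alternation keeps each education level split (nearly) evenly across the two methods; the analysis then includes the block factor, as described above.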

Michael R. Chernick
  • 39,640
  • 28
  • 74
  • 143
Greg Snow
  • 46,563
  • 2
  • 90
  • 159
  • Yes, clearly you can argue that the second randomization is not fully random. Blocking is a good idea, but what if that was not done and the two independent groups that were formed just look uneven? Are there any ways to think about, and quantify, the effects of forming the groups a second time? Is tossing the original groups out and doing a randomized block design better than just forming two independent groups a second time? If so, why/how? – Joel W. Aug 20 '12 at 16:25
  • If the imbalance is enough to concern you, then that says you should have blocked to begin with. Most statistical techniques assume that your data are iid, or at least exchangeable; if it is possible to get an imbalance large enough to concern you, then your data are not iid or exchangeable. Proper blocking will give you conditional iid or conditional exchangeability (and the randomized block analysis is fine with the conditional). Proper blocking will also reduce variation, giving you more power and better estimates of effect sizes. – Greg Snow Aug 20 '12 at 16:42
  • In some fields, like psychology, there are so many potentially relevant variables that it may be impractical or even impossible to block on all the variables. That is why measurements in the social sciences have so much error variance. – Joel W. Aug 20 '12 at 16:59
  • So pick a few. Some is better than nothing. – Michael R. Chernick Aug 20 '12 at 19:35
  • What if the two independent groups are unbalanced with respect to a variable that was not blocked on? – Joel W. Aug 20 '12 at 20:04
  • Then measure them and look to see if they need to be adjusted for. – Greg Snow Aug 21 '12 at 16:11
  • The hypothetical example in the question posits a large, statistically significant difference on a relevant variable: educational attainment. There may simply not be enough data to support a statistical adjustment for level of education within each of the two training methods. Even if enough data were available, such adjustments are based on assumptions. Would it be better to reconstitute the two experimental groups and avoid such statistical adjustments? – Joel W. Aug 21 '12 at 18:12
  • If you are concerned about the imbalance, then you should block on the variables that are not balanced. If you just keep re-randomizing until the balance no longer worries you, and then analyze the data as though there had been only a single randomization, you are doing the wrong thing (most of your assumptions will be violated, and who knows what the effect will be). There are tools that help with matching subjects when there are many things you want to adjust for and cannot find simple blocks. – Greg Snow Aug 22 '12 at 18:34
2

I do not think the sampling should be adjusted because of chance imbalances. Adjusting creates complications that can be worse than any problem you think you might solve. If in the end you have covariate imbalances, there are ways to adjust for them; see this book by Vance Berger, for example:

Selection Bias and Covariate Imbalances In Randomized Clinical Trials
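For example, one generic way to adjust (an illustrative sketch only, not a method taken from the book; the data below are simulated) is to include the imbalanced covariate in a regression model alongside the treatment indicator:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 80
method = rng.integers(0, 2, n)     # 0 = teaching method A, 1 = method B
college = rng.integers(0, 2, n)    # the imbalanced covariate
score = 5 + 2.0 * college + 1.0 * method + rng.normal(size=n)

# OLS of test score on method + education: the method coefficient
# is now adjusted for the education imbalance.
X = sm.add_constant(np.column_stack([method, college]))
print(sm.OLS(score, X).fit().params)  # [intercept, method effect, education effect]
```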

Michael R. Chernick
  • 39,640
  • 28
  • 74
  • 143
  • A quick look at the Amazon preview of the interesting Berger text you cite suggests that treatment effects may be due to selection bias and, therefore, claims for treatment superiority can be challenged if there is selection bias (e.g., page 39, paragraph 3). What part of the text were you referring to when you said "complications that can be worse than any problem you think you might solve"? – Joel W. Aug 20 '12 at 12:50
  • Assume that I conclude that the educational method used with the higher educational attainment group is more effective than the educational method used with the other group. How do I respond to a (non-statistician) manager who says, could not the apparent superiority of the educational method used with the higher educational attainment group be due instead to the higher educational attainment of the people in the group or correlates of that, such as the people in that group being brighter? – Joel W. Aug 20 '12 at 13:02
  • What I was suggesting is that Berger provides methods for making adjustments for covariate imbalance. The other comment wasn't referring to Berger's text. – Michael R. Chernick Aug 20 '12 at 13:43
  • You say, "Adjusting creates complications that can be worse than any problem you think you might solve." But I am asking a different question: whether to toss out the first random division of the subject pool into groups in favor of a second such division. What possible complications might that second division lead to? – Joel W. Aug 20 '12 at 13:50
  • @JoelW. If you start out with a random sample and manipulate it like that, what you are left with is no longer random. So there could be bias in your inferences that you can't quantify. – Michael R. Chernick Aug 20 '12 at 14:00
  • I was hoping for ways to think about and even quantify just that possible bias. For example, would the error variance be artificially low, inflating the alpha level? Or would the error variance be artificially high, due to a good mix of people in each group, thereby reducing the power? Is doing the random division into groups twice a sizable risk, or is the risk minimal unless the random division is done many times? Has anyone done Monte Carlo research on this topic? – Joel W. Aug 20 '12 at 17:07
1

In my opinion, drawing and redrawing a sample in response to not looking "random enough" doesn't generate any bias so long as

  • You aren't making that decision based upon your outcome of interest and
  • You condition on your observables.

We know that bias is introduced by selection on unobservables. As long as you perform selection based upon observables and those observables are conditioned on in the analysis, you're okay.

Is this still random? I say yes: it is a random sample, conditional on the observables, and this is what matters. We need to be careful about how we define "random"; when we are precise, we see that the randomization is still there.

How might you condition on them? Linear models are a standard method. Matching is a nice non-parametric procedure, but it requires a bit more thought than you might imagine (I'm biased to be fond of genetic matching algorithms).

I would note that the conditioning process smooths out or takes into account (depending upon how you want to look at it) the imbalances present. So you don't need to redraw your sample. The only time that you might need to do this is if you want to do matching and you don't have common support between the two groups (you have some college graduates in one group and none in the other, for example).
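For instance, here is a toy sketch of exact matching on a single observable (all data made up); note how the treated subject without a same-education control is dropped, which is exactly the common-support problem described above:

```python
from collections import defaultdict

# Hypothetical (name, education, test score) triples.
treated = [("t1", "HS", 72), ("t2", "college", 85), ("t3", "HS", 70)]
control = [("c1", "HS", 65), ("c2", "college", 80), ("c3", "college", 78)]

pool = defaultdict(list)
for unit in control:
    pool[unit[1]].append(unit)

diffs = []
for name, edu, score in treated:
    if pool[edu]:                       # common support: a control exists at this level
        match = pool[edu].pop(0)
        diffs.append(score - match[2])  # treated minus matched control

# Matched estimate of the effect, conditional on education.
print(sum(diffs) / len(diffs))          # (72-65 + 85-80) / 2 = 6.0
```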

Blocking or stratifying is really about variance reduction (unless you're interested in non-homogeneous responses along some dimension), not about bias.

Since you still have randomization, no adjustments need to be made to your testing procedures. (Though, if you're doing matching, be sure to use variance estimators designed for matching; see Abadie and Imbens (2006).)

Charlie
  • 13,124
  • 5
  • 38
  • 68
0

An experiment requires a control, but it doesn't need to be structured as "end result of one randomly assigned group" versus "end result of another randomly assigned group." You are concerned that the composition of the groups is unbalanced on one particular attribute that you believe has an outsized influence on the end result; perhaps a different structure would make this irrelevant. The experiment may be set up as a before-and-after comparison for each teaching-method group, or as a regression in which one independent variable is the teaching method. If you have really strong reasons to believe that this one attribute outweighs all others, you could even take your population of participants, separate them into groups according to education level, and then within each group randomly assign participants to the control and test conditions.
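As a sketch of the before-and-after idea (the scores below are simulated, purely to illustrate): comparing each subject's gain removes that subject's baseline, so a pre-existing difference between the groups matters less than it would in a raw post-test comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical pre/post achievement scores for the two method groups.
pre_a  = rng.normal(60, 10, 30)
post_a = pre_a + rng.normal(8, 5, 30)   # method A: average gain of 8
pre_b  = rng.normal(65, 10, 30)         # group B starts higher (the imbalance)
post_b = pre_b + rng.normal(5, 5, 30)   # method B: average gain of 5

# Compare gains instead of raw post-test scores.
gain_a, gain_b = post_a - pre_a, post_b - pre_b
t, p = stats.ttest_ind(gain_a, gain_b)
print(f"mean gain A = {gain_a.mean():.1f}, mean gain B = {gain_b.mean():.1f}, p = {p:.3f}")
```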

A risk is that while you've picked out one key driver of different outcomes, there may be others lurking in there that you haven't detected, and whose balance you might be subverting.

Jonathan
  • 1,283
  • 8
  • 15
  • In some fields, like psychology, there are so many potentially relevant variables that it may be impractical or even impossible to block on all the variables. But it may be that we notice a troubling difference after we randomly assign subjects to groups. That is the situation I ask about. – Joel W. Aug 21 '12 at 18:18
  • Agreed - that is my caveat in the last paragraph. There are some situations in which you have prior knowledge about which variables have the most impact on the final outcome, but it sounds like you're saying that in this hypothetical, you do not. That would affect the "blocking" approach, but not a different method of comparing test to control, such as before-and-after, or through regression. – Jonathan Aug 23 '12 at 05:18
  • So there are two possible risks: (1) there "may be others lurking," versus (2) a clear, unintended difference between independent experimental groups. My question is how to compare those risks. It seems to me that the risk of using clearly different groups in an experiment is greater than a potential lurking risk. Is it possible to compare such risks quantitatively, or at least to state them clearly in a qualitative fashion? (The lurking risk you mention is a rudimentary step in that direction.) – Joel W. Aug 23 '12 at 12:29
  • Are we assuming that the test needs to be a strict comparison of one final metric of the control group, compared to one final metric of the test group? While I appreciate the theoretical value of your question, in practice we may be able to make the problem go away with a different method of comparison :) – Jonathan Aug 23 '12 at 21:33
  • What do you mean by "a different method of comparison"? In the question posed, I envisioned two groups, trained with two different teaching methods, evaluated after training with the same achievement test designed to measure mastery of the material. The performance of the two groups on this achievement test would be compared, perhaps with a t test. – Joel W. Aug 23 '12 at 22:08
  • @Joel I'm not finding this comment conversation to be very productive. I provided some thoughts on different methods of comparison in my answer: before-and-after (aka delta of deltas) and regression. I am trying to answer the question "What is a reasonable course of action in such a situation?" to the best of my limited abilities, but I'm seeing a trend on this question of you nudging me and the other answer-ers in a different direction than your main question. – Jonathan Aug 24 '12 at 04:01
  • It sounds like you only want quantitative criticism/support for re-running random groups until you like the groups you get. Unfortunately I'm not sophisticated enough to tell you what the results could be, only that tweaking something until it "looks right" using hidden criteria (you appear to care about more than just education level but cannot say what) is usually not the best solution, and I suggest some others. – Jonathan Aug 24 '12 at 04:03
  • You write, "is usually not the best solution" but I want to understand the reasons you say that, and the specific risks. (There are different shortcomings and risks to the approaches you suggest.) As a result of the lack of compelling responses to my posting, despite comments by highly qualified people, I am beginning to think that the risks due to re-sampling are minimal and that re-sampling is the best of several non-ideal approaches, should the scenario I presented actually occur. – Joel W. Aug 24 '12 at 17:32
  • @Joel Why do you think re-sampling is the "best" so far? How are you evaluating this vs. the other suggestions? Since other answer-ers may be interested I'd suggest editing your question to add a section. – Jonathan Aug 24 '12 at 18:38