How to test for and deal with regression toward the mean?

Question

I am working with a large dataset of behavioral data that I am treating (post-hoc) as a time-series experimental design to look for reliable change in a single dependent variable as a result of a treatment. The data come from users' interactions with a website over 10 years. There is an overall improvement from time 1 to time 2, p<.001. But there is a regression to the mean effect (also see here) such that those with low DV at time 1 increase at time 2, and those with high DV at time 1 decrease at time 2. (This can be seen clearly on graphs.) I don't know how to proceed with the analysis. Can I quantify the regression to the mean effect and from that determine by how much the treatment effect exceeds the regression effect?

Here are some additional details about the study:

the data for the study come from a support group website where users write about life problems. volunteer counsellors read the entries and respond to the users with support and advice. there is 10 years' worth of data; n=~200,000.
my research is a natural experiment because it works with the website data that was not collected with research in mind. biggest issue with that: no control group.
the volunteer counsellors also tag (privately) the user's written entry with topic, attribute, and severity labels. I conducted a survey of the counsellors asking them to rate the relative severity of these tags (e.g. 'depression-panic'~0, 'school-worry'~2, 'relationships-happy'~5). The survey results and the tags applied to each writing entry were used to derive a simple proxy for the user's state at time of writing. this proxy was normalized across the sample, has a quasi-normal distribution, and it is treated as the IV.
the treatment is simply use of the site (writing about life problems & receiving social support), so the change of interest is in the DV from writing entry 1 to entry 2. the main effect is that IV does increase overall from entry 1 to 2, but as described there is a regression to the mean effect.
after establishing a main effect, I am interested in looking into a variety of interacting variables: user's language choices, details of website interaction, timing of counsellor response, etc.

If you have a regression towards the mean, are you comfortable that the measured behaviour was a reliable indicator of overall behaviour at the time? Statistical analysis requires valid measures, which means, e.g., that the measures are reliable (measured with relatively small error) and have construct validity (measure what you think they measure). Can you provide more detail: what was measured, how was it measured, what was the treatment, how people were selected into treatment, etc? I don't understand why high-behaviour and low-behaviour users underwent the same treatment. — Michelle, Jan 24 '12 at 06:13
It's not clear what the treatment is here, and in particular when and why it was administered. Could you tell us please? To compare mean IV levels at times 1 and 2, you could just compute the average differences in IV, comparing time 2 to time 1. Assuming the IVs are independent for each person, this gives a measure of improvement; test its significance with a one sample t-test, if you like. But this analysis doesn't say anything about treatment, which is what you're interested in. — guest, Jan 24 '12 at 06:57
@guest asked for some additional details, and I concur. What did you do? How much data do you have? What variables have you got? What are your research questions and hypotheses? — Peter Flom, Jan 24 '12 at 11:50
Don't you mean DV (dependent variable) not IV (independent variable) here when talking about user state? You want to quantify the effect of the treatment/IV (use of the site) on the outcome/DV (user state). Independent variables don't change as a result of treatment. — Anne Z., Jan 24 '12 at 22:23
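
As a minimal sketch of the comparison guest describes (mean change from entry 1 to entry 2, tested against zero), the following uses only the Python standard library. The function name and the toy scores are illustrative assumptions, not the study's data, and the p-value uses a normal approximation to the t distribution, which is reasonable only for large samples such as the n=~200,000 mentioned in the question. As guest notes, a significant mean change says nothing by itself about the treatment.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def paired_change_test(time1, time2):
    """Return (mean change, test statistic, approximate two-sided p-value)."""
    if len(time1) != len(time2):
        raise ValueError("time1 and time2 must hold one score per user")
    # Per-user change from entry 1 to entry 2.
    diffs = [after - before for before, after in zip(time1, time2)]
    mean_diff = fmean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    t_stat = mean_diff / std_err
    # Normal approximation to the t distribution (assumption: large n).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return mean_diff, t_stat, p_value


# Hypothetical toy scores, for illustration only.
entry1 = [0.0, 0.0, 0.0, 0.0, 0.0]
entry2 = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_diff, t_stat, p_value = paired_change_test(entry1, entry2)
print(mean_diff, t_stat, p_value)
```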

Michelle · Answer 1 · 2012-01-25T17:50:13.220

Update: if you have a true regression to the mean effect, because both it and treatment effects co-occur over time and have the same directionality for people needing treatment, the regression to the mean is confounded with treatment, and so you will not be able to estimate the "true" treatment effect.
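
To make the pattern concrete, here is a small simulation sketch in which there is no treatment at all, yet users who start low go up and users who start high go down. Everything in it is an assumption for illustration: each observed score is a stable "true" state plus independent measurement noise, the reliability of 0.6 and the sample size are arbitrary, and the function names are hypothetical.

```python
import random
from statistics import fmean


def simulate_pairs(n=20000, reliability=0.6, treatment_shift=0.0, seed=0):
    """Simulate (entry 1, entry 2) scores as true state + independent noise."""
    rng = random.Random(seed)
    true_sd = reliability ** 0.5          # share of variance that is stable
    noise_sd = (1 - reliability) ** 0.5   # share that is measurement noise
    pairs = []
    for _ in range(n):
        true_state = rng.gauss(0, true_sd)
        x1 = true_state + rng.gauss(0, noise_sd)
        x2 = true_state + treatment_shift + rng.gauss(0, noise_sd)
        pairs.append((x1, x2))
    return pairs


def mean_change_by_baseline_third(pairs):
    """Mean change (entry 2 - entry 1) within thirds of the entry-1 score."""
    ordered = sorted(pairs, key=lambda p: p[0])
    k = len(ordered) // 3
    groups = {
        "low": ordered[:k],
        "middle": ordered[k:2 * k],
        "high": ordered[2 * k:],
    }
    return {
        name: fmean(x2 - x1 for x1, x2 in group)
        for name, group in groups.items()
    }


pairs = simulate_pairs()  # treatment_shift=0.0: no treatment effect at all
changes = mean_change_by_baseline_third(pairs)
overall_change = fmean(x2 - x1 for x1, x2 in pairs)
print(changes, overall_change)
```

In this sketch the low third rises and the high third falls purely because of measurement noise, while the whole-sample mean change stays near the simulated treatment shift. That last property depends on the simulation's assumption that entry-1 scores are not themselves selected for being unusually low; if users tend to write their first entry at a low point, it no longer holds.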

This is an interesting set of data, and I think you can do some analyses with it; however, you will not be able to treat the method used to generate the data as an experiment. I think you have what is outlined on Wikipedia as a natural experiment and, while useful, these types of studies have some issues not found in controlled experiments. In particular, natural experiments suffer from a lack of control over independent variables, so cause-and-effect relationships may be impossible to identify, although it is still possible to draw conclusions about correlations.

In your case, I would be worried about confounding variables. This is a list of possible factors that could influence the results:

Possibly your largest confound is that you don't know what else is going on in users' lives away from the website. On the basis of what they write on the website, one user may realise how bad their situation is and draw on resources around them (family, friends) for support, so the help is not limited to that received on the website. Another user, perhaps due to their life issues, may be alienated from family and friends, and the website is all the support they have. We may expect that the time-to-positive-outcome will be different for these two users, but we can't distinguish between them.
I'm assuming that the website users accessed the website when they wanted to (which is great for them) but means that the results you have for their problems may not be reflective of the number and severity of their life issues, because I assume they didn't access the site regularly (unlike face-to-face counselling appointments which tend to be scheduled regularly).
The level of detail in their writing will be reflective of their written style, and is not likely to be equivalent to what they would express in a face-to-face counselling session. There are also no non-verbal cues which a face-to-face counsellor would also use to help assess the state of their client. Were the changes over time more pronounced in users who wrote less and had fewer tags applied to their content?
If there were a number of lower-score and high-score tags in the same post (e.g. someone is having problems with study and they're in a happy relationship), how was the proxy affected by this, for example was a simple average score taken across all tag scores for each post? This could be affecting your scores if there is a particular very negative issue that the person is facing, but much of what else they mention is positive. In a face-to-face setting, the counsellor can focus on the negative and find out, e.g., why the person is so depressed even though much of their life seems to be going well, but in the website situation you only have what they write. So it is possible that the way users have written their posts means that taking an overall proxy may not work too well.
If the website is for users with life problems, I'm not sure why you wish to include users who scored as being very (happy? successful?) in their first post. These people do not seem to be the target audience for the website and I'm not sure why you would want to include them in the same group as people who had issues. For example, the happy(?) people do not seem to need treatment, so there is no reason I can think of why the website intervention would be suitable for them. I'm not sure if users were assigned to the website as a treatment after, for example, seeing a counsellor. If that was the case, I would wonder why people who were upset enough to see a counsellor would then write a very positive post on a website designed to help them improve their mental state. Assuming there was this pre-counselling stage, maybe all they needed was that one counselling appointment. Regardless, I think this is quite a different group to the ones that gave initial posts that showed life issues, and for the moment I would omit them as they seem to be a "sampling error". Normally when assessing treatment effects, we only select people who appear to need treatment (e.g. we don't include happy contented people in trials of antidepressants).
There may be some social desirability bias in the user posts.
Have you undertaken any inter-rater reliability testing with the tags? If not, could some of the issues with scoring be related to bias with some tags? In particular, there could be some quality issues when the counsellor has just started and is learning how to tag posts, just like there are quality issues when any of us learn something new. Also, did some counsellors tend to place more tags, and did some tend to place few tags? Your analysis requires tag consistency across all the posts.
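
One common way to check the tag consistency raised in the last point is Cohen's kappa for two raters labelling the same posts. The following is a minimal sketch, assuming a subset of posts was tagged independently by two counsellors (the question does not say whether such double-tagged posts exist). The function name and the labels are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each gave one label per post."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of posts")
    n = len(rater_a)
    # Observed proportion of posts on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical tags from two counsellors on the same four posts.
counsellor_1 = ["depression", "depression", "school", "school"]
counsellor_2 = ["depression", "depression", "school", "depression"]
kappa = cohens_kappa(counsellor_1, counsellor_2)
print(kappa)
```

Kappa handles one label per post; posts carrying several tags at once would need a different agreement measure or a per-tag analysis.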

These are just suggestions based on your post, and I could well have misunderstood some of your study, or made some incorrect assumptions. I think that the factors you mention at the end of your post - user's language choices, details of website interaction, timing of counsellor response - are all very important.

Best wishes with your study.

Thank you for this thorough list of suggestions. I have looked into a lot of these factors before, they are all good points. But none of these points speaks to my question about regression to the mean effects, sometimes called a "regression artifact". — ted.strauss, Jan 25 '12 at 15:51

score 0 · Answer 2 · answered Jun 14 '16 at 21:32

I'm not in any way an authority on statistics, but might I suggest using other studies to get an estimate of the degree of regression to the mean you have in yours? In an ideal world, you would estimate the degree of regression to the mean using a control group, but since you don't have a control group, maybe you need to jury-rig one from the literature.

Somewhere in the psychology literature, someone must have said something about the degree of regression to the mean that can be expected in the life happiness of people not too dissimilar to yours (maybe college students who visit counseling services). If a student whose happiness is at the 10th percentile can be expected to regress to the 20th percentile within 6 months, just via regression to the mean, maybe you could make a similar assumption about people in your own data.
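
As a rough sketch of how such a borrowed figure could be used: if the scores at the two times are assumed to be jointly normal with the same mean and variance, then the expected standardized score at follow-up is the test-retest correlation times the standardized score at baseline. The 10th-to-20th-percentile figure is the hypothetical one from the paragraph above, not a published estimate, and the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def implied_correlation(start_pct, end_pct):
    """Test-retest correlation implied by regressing from start_pct to end_pct.

    Assumes bivariate normal scores with equal means and variances, so that
    E[z2 | z1] = r * z1. Percentiles are fractions, e.g. 0.10 (not 0.5 itself).
    """
    return STD_NORMAL.inv_cdf(end_pct) / STD_NORMAL.inv_cdf(start_pct)


def percentile_of_expected_followup(start_pct, r):
    """Percentile of the expected follow-up score under regression alone."""
    z1 = STD_NORMAL.inv_cdf(start_pct)
    return STD_NORMAL.cdf(r * z1)


def change_beyond_regression(z1, z2, r):
    """Observed follow-up z-score minus what regression alone predicts."""
    return z2 - r * z1


# Hypothetical reference figure: 10th percentile regresses to the 20th.
r = implied_correlation(0.10, 0.20)
print(r)
print(percentile_of_expected_followup(0.10, r))
print(change_beyond_regression(z1=-1.5, z2=-0.5, r=r))
```

Any change left over after subtracting `r * z1` is only as credible as the borrowed correlation, which is the caveat the next paragraph stresses.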

I emphasize that this method would (and should) reduce the credibility of your analysis, since the hypothetical college students that you would use for a comparison might differ in very important ways from the people who use your online forum, but it might be the best way of dealing with a bad situation.

(The inspiration for this suggestion comes from my reading of the Rubin causal model, which gives a flexible way to think about observational research: identify a counterfactual by way of clever assumptions, caveating them as you go.)

I actually looked for quite a long time to find studies that offer these kinds of reference examples, and it appears that it's not something that has been investigated directly (at least in social sciences). Considering that psychology has yet to get over its p<0.05 addiction, it's not very surprising that this hasn't been addressed. — ted.strauss, Jun 14 '16 at 21:55
