Background

I am a physics major; however, I am currently interning at a psychiatry/neuro-imaging laboratory. The primary area of research in my lab is diffusion tensor imaging (DTI). A lot of the studies that are conducted here are group comparisons between "normal" controls and patients whose brains are to some degree "abnormal" (Schizophrenia, Traumatic Brain Injury, etc.). A common procedure is to segment the brains into several regions of interest (ROI) and then calculate some characteristic numbers for each ROI, most importantly the Fractional Anisotropy (FA). To put it very simply, every region of the brain is assigned a number between 0 and 1. Since the FA of an ROI is calculated as the mean of all voxels in that ROI, it can be assumed that the FA value for a specific ROI is normally distributed among all healthy controls (Central Limit Theorem). What happens next is that for each ROI, the mean FA is calculated from all healthy controls, i.e. for every ROI in the brain we find a number that is the "standard value" for this ROI. It is then investigated how much the patients with "abnormal" brains differ from this "standard value".

Question/Problem

An important question here that has been given to me as a summer project is how many healthy controls it takes to make this mean (i.e. the "standard value") stable. "Stable" means that the mean of $N$ and $N+1$ controls does "not differ much", i.e. adding another control contributes only negligible new information. I know that this is a very vague formulation, and this is also the reason why I have come here:

What do you think would be a suitable way of characterizing if the mean of $N$ and $N+1$ controls does "not differ much"?

My supervisor has described to me what she would ideally like to have at the end of my internship here: a program into which she can put the data of all the controls she has so far; the program then does its magic, and in the end it tells her "You have enough controls, your mean is stable", or "Your mean is not stable, and you need at least $X$ more controls to make it stable."

I realize this probably sounds pretty wishy-washy and not very well-defined to you, but it's the same for me :-/ I have already started to do some research myself and tried a couple of things, but nothing so far has seemed particularly promising. Therefore, I would be very grateful for any kind of advice on how to tackle this problem. Algorithms, references to literature, wild ideas, ... anything that gives me a starting point for further research would be greatly appreciated!

Thanks in advance for your efforts!

der_herr_g
  • It might be useful to consider the question "for what purpose does the mean need to be stable?". Are you able to throw any light on *why* the requirement exists at all? What's the mean being used for that needs it to be stable? (in whatever sense that is meant) – Glen_b Aug 31 '14 at 23:07
  • You may find ideas about the accuracy in parameter estimation approach interesting. I discuss it here: [How to report general precision in estimating correlations within a context of justifying sample size?](http://stats.stackexchange.com/a/30287/7290) – gung - Reinstate Monica Aug 31 '14 at 23:57
  • @Glen_b As I described above, the patients whose brains are believed to be "abnormal" are compared to the mean of the healthy people, so that is what that mean is used for. The main reason for this is that DTI (or MRI in general) can't be calibrated and the (absolute) values that you receive depend heavily on the scanner that you use. I think "stable mean" is therefore pretty much equivalent to "close to the 'true value'", i.e. the value we would get if we could use the entire population of the earth as a control group. Did that help? – der_herr_g Sep 01 '14 at 06:22
  • @gung Thanks, I will definitely read into that tomorrow! – der_herr_g Sep 01 '14 at 06:23
  • Not really; why do you need the sample mean to be close to the population mean to compare two groups? – Glen_b Sep 01 '14 at 09:51
  • To clarify: you can account for the variation of the healthy people around the population mean, that's what hypothesis tests do. If it was an issue, it would equally be an issue for the other group, would it not? – Glen_b Sep 01 '14 at 11:45
  • I believe the main motivation for all this is that in the medium run, they want to move on from group comparisons (where you can indeed use hypothesis tests to check if the mean of the two groups is significantly different) to comparing a single individual against this mean of the healthy group. This might make it possible to use DTI for diagnostic purposes: e.g., somebody has had a car accident, and now they want to see if he has suffered any traumata that do not show up in conventional MRI/CT, so they compare his FA values to the mean to identify abnormal ROIs. – der_herr_g Sep 01 '14 at 15:03

1 Answer

Means don't switch from unstable to stable.

Given some amount of variation in the population (which itself can be estimated, of course), you can compute the standard error for a mean. The expected variation from $N$ to $N+1$ will be a function of that, but note that a wildly different next person will move the mean more than a next person whose values are very typical.
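
To see why, a standard update identity for the sample mean (just algebra, added here for illustration) makes the $N$-to-$N+1$ comparison concrete:

$$\bar{x}_{N+1} - \bar{x}_N = \frac{x_{N+1} - \bar{x}_N}{N+1},$$

so the shift caused by one additional control is proportional to how far that control lies from the current mean, and its typical size is on the order of $\sigma/(N+1)$.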

That standard error of the mean decreases as a smooth function of sample size.
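
As a minimal sketch of that, assuming the per-control FA values for one ROI are available as a plain array (the numbers below are made up), the usual estimate $\mathrm{SE} = s/\sqrt{N}$ can simply be tabulated against $N$:

```python
import numpy as np

# Hypothetical FA values for one ROI, one value per healthy control (made-up numbers).
fa_values = np.array([0.42, 0.45, 0.39, 0.44, 0.41, 0.43, 0.40, 0.46])

s = fa_values.std(ddof=1)   # sample standard deviation of FA in this ROI
for n in (5, 10, 20, 50, 100):
    se = s / np.sqrt(n)     # standard error of the mean at sample size n
    print(f"N = {n:3d}: SE of the mean ~ {se:.4f}")
```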

In some situations it might be better to work with a margin of error rather than the standard error (an interval half-width), but the two will be closely related. It might also be better to work with relative error rather than absolute (a percentage margin of error, perhaps).

Either way, someone in the domain will still need to say what "stable" is in those terms; it's really in-domain knowledge, not statistics, that determines that. But they may be more willing to say "we regard a 2% margin of error as stable" than to plump for a raw standard error or a raw $N$.
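
For instance, if the lab does settle on something like a 2% relative margin of error, a small helper along these lines would turn that criterion into the kind of answer the supervisor asked for. This is only a sketch under the normal approximation; `n_for_relative_margin` and the FA numbers are hypothetical, and a $t$ quantile could be substituted for small samples:

```python
import numpy as np
from scipy import stats

def n_for_relative_margin(fa_values, rel_margin=0.02, confidence=0.95):
    """Smallest N whose approximate confidence-interval half-width is within
    `rel_margin` of the current sample mean (rel_margin=0.02 means a 2% margin
    of error).  Normal approximation; a sketch, not a definitive rule."""
    x = np.asarray(fa_values, dtype=float)
    mean, s = x.mean(), x.std(ddof=1)
    z = stats.norm.ppf(0.5 + confidence / 2.0)   # e.g. 1.96 for 95% confidence
    target_half_width = rel_margin * mean        # absolute half-width we want
    return int(np.ceil((z * s / target_half_width) ** 2))

# Hypothetical FA values for one ROI from the controls collected so far.
fa_values = [0.42, 0.45, 0.39, 0.44, 0.41, 0.43, 0.40, 0.46]
n_req = n_for_relative_margin(fa_values, rel_margin=0.02)
print(f"Controls needed for a 2% margin of error: about {n_req}; "
      f"you currently have {len(fa_values)}.")
```

Note that $s$ is itself estimated from the controls collected so far, so the recommended $N$ will change a little as data accumulate and should be rechecked after each new batch of controls.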

Glen_b