
I have a list of 20 experiments, each of which yields a single numerical result. I would like to run many more experiments of this type (for different conditions, and so on), but running 20 experiments per condition is too expensive.

What's the best way to estimate the smallest number of experiments beyond which additional experiments change the mean only slightly?

I have tried randomly sampling subsets of the experiments (of size 2, 3, ..., 18), calculating the standard deviation of each subset over a number of iterations, and looking for the subset size beyond which the STD changes only slightly. This is what I got after averaging the STD values for each subset size (the y-axis is the mean STD, the x-axis is the number of experiments sampled from the total of 20):

[Figure: mean STD (y-axis) against the number of plates sampled (x-axis)]
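For readers who want to reproduce the procedure described above, here is a minimal sketch of it, assuming NumPy is available. The function name `mean_std_by_subset_size`, the iteration count, and the simulated `results` array are all placeholders for illustration; the simulated values stand in for the 20 real measurements, which are not given in the question.

```python
import numpy as np

def mean_std_by_subset_size(values, sizes, n_iter=1000, seed=0):
    """For each subset size k, draw n_iter random subsets without replacement
    and return the average sample standard deviation (ddof=1) per size."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    result = {}
    for k in sizes:
        stds = [
            np.std(rng.choice(values, size=k, replace=False), ddof=1)
            for _ in range(n_iter)
        ]
        result[k] = float(np.mean(stds))
    return result

# Placeholder data standing in for the 20 measured results (not real data).
rng = np.random.default_rng(42)
results = rng.normal(loc=100.0, scale=15.0, size=20)

# Subset sizes 2..18, as in the question.
curve = mean_std_by_subset_size(results, sizes=range(2, 19))
for k, s in curve.items():
    print(k, round(s, 2))
```

Note that every subset is drawn from the same 20 values, so the curve can only show how quickly the subsample STD approaches the STD of those 20 values, not how close either is to the true population value.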

However, I'm not sure this is the optimal way to do such a thing.

Protostome
  • What do you mean by an experiment? Are there multiple measurements involved? – mkt Jun 19 '19 at 11:17
  • Example of an experiment - Estimating the expression level of a gene (how many copies of a gene are present in a cell). Measurements are noisy, so we conduct the experiment multiple times. – Protostome Jun 19 '19 at 11:26
  • So each 'experiment' here has a single measurement? – mkt Jun 19 '19 at 11:36
  • @mkt That is correct – Protostome Jun 20 '19 at 06:28
  • This is useful to know; I would say that calling these 'measurements' rather than experiments is likely to make the question clearer to most readers. One more thing: your initial question is about how many measurements are needed for the *mean* to not change much with additional data, but you subsequently talk about the stability of the *standard deviation*. It's still not entirely clear what you are interested in. – mkt Jun 20 '19 at 08:06
  • However, you may find it useful to look up the `power-analysis` tag: https://stats.stackexchange.com/questions/tagged/power-analysis?sort=votes&pageSize=50. This question, for example: https://stats.stackexchange.com/q/21237/121522 – mkt Jun 20 '19 at 08:07
  • @mkt Thanks! I use the stability of the STD to estimate how "close" I am to the true distribution; of course, this should also take the mean into account. Given that I already have some data from previous measurements, how would you propose I conduct a power analysis for future measurements of this kind? – Protostome Jun 20 '19 at 08:40
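
One common way to use existing measurements for planning, sketched here under stated assumptions, is a precision-based sample size calculation: treat the STD of the pilot data as an estimate of the noise and ask how many measurements are needed for the confidence interval of the mean to have a chosen half-width. This is not a full power analysis (which also needs an effect size and a test), and it relies on a normal approximation that is optimistic for small samples. The function name `required_n`, the `margin` parameter, and the pilot values below are hypothetical.

```python
import math
from statistics import NormalDist, stdev

def required_n(pilot_values, margin, confidence=0.95):
    """Approximate number of measurements needed so that the confidence
    interval of the mean has half-width `margin`, using n = (z * s / margin)^2
    with s estimated from the pilot data (normal approximation)."""
    s = stdev(pilot_values)                         # sample STD (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.ceil((z * s / margin) ** 2)

# Hypothetical pilot measurements, for illustration only.
pilot = [98.2, 104.5, 91.7, 110.3, 99.8, 102.1, 95.4, 107.9]
print(required_n(pilot, margin=5.0))
```

Halving `margin` roughly quadruples the required number of measurements, since the standard error of the mean shrinks with the square root of the sample size.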

0 Answers