
I have two samples, and in each sample I measured the fluorescence intensity of 10k cells. Unfortunately the instrument doesn't return raw data, only histograms and summary statistics (mean, SD, and sample size). From the histograms, the data look roughly exponentially distributed. I want to test for a difference in means using only the histograms, means, SDs, and sample sizes.
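If the intensities really were independent draws from exponential distributions, there is an exact test that needs only the means and sample sizes: under H0 (equal means), the ratio of the two sample means follows an F distribution with (2n1, 2n2) degrees of freedom, because 2λ times the sum of n exponentials with rate λ is chi-squared with 2n degrees of freedom. A minimal SciPy sketch, assuming i.i.d. exponential data; the function names and the numbers are hypothetical, not from the question:

```python
from scipy import stats


def exp_mean_ratio_test(mean1, n1, mean2, n2):
    """Exact two-sided test of equal means for two i.i.d. exponential samples.

    Under H0 the ratio of sample means follows F(2*n1, 2*n2).
    Only the means and sample sizes are used; the SDs are not needed.
    """
    f = mean1 / mean2
    dfn, dfd = 2 * n1, 2 * n2
    # Two-sided p-value: double the smaller tail probability, capped at 1
    p = min(1.0, 2 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd)))
    return f, p


def cv(mean, sd):
    """Coefficient of variation; close to 1 if the data are roughly exponential."""
    return sd / mean


# Hypothetical summary statistics, not the question's data
f, p = exp_mean_ratio_test(105.0, 10_000, 100.0, 10_000)
print(f, p)
print(cv(105.0, 104.0), cv(100.0, 98.0))
```

Since an exponential's SD equals its mean, a coefficient of variation far from 1 is a warning that this model doesn't fit. Correlated cells would also make the effective sample size smaller than n, so the p-value would be too small.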

Option 1: If the data were roughly normal, I'd plug the summary statistics into a t-test, e.g. tsum.test in R. Unfortunately the data look quite skewed, so I don't think that's appropriate:

[Histogram of fluorescence intensities (image not reproduced)]
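For reference, a summary-statistics t-test can be reproduced in Python. The sketch below computes Welch's unequal-variance t-test by hand and cross-checks it against SciPy's scipy.stats.ttest_ind_from_stats with equal_var=False. The helper name welch_from_stats and the numbers are hypothetical:

```python
import math

from scipy import stats


def welch_from_stats(m1, s1, n1, m2, s2, n2):
    """Welch's two-sample t-test from means, sample SDs (ddof=1) and sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard error of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, df, p


# Hypothetical summary statistics, not the question's data
t, df, p = welch_from_stats(105.0, 104.0, 10_000, 100.0, 98.0, 10_000)

# SciPy's built-in version should agree
res = stats.ttest_ind_from_stats(105.0, 104.0, 10_000, 100.0, 98.0, 10_000,
                                 equal_var=False)
print(t, df, p)
print(res.statistic, res.pvalue)
```

With about 10,000 cells per sample, the sampling distribution of each mean is close to normal even for skewed data (for an exponential, the skewness of the mean is 2/sqrt(n) = 0.02). That is why the t-test on means is often tolerable here. Heavy right tails and correlated measurements remain the main risks.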

Option 2: I also considered working on the log scale, but I'm not sure how to calculate sd(log x) from sd(x).
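Without a distributional assumption, sd(log x) cannot be recovered from the mean and SD of x alone. Under the added assumption that x is lognormal, the mean m and SD s of x determine the log-scale parameters: sigma^2 = ln(1 + s^2/m^2) and mu = ln(m) - sigma^2/2. The sketch below uses a hypothetical helper name and simulated data. Its second half shows that the conversion is only as good as the lognormal assumption. For exponential data, the SD of log x is pi/sqrt(6) ≈ 1.28, but the formula gives about sqrt(ln 2) ≈ 0.83.

```python
import math

import numpy as np


def lognormal_log_moments(mean_x, sd_x):
    """Mean and SD of log(x), ASSUMING x is lognormal, from the mean and SD of x."""
    sigma2 = math.log(1.0 + (sd_x / mean_x) ** 2)
    mu = math.log(mean_x) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


rng = np.random.default_rng(0)

# Lognormal data: the formula recovers the log-scale moments (up to sampling error)
x = rng.lognormal(mean=4.0, sigma=0.8, size=100_000)
mu_hat, sd_hat = lognormal_log_moments(x.mean(), x.std(ddof=1))

# Exponential data: the same formula underestimates the SD of log(x)
y = rng.exponential(scale=100.0, size=100_000)
_, sd_wrong = lognormal_log_moments(y.mean(), y.std(ddof=1))
sd_true = np.log(y).std(ddof=1)

print(mu_hat, sd_hat)
print(sd_wrong, sd_true)
```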

What's the best way forward?

R Greg Stacey
  • Do you have the sample sizes? – Dave Jun 28 '21 at 19:18
  • Yes, I have the sample sizes. Just edited the question to say so. – R Greg Stacey Jun 28 '21 at 19:20
    Honestly, a t-test will probably get you the right answer 80% of the time for 10% of the effort of finding the "correct" test, especially since it's one you can run with just summary statistics. The general linear model is pretty robust. – Mark White Jun 28 '21 at 19:34
    "Roughly exponentially distributed" might be a misleading characterization. The relatively large SD indicates a tendency to exhibit high outliers--precisely the kind of behavior that can produce misleading t-test p-values. Another potential problem is that if this histogram summarizes a spectrum, it might be combining strongly correlated values. If you don't account for that, any differences will seem more significant than they really are. Thus, the first step in a good way forward is to explain what these data actually measure. – whuber Jun 28 '21 at 19:50
  • Thanks @JarleTufto. Yes, that link answers my question. Marking this as a duplicate question. – R Greg Stacey Jun 28 '21 at 19:50
  • @whuber Thanks for the caveats. What do you mean by "summarizes a spectrum"? – R Greg Stacey Jun 28 '21 at 19:51
    Many "instruments" make multiple measurements, such as counts of radiation at various frequency ranges (which is the classic spectrum). – whuber Jun 28 '21 at 19:52
  • @whuber Thanks so much. Each data point is a fluorescence intensity from an individual cell, so I believe they're independent. I'll add this to the question in case it's useful. – R Greg Stacey Jun 28 '21 at 20:05
  • If these are cells of a sensor, ordinarily they are strongly positively correlated. For instance, if a pure point source is detected, how many cells will pick it up? If more than one, then you have positive correlation. – whuber Jun 28 '21 at 20:52
