I have three groups of experiments. For each experiment I am looking for the percentage of occurrence of case x.

In the first group I have 15 experiments. The case x was seen 10.191% of the total time for 15 experiments.

In the second group I have 6 experiments. The percentage of x is 1.564%.

In the third group I have 3 experiments. The percentage of x is 0%.

I want to show that occurrence of case x significantly decreased from group one to two and to three. Thus I want to calculate the p-values. Can anyone tell me how to do it?

Edit:

The number of measurements for the first group is 22568 and 10.191% of these measurements are case x.

The second group has 1854 measurements (1.564% are x) and the third group has 1164 measurements (0% x)

rolando2
kotoll
  • Could you please explain how it would be possible for a percentage of *anything* in a group of size $n=6$ to be any value other than $0, 100/6, 200/6, \ldots, 500/6,$ and $100$? In particular, how do you arrive at $1.5667\%$? – whuber Oct 21 '13 at 15:46
  • @whuber The data sets I have are time dependent. In the second group, case x was seen 1.5667% of the total time. – kotoll Oct 21 '13 at 15:49
  • That doesn't really answer the question, it just raises more questions. If you have N = 6, @whuber is correct. If you have something else, please describe what you have. See [how to ask a statistics question](http://www.statisticalanalysisconsulting.com/how-to-ask-a-statistics-question/). – Peter Flom Oct 21 '13 at 16:12
  • @PeterFlom Please see the edit. I hope my question is clearer now. – kotoll Oct 21 '13 at 16:22
  • You may need to explain a little better what you did, because (at least to me) it is still unclear. Something along the lines of: "In each experiment I measured 5000 events and counted how many were of type x", or whatever you did... – nico Oct 21 '13 at 17:03
  • I think you are using incorrect terminology: you have _rates_, not _percentages_. So in the first group, you have the event occurring at an (average) rate of 0.1 / minute, for example. Is that correct? – Aniko Oct 21 '13 at 17:04
  • @nico You are right, I should have written that before. Please see the edit. – kotoll Oct 21 '13 at 17:40
  • After the correction from @nico, here is the solution: https://onlinecourses.science.psu.edu/stat414/node/268. Thank you so much, everyone. – kotoll Oct 21 '13 at 18:46
  • You can add an official answer to your own questions, but in this case your answer might be wrong. It completely ignores the fact that there are 15 experiments in group 1, and that outcomes within an experiment might not be independent. – Aniko Oct 21 '13 at 21:16
  • The question asks about differences between groups of experiments, so on one count that would address @Aniko's concern. Then again, in doing such tests we attempt to compare obtained results to the sort that would occur by chance under a null hypothesis. Does it make sense to characterize any null hypothesis at the level of "group of experiments?" – rolando2 Oct 22 '13 at 20:01

2 Answers

Since the counts are available (recovered from the percentages), a chi-square test gives the following result (R console output):

> M
      [,1] [,2]
[1,] 22568 2300
[2,]  1854   29
[3,]  1164    0
> 
> chisq.test(M)

        Pearson's Chi-squared test

data:  M
X-squared = 246.59, df = 2, p-value < 0.00000000000000022

Edit: I should probably use the not-x and x counts rather than the total N and x counts:

> M
      [,1] [,2]
[1,] 20268 2300
[2,]  1825   29
[3,]  1164    0
> 
> chisq.test(M)

        Pearson's Chi-squared test

data:  M
X-squared = 276.24, df = 2, p-value < 0.00000000000000022
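
For anyone who wants to reproduce the corrected table, here is a minimal sketch in R. It is not the code originally used for the output above; the names `n`, `x`, `M`, `res` and `trend` are my own, and the counts are the ones recovered from the percentages in the question's edit. The call to `prop.trend.test` is an extra that the answer does not mention: it tests for a trend in proportions across the ordered groups, which is closer to the "decreased from group one to two and to three" wording. Both tests treat every measurement as independent, which the comments on the question suggest may not hold.

```r
# Counts recovered from the percentages in the question's edit.
n <- c(22568, 1854, 1164)   # total measurements per group
x <- c(2300, 29, 0)         # measurements that were case x

# Contingency table: rows are groups, columns are (not x, x).
M <- cbind(not_x = n - x, x = x)
rownames(M) <- paste0("group", 1:3)

# Pearson chi-squared test of homogeneity across the three groups.
# This is expected to reproduce the X-squared = 276.24 reported above.
res <- chisq.test(M)
print(res)

# Chi-squared test for trend in proportions (default scores 1, 2, 3).
# Assumption: the groups have a natural order one -> two -> three.
trend <- prop.trend.test(x, n)
print(trend)
```

A very small p-value here only says the pooled proportions differ between groups; it does not account for variation between the 15, 6 and 3 experiments inside each group.
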
rnso

You can use a Wald approximation for confidence intervals for the first set of experiments, and probably the second. For the third, the Wald interval is not usable, because the observed proportion is exactly 0 and the interval collapses to zero width, but you can still test whether any of the other groups were significantly different from zero.

Wikipedia has formulas for the Wald and other approximations, and the paper "Interval Estimation for a Binomial Proportion" describes the various approximations in more detail.
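
As a rough illustration, the Wald interval is $\hat p \pm z_{0.975}\sqrt{\hat p(1-\hat p)/n}$, and the sketch below computes it in R for the three groups using the counts from the question's edit. The variable names are my own, and the comparison against the exact (Clopper-Pearson) interval from `binom.test` is my choice of alternative, not something prescribed above. Like the Wald formula itself, it assumes the pooled measurements in each group are independent.

```r
# Counts taken from the question's edit (x = measurements that were case x).
n <- c(22568, 1854, 1164)
x <- c(2300, 29, 0)
p_hat <- x / n

# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
z  <- qnorm(0.975)
se <- sqrt(p_hat * (1 - p_hat) / n)
wald <- cbind(lower = p_hat - z * se, upper = p_hat + z * se)

# Exact (Clopper-Pearson) 95% intervals, one binom.test call per group.
exact <- t(sapply(seq_along(n), function(i) binom.test(x[i], n[i])$conf.int))
colnames(exact) <- c("lower", "upper")

print(wald)
print(exact)
```

For the third group the Wald standard error is zero, so its interval has zero width, while the exact interval still gives a small positive upper bound.
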

You should probably test for differences between the samples in each experiment group (which is what I assume Aniko is implying). If there are big differences between the 15 experiments in the first group, for instance, it would call into question these simple confidence intervals, and you might want to consider an effects model of some sort.

david25272
  • The Wald approximation can be wildly inaccurate where the proportions are close to zero or one, as they seem to be in this case, and is not a recommended method in the paper that you link. It should probably never be used. See this: http://stats.stackexchange.com/questions/15567/putting-a-confidence-interval-on-the-mean-of-a-very-rare-event/15568#15568 – Michael Lew May 17 '14 at 01:21