
I have 11 different dates, and for each date I have two random samples: group A, which received a specific treatment, and a control group, which did not. I want to measure the difference in proportions between those samples over the entire period, i.e., some kind of average change between group A and the control.

Example of my data:

Month   p_groupa p_control
2019-1  4.32%   54.07%
2019-2  4.40%   66.28%
2019-3  9.56%   58.04%
2019-4  5.30%   53.97%
2019-5  8.92%   51.66%
2019-6  7.72%   49.26%
2019-7  8.18%   50.91%
2019-8  15.85%  53.79%
2019-9  22.39%  54.23%
2019-10 8.47%   59.27%
2019-11 7.18%   51.28%

In addition, how should I treat the numbers for August and September?
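As a rough first look (a sketch, not a formal answer to the August/September question), one could compute each month's difference in percentage points from the table above and compare the plain time average with and without those two months. Dropping exactly `2019-8` and `2019-9` is an illustrative assumption, and this unweighted average ignores the monthly sample sizes.

```python
from statistics import mean

# Monthly proportions (in %) copied from the table in the question.
months = ["2019-1", "2019-2", "2019-3", "2019-4", "2019-5", "2019-6",
          "2019-7", "2019-8", "2019-9", "2019-10", "2019-11"]
p_groupa = [4.32, 4.40, 9.56, 5.30, 8.92, 7.72, 8.18, 15.85, 22.39, 8.47, 7.18]
p_control = [54.07, 66.28, 58.04, 53.97, 51.66, 49.26, 50.91, 53.79, 54.23, 59.27, 51.28]

# Difference in percentage points for each month (group A minus control).
diffs = {m: a - c for m, a, c in zip(months, p_groupa, p_control)}

# Unweighted time-averaged difference over all 11 months.
mean_all = mean(diffs.values())

# Sensitivity check: the same average with August and September left out.
excluded = {"2019-8", "2019-9"}
mean_without_aug_sep = mean(d for m, d in diffs.items() if m not in excluded)

print(f"mean difference, all months:        {mean_all:.2f} pp")
print(f"mean difference, without Aug & Sep: {mean_without_aug_sep:.2f} pp")
```

If the two averages are close, the August/September bump has little effect on a single summary number; if they differ a lot, that is a sign a time-aware model is worth the extra effort.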

Thanks

kncdwn
  • Are you interested in a single summary measure or a time-varying trend, with a possible bump here and there (e.g., a lower difference in proportions in August and September)? – chl Oct 08 '20 at 12:58
  • Single summary measure – kncdwn Oct 08 '20 at 13:00
  • You can compute the time-averaged difference in percentage points, but this won't account for the Aug-Sep period. – chl Oct 08 '20 at 13:05
  • Regardless of the sample size in each month? Or should I calculate a weighted average of that difference? – kncdwn Oct 08 '20 at 13:25
  • Assuming you have individual (or aggregated) data, you probably need a generalized linear mixed model (conditional approach) or an alternative such as generalized estimating equations (marginal approach). – chl Oct 08 '20 at 14:30
  • It seems that this problem might better be addressed by a logistic regression, with some handling of repeated measures if the same individuals in each group are evaluated on each date. – EdM Oct 08 '20 at 15:36
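To make the "time-averaged vs. weighted average" question concrete, here is a minimal sketch that returns both the plain mean of the monthly differences and a weighted mean. Weighting each month by its total sample size `n_a + n_c` is one simple, assumed choice (the question does not give sample sizes), and the function name `averaged_difference` is hypothetical. Neither average accounts for the time structure the comments mention.

```python
from typing import Sequence, Tuple


def averaged_difference(p_a: Sequence[float], p_c: Sequence[float],
                        n_a: Sequence[int], n_c: Sequence[int]) -> Tuple[float, float]:
    """Return (unweighted, weighted) mean of the monthly differences p_a - p_c.

    Hypothetical helper: each month is weighted by its total sample size
    n_a + n_c, an illustrative choice rather than the only valid one.
    """
    if not (len(p_a) == len(p_c) == len(n_a) == len(n_c)):
        raise ValueError("all inputs must have one entry per month")
    diffs = [a - c for a, c in zip(p_a, p_c)]
    weights = [na + nc for na, nc in zip(n_a, n_c)]
    unweighted = sum(diffs) / len(diffs)
    weighted = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    return unweighted, weighted


# Toy example with made-up sample sizes (not from the question):
# month 1 differs by -40 pp on 200 people, month 2 by -30 pp on 600 people.
unw, w = averaged_difference([10.0, 20.0], [50.0, 50.0], [100, 300], [100, 300])
print(unw, w)
```

With equal sample sizes every month the two averages coincide; when some months are much larger, the weighted version pulls the summary toward those months.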

1 Answer


If you are interested in the mean difference between the proportions, you can use a paired test: a paired t-test if you are willing to make some assumptions (approximately normally distributed differences), or a Wilcoxon signed-rank test for the non-parametric case. For more information, see "How to choose between t-test or non-parametric test e.g. Wilcoxon in small samples".

t-test

    Paired t-test

data:  df$p_groupa and df$p_control
t = -19.319, df = 10, p-value = 3.01e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -50.74456 -40.24999
sample estimates:
mean of the differences 
              -45.49727
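The estimate, t statistic, and confidence interval above can be reproduced by hand, which shows what the paired t-test is doing: it treats the 11 monthly differences as one sample and tests whether their mean is zero. This sketch uses only the Python standard library; the t quantile 2.228139 for 10 degrees of freedom is taken from standard tables (with SciPy, `scipy.stats.t.ppf(0.975, 10)` gives it, and `scipy.stats.ttest_rel` runs the whole test, including the p-value, which is omitted here because it needs the t CDF).

```python
import math
from statistics import mean, stdev

# Monthly proportions (in %) from the question.
p_groupa = [4.32, 4.40, 9.56, 5.30, 8.92, 7.72, 8.18, 15.85, 22.39, 8.47, 7.18]
p_control = [54.07, 66.28, 58.04, 53.97, 51.66, 49.26, 50.91, 53.79, 54.23, 59.27, 51.28]

# Paired differences, one per month (group A minus control).
diffs = [a - c for a, c in zip(p_groupa, p_control)]
n = len(diffs)
df = n - 1

d_bar = mean(diffs)               # mean of the differences
se = stdev(diffs) / math.sqrt(n)  # standard error; stdev uses the n - 1 denominator
t_stat = d_bar / se

# 97.5% quantile of Student's t with 10 df (from tables), for a two-sided 95% CI.
T_CRIT_DF10 = 2.228139
ci = (d_bar - T_CRIT_DF10 * se, d_bar + T_CRIT_DF10 * se)

print(f"t = {t_stat:.3f}, df = {df}")
print(f"95% CI: ({ci[0]:.5f}, {ci[1]:.5f}), mean difference = {d_bar:.5f}")
```

Note that pairing by month treats each date as a block; as the comments below point out, the two samples within a month are independent, so whether pairing is appropriate is a modelling choice.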

Wilcoxon test

    Wilcoxon signed rank test

data:  df$p_groupa and df$p_control
V = 0, p-value = 0.0009766
alternative hypothesis: true location shift is not equal to 0
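The values V = 0 and p = 0.0009766 follow directly from the data: group A is below the control in every month, so no difference is positive and the sum of positive ranks is 0. The sketch below computes the exact signed-rank test with the standard library by enumerating all 2^11 sign patterns. It assumes no ties and no zero differences (true for this data); `scipy.stats.wilcoxon` is the library route.

```python
from itertools import product

# Monthly proportions (in %) from the question.
p_groupa = [4.32, 4.40, 9.56, 5.30, 8.92, 7.72, 8.18, 15.85, 22.39, 8.47, 7.18]
p_control = [54.07, 66.28, 58.04, 53.97, 51.66, 49.26, 50.91, 53.79, 54.23, 59.27, 51.28]

diffs = [a - c for a, c in zip(p_groupa, p_control)]
n = len(diffs)

# Rank the absolute differences, 1 = smallest (simple ranking: assumes no ties or zeros).
order = sorted(range(n), key=lambda i: abs(diffs[i]))
ranks = [0] * n
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

# V = sum of the ranks belonging to positive differences.
v_stat = sum(r for r, d in zip(ranks, diffs) if d > 0)

# Exact null distribution: under H0 each rank 1..n is equally likely to carry
# a positive or negative sign, so enumerate all 2**n sign patterns.
null_vs = [sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
           for signs in product((False, True), repeat=n)]
p_lower = sum(v <= v_stat for v in null_vs) / len(null_vs)
p_upper = sum(v >= v_stat for v in null_vs) / len(null_vs)
p_value = min(1.0, 2 * min(p_lower, p_upper))  # two-sided exact p-value

print(f"V = {v_stat}, p-value = {p_value:.7f}")
```

Here p = 2 / 2^11, the smallest two-sided p-value possible with 11 pairs, so this test cannot say anything beyond "every month goes the same way".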
user2974951
  • This does not account for the time component (i.e., how do the differences in proportions evolve over time), and I find the proposed tests which are usually applied to continuous outcomes quite inappropriate. Any reason to suggest a Wilcoxon test, for example? – chl Oct 08 '20 at 12:55
  • Why would you consider the samples paired? – kncdwn Oct 08 '20 at 13:06
  • @chl The OP mentioned that he is interested in the "entire period of time", so I understood that the time component is not relevant. Regarding the second point, I agree that these are not truly continuous variables, but I think treating them as such is a good approximation. – user2974951 Oct 08 '20 at 13:06
  • @chl That's how I understood the OP: the differences should be computed date by date, because each date forms a block containing both treatments, A and B. – user2974951 Oct 08 '20 at 13:07
  • (1) At each time point, the two samples are independent (this is a two-arm design, with repeated measures). (2) See [this answer](https://stats.stackexchange.com/a/169740/930) regarding the limitations of the Wilcoxon test when dealing with proportions. – chl Oct 08 '20 at 13:10
  • Two independent samples, each with a different treatment. – kncdwn Oct 08 '20 at 13:13