I feel I can solve textbook statistics problems, but I am powerless when approaching real-life ones. Here is one example:

I have two groups of users of equal size (say 50K each), A and B. Users in both groups can engage in an Event X over a period of, say, 35 days; A has access to an additional feature and B does not.

Each user in either group can engage in the Event any number of times per day (no limit). What I want to test is whether A has more engagements than B in total.

I have the following data for the total number of engagements per day:

day |   A   |   B
1   | 1000  |  1002
2   |  998  |   994
... | ...   |  ...
35  |  5600 |  5590

My question is: which test should I use? Can I use a two-sample t-test? I think I can, because I read:

T-test: This test determines whether the average per-customer metric results observed are statistically different between the test and control groups (e.g., did test group customers exhibit higher average spend amounts as compared with the customers in the control group?).

from here: https://www.optimove.com/resources/learning-center/statistical-significance-in-marketing

Edit

After reading SurveyMonkey's page about t-tests: https://www.surveymonkey.com/mp/t-tests-explained/

I think I understand the topic better now. I believe I should follow these steps:

  1. Get per user data in the following format, for the whole period (35 days):
user from A | # of engagements
A000001 | 12
A000002 | 0
........| ..

And do the same for B.

  2. Get the mean and variance from these two tables

Say mean(group A) is 0.50 and mean(group B) is 0.48

  3. Use the formula with sample size 50K for both:

Formula taken from SurveyMonkey (ignore the calculations)
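Steps 1 and 2 above can be sketched as follows. This is only a minimal illustration, assuming the per-user counts have already been extracted into plain Python lists; the names `summarize`, `group_a` and `group_b` and the toy numbers are hypothetical and not taken from the real data.

```python
from statistics import mean, variance


def summarize(counts):
    """Return (n, sample mean, sample variance) for per-user engagement counts."""
    n = len(counts)
    # statistics.variance is the sample variance (divides by n - 1).
    return n, mean(counts), variance(counts)


# Hypothetical toy data: one entry per user, total engagements over the
# whole period. The real tables would have about 50K entries each and must
# include users with zero engagements.
group_a = [12, 0, 3, 1, 0, 7]
group_b = [10, 0, 2, 1, 0, 5]

n_a, mean_a, var_a = summarize(group_a)
n_b, mean_b, var_b = summarize(group_b)

print("A:", n_a, mean_a, var_a)
print("B:", n_b, mean_b, var_b)
```

Keeping the zero-engagement users in the lists matters: dropping them would change both the mean and the variance that go into the test statistic.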

  • Your work looks good except for the fact that you are using the pooled t-test statistic. Is there something you haven't mentioned that would make this a pooled/dependent sample rather than an unpooled/independent one? From what I'm reading, the members of group A have no relation to the members of group B. – Todd Burus Jan 30 '20 at 05:19
  • Hi @Todd, no, they are independent. Sorry, I'm still beginning to grasp the idea. How can I modify it to use an independent t-test? – Nicholas Humphrey Jan 30 '20 at 05:23
  • The test statistic should be $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ where $\bar{x}_i$, $s_i$ and $n_i$ are the sample mean, standard deviation and size for the two samples. You also need to check that your data meets the appropriate assumptions for a two-sample unpooled t-test. – Todd Burus Jan 30 '20 at 07:25
  • You have count data, so I would first think about a Poisson distribution (but there might be overdispersion, and with n=50k a normal approximation for the mean should be OK). But see https://stats.stackexchange.com/questions/9561/checking-if-two-poisson-samples-have-the-same-mean – kjetil b halvorsen Jan 30 '20 at 15:55
