I feel I can solve textbook statistics problems, but I feel powerless when approaching real-life ones. Here is one example:
I have two groups of users of equal size (say 50K each), A and B. Users in both groups may engage in an Event X over a period of, say, 35 days; A has access to an additional feature and B does not.
Each user in either group can engage in the Event multiple times every day (no limit). What I want to test is whether A has more engagements than B in total.
I have the following data for the total number of engagements per day:
day | A | B
1 | 1000 | 1002
2 | 998 | 994
... | ... | ...
35 | 5600 | 5590
My question is: which test should I use? Can I use a two-sample t-test? I think I can, because I read:
T-test: This test determines whether the average per-customer metric results observed are statistically different between the test and control groups (e.g., did test group customers exhibit higher average spend amounts as compared with the customers in the control group?).
from here: https://www.optimove.com/resources/learning-center/statistical-significance-in-marketing
Edit
After reading SurveyMonkey's page about the t-test (https://www.surveymonkey.com/mp/t-tests-explained/),
I think I understand the topic better. I believe I should follow these steps:
- Get per-user data in the following format, for the whole period (35 days):
user from A | # of engagements
A000001 | 12
A000002 | 0
........| ..
And do the same for B.
- Compute the mean and variance of the number of engagements per user from each of these two tables
Say mean(group A) is 0.50 and mean(group B) is 0.48
- Use the two-sample t-test formula with sample size 50K for both groups:
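
If I understand the steps above correctly, they could be sketched roughly as follows in Python. This is only my reading of the procedure, not a vetted implementation. It uses the unequal-variance (Welch) form of the two-sample t statistic, which is my own choice here, and approximates the two-sided p-value with the normal distribution, which should be reasonable only for large samples such as 50K users per group. The function name `welch_t` and the toy engagement counts are hypothetical placeholders, not my real data.

```python
import math
from statistics import mean, variance


def welch_t(counts_a, counts_b):
    """Two-sample t statistic (Welch form) from per-user engagement counts.

    Returns (t, p), where p is a two-sided p-value based on the normal
    approximation. That approximation is an assumption that only makes
    sense when both groups are large.
    """
    n_a, n_b = len(counts_a), len(counts_b)
    mean_a, mean_b = mean(counts_a), mean(counts_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = variance(counts_a), variance(counts_b)

    # Standard error of the difference in means.
    se = math.sqrt(var_a / n_a + var_b / n_b)
    t = (mean_a - mean_b) / se

    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|t|)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Hypothetical toy data: one entry per user, total engagements over 35 days.
group_a = [12, 0, 3, 1, 0, 2]
group_b = [1, 0, 2, 0, 0, 1]

t_stat, p_value = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, approximate two-sided p = {p_value:.3f}")
```

With only a handful of users, as in the toy lists, the normal approximation is not trustworthy; the lists are there just to show the expected input shape (one count per user, zeros included).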