First, this may be a duplicate of:

statistical significance

I'm unsure whether that post covers my exact situation; if it does, just mark this as a duplicate. Let's say I have a list of 100k potential clients and I sell cars. I select 10k of them based on each client's predicted probability of buying a car, which comes from a model with many features. Each client has a different probability. I then track whether each of the 10k clients buys a car over different time periods, and I group the clients into four buckets based on, say, income level.

For week 4, I have 1200 clients in the income bucket "$50k-$75k", and 1000 of those bought a car. I also have 1600 clients in the income bucket "$76k-$100k", and 1100 of them bought a car. Can I use Fisher's exact test to calculate a p-value for the difference between these two subgroups of clients?

A couple of things I am unsure of are the definition of a sample in this experiment and whether the sample size is too large for Fisher's exact test. Also, I'm assuming this is sampling without replacement; does that work with Fisher's exact test?

Are there better options than Fisher's exact test?

d84_n1nj4

1 Answer

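As a back-of-the-envelope calculation, you can estimate the purchase rates for the two income bands as $1000/1200 \approx 0.83$ and $1100/1600 \approx 0.69$. The standard error of each of these estimates is at most $\frac{1}{2\sqrt{n}} < \frac{1}{2\sqrt{1024}} = \frac{1}{64} \approx 0.0156$, since $n \ge 1200 > 1024$ in both groups. The difference between the two estimates, about 0.146, is therefore more than 9 times the standard error of either estimate. The standard error of the difference itself is at most $\sqrt{2}$ times larger, about 0.022, so the difference is still more than 6 standard errors away from zero, and you can be highly confident that the true population proportions are different.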
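The arithmetic above can be checked with a short script. This is only a sketch using the counts quoted in the question; the variable names are made up for illustration, and the pooled two-proportion z-test shown here is a normal approximation added for comparison, not Fisher's exact test and not something the calculation above relies on.

```python
import math

# Counts taken from the question (week 4).
n1, bought1 = 1200, 1000   # income bucket $50k-$75k
n2, bought2 = 1600, 1100   # income bucket $76k-$100k

p1 = bought1 / n1          # estimated purchase rate, bucket 1
p2 = bought2 / n2          # estimated purchase rate, bucket 2
diff = p1 - p2

# Worst-case bound on each standard error:
# sqrt(p * (1 - p) / n) <= 1 / (2 * sqrt(n)), with equality at p = 0.5.
se_bound = 1 / (2 * math.sqrt(min(n1, n2)))

# Pooled two-proportion z-test (assumption: the two groups are independent).
pooled = (bought1 + bought2) / (n1 + n2)
se_diff = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = diff / se_diff

# Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, difference = {diff:.4f}")
print(f"per-group SE bound = {se_bound:.4f}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3g}")
```

With samples this large the normal approximation and an exact test would be expected to lead to the same conclusion, so the choice between them matters little here.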

fblundun