I'm checking the conversion rates on some segments of a web application to figure out which types of customers we should target with paid acquisition (the theory is that these segments see more value in the product and, as such, are willing to pay).

For example, in one segment there are 336 users, of which 88 converted to paid. This gives a ~26% conversion rate for the segment, but how can I have a high degree of confidence that this result is significant enough to avoid burning precious resources chasing this segment?

TIA

Celso

1 Answer

From a statistical point of view, each user can be thought of as a Bernoulli trial for simplicity. (There are other ways of thinking about this, but I suspect this will be good enough for your purposes, and won't require going too far into more advanced statistics.) The number of successes over multiple Bernoulli trials is distributed as a binomial with success probability $p$ (which you estimate as 88/336, about 26%) and number of trials $N$ (for you, 336). How much the observed proportion will bounce around can be assessed by the standard error of the proportion, which is:
$$ SE_{prop}=\sqrt{\frac{p(1-p)}{N}} $$
The Central Limit Theorem assures us that the distribution of sample proportions will be approximately normal, given that your $N$ is so large. Thus, you can calculate a decent first approximation of the 95% confidence interval by multiplying your SE by 1.96 and adding the product to (and subtracting it from) your observed proportion.
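
To make the arithmetic concrete, here is a minimal sketch in Python of that normal-approximation interval applied to your numbers (88 conversions out of 336 users). The function name `wald_interval` is just a label I picked for illustration; it is not from any particular library.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z=1.96 corresponds to a 95% interval.
    """
    p_hat = successes / n                      # observed proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of the proportion
    return p_hat, se, p_hat - z * se, p_hat + z * se

# Numbers from the question: 88 paid conversions among 336 users.
p_hat, se, low, high = wald_interval(88, 336)
print(f"p = {p_hat:.3f}, SE = {se:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With these inputs the standard error comes out at about 0.024, so the interval runs from roughly 21.5% to 30.9%.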

From there, you need to think about how much this will cost the company and how much revenue will be generated. HTH, cheers.

Update: What will likely happen as the number of users goes from 336 to 336,000?

If 26% really is the true underlying probability, then the sample proportion will bounce around less and less far from that percentage in larger samples. Note, however, that given your current data, a range of values is plausible, so it may converge on a higher or lower number than 26%. That is what the 95% confidence interval tells you: the range of plausible values.
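
As a rough illustration of "bouncing around less", the sketch below shows how the half-width of the 95% interval would shrink as the number of users grows. It assumes, purely for illustration, that the proportion stays fixed at the observed 88/336; the intermediate sample sizes and the names `half_width` and `P_ASSUMED` are my own choices.

```python
import math

def half_width(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Assumption: the underlying proportion equals the observed 88/336.
P_ASSUMED = 88 / 336

for n in (336, 3_360, 33_600, 336_000):
    hw = half_width(P_ASSUMED, n)
    print(f"N = {n:>7,}: +/- {100 * hw:.2f} percentage points")
```

Because the standard error scales as $1/\sqrt{N}$, having 1,000 times as many users narrows the interval by a factor of about 31.6: from roughly ±4.7 percentage points at 336 users to roughly ±0.15 at 336,000. It says nothing about where the interval will be centered, only how wide it will be.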

Note also that this is just a model: for example, you are assuming that every user has the same probability of converting into a paid customer, which is certainly not true, but probably a good enough approximation anyway. Imagine a more sophisticated model with a mix of groups of users, with the groups constituting different proportions of the total, and with each group having a different probability of converting. It's easy to see how this can amount to the same thing as your current model when you are only working with the aggregate. Models are necessarily simplifications, and thus are never veridical. As Box famously put it, "All models are wrong, but some are useful".
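
Here is a small sketch of that mixture idea. The three groups, their shares, and their conversion rates are entirely made up for illustration (they are not from your data); I chose them so that the weighted average works out to 26%. If each user is drawn at random from the mix, the aggregate behaves like a single Bernoulli process with that weighted-average probability.

```python
import random

def aggregate_rate(weights, rates):
    """Overall conversion probability of a mixture: sum of share * rate."""
    return sum(w * r for w, r in zip(weights, rates))

def simulate(n, weights, rates, seed=0):
    """Simulate n users drawn from the mixture; return the observed rate."""
    rng = random.Random(seed)
    # Each user gets the conversion probability of a randomly drawn group.
    user_probs = rng.choices(rates, weights=weights, k=n)
    conversions = sum(rng.random() < p for p in user_probs)
    return conversions / n

# Hypothetical groups: shares of the user base and per-group conversion rates.
WEIGHTS = [0.5, 0.3, 0.2]
RATES = [0.10, 0.30, 0.60]

print(f"mixture probability: {aggregate_rate(WEIGHTS, RATES):.2f}")
print(f"simulated rate:      {simulate(100_000, WEIGHTS, RATES):.3f}")
```

The simulated aggregate rate lands close to the weighted average, even though no individual group actually converts at 26%.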

When you use the model to plan future actions, you are further assuming that nothing that changes over that time (e.g., the policy changes you implement, or other changes in the larger world) will influence the underlying probability. This is also false. As they say, making predictions is hard, especially about the future. All you can do is use the best information available to you in the optimal way; the fact that there's always more information that you don't have and that your information is imperfect doesn't change that. These considerations illustrate how the model helps you think about what you may want to do, but cannot do your thinking for you.

gung - Reinstate Monica
  • Thanks a lot gung! I can't upvote your answer unfortunately (not enough karma) but I really appreciated it. – Celso Jun 12 '12 at 11:00
  • No problem, however, if it does answer your question / provide what you need, you can accept it by clicking the check mark to the left under the voting mechanism. – gung - Reinstate Monica Jun 12 '12 at 14:15
  • Thanks for the tip. Although it answered the question, I'm still not comfortable with the result on my end. Perhaps I'll need to rephrase the question: is there a way to extrapolate whether the outcome (conversion rate incl. error margin) will hold for a larger population (e.g., instead of 336 users, let's say 336000)? – Celso Jun 12 '12 at 15:23
  • You can also unaccept the answer if it's really not what you needed by unclicking the check mark. As for your augmented question, it's too much to answer in a comment, so I'll update my answer. – gung - Reinstate Monica Jun 12 '12 at 15:43