
In web analytics we often calculate conversion rates for groups of users (number of users who bought / number of users).

Turning this around: when a new user lands on the site, I believe that this user's probability of converting (given that the user is in a certain group) is not the same as the conversion rate for that group.

Looking at the conversion rate for the segment of users in the 30-35 age group, we have:

  • P(conversion | in the 30-35 age group) - what we want to know: given a new user in the 30-35 age group, what is the probability that they convert?
  • P(in the 30-35 age group | conversion) - from our records (number of conversions by users in the 30-35 age group / total number of conversions)
  • P(conversion) - overall conversion rate (users who converted / all users)
  • P(in the 30-35 age group) - proportion of all users who are in the 30-35 age group

Intuitively, we might think that the probability that a new user in the 30-35 age group converts [P(conversion | in the 30-35 age group)] is the same as the conversion rate [P(in the 30-35 age group | conversion)].

But from Bayes' theorem:

P(conversion | in the 30-35 age group) = P(in the 30-35 age group | conversion) * P(conversion) / P(in the 30-35 age group)
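To check the formula numerically, here is a minimal Python sketch using hypothetical counts (the numbers are made up for illustration, not real data). It estimates each quantity from the list above as a ratio of counts and compares the Bayes' theorem result with the direct ratio of converters in the group to users in the group.

```python
from fractions import Fraction

# Hypothetical counts, for illustration only.
total_users = 10_000       # all users
users_30_35 = 2_000        # users in the 30-35 age group
conversions = 500          # users who converted (any age)
conversions_30_35 = 150    # users who converted AND are in the 30-35 age group

# Each probability estimated as a ratio of counts (exact fractions avoid float noise).
p_group_given_conv = Fraction(conversions_30_35, conversions)  # P(30-35 | conversion)
p_conv = Fraction(conversions, total_users)                    # P(conversion)
p_group = Fraction(users_30_35, total_users)                   # P(30-35)

# Bayes' theorem: P(conversion | 30-35) = P(30-35 | conversion) * P(conversion) / P(30-35)
p_conv_given_group_bayes = p_group_given_conv * p_conv / p_group

# Direct estimate: converters in the group / users in the group
p_conv_given_group_direct = Fraction(conversions_30_35, users_30_35)

print(float(p_group_given_conv))                              # 0.3
print(float(p_conv_given_group_bayes))                        # 0.075
print(p_conv_given_group_bayes == p_conv_given_group_direct)  # True
```

When all counts come from the same records, the two routes agree exactly, because the total-users and total-conversions denominators cancel. Note also that P(30-35 | conversion), 0.3 here, is a different number from the group's own conversion rate, 0.075.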

Is this the correct way to calculate the probability of converting for a new user in the 30-35 age group?

Gala
DavidA

1 Answer


Intuitively, we might think that the probability that a new user in the 30-35 age group converts [P(conversion | in the 30-35 age group)] is the same as the conversion rate [P(in the 30-35 age group | conversion)].

Err... your intuition is wrong here. In general $P(A|B) \neq P(B|A)$, and neither equals $P(A \cap B)$. Let's work on your intuition before moving on. An analogous claim would be:

\begin{equation} \begin{split} X &= P(\text{anarchist}| \text{in the 18-25 age group})\\ Y &= P(\text{in the 18-25 age group}|\text{anarchist})\\ X &= Y \end{split} \end{equation}

$X$ is essentially the proportion of people in a certain age group who are anarchists. $Y$ is essentially the proportion of anarchists who are in that age group. Focusing on the US, we can safely say that most people of voting age are Democrats or Republicans, so $X$ will be small. But most anarchists are idealistic young people (to generalize broadly), so $Y$ will actually be large. Clearly, $X \neq Y$. I believe your confusion may lie in equating the notation used for $X$ and $Y$ with the following quantity:

$$Z = P(\text{anarchist} \bigcap \text{in the 18-25 age group})$$

Here, $Z$ is the proportion of the entire population who are both anarchists and in our age group. When we take a conditional probability, we are essentially restricting our sample space to the conditioning event. $Z$ is not conditioned on anything; it's just an intersection, so we have to consider the entire sample space. Although $Z$ is different from $X$ and $Y$, it is closely related to both by the definition of conditional probability:

$$P(A | B) = \frac{P(A \cap B)}{P(B)}$$

Therefore,

\begin{equation} \begin{split} X&=\frac{Z}{P(\text{in the 18-25 age group})} \\ Y&=\frac{Z}{P(\text{anarchist})} \\ \end{split} \end{equation}
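These identities can be checked numerically. The sketch below uses a hypothetical 2x2 table of population counts (invented purely to illustrate the shape of the argument, not real survey figures) and computes $X$, $Y$ and $Z$ directly from it.

```python
from fractions import Fraction

# Hypothetical population counts, for illustration only.
# Keys: (is_anarchist, is_18_25) -> number of people
counts = {
    (True, True): 60,       # anarchists aged 18-25
    (True, False): 40,      # anarchists of other ages
    (False, True): 1_940,   # non-anarchists aged 18-25
    (False, False): 7_960,  # non-anarchists of other ages
}
total = sum(counts.values())

n_both = counts[(True, True)]
n_anarchist = counts[(True, True)] + counts[(True, False)]
n_18_25 = counts[(True, True)] + counts[(False, True)]

Z = Fraction(n_both, total)        # P(anarchist AND 18-25): whole population is the sample space
X = Fraction(n_both, n_18_25)      # P(anarchist | 18-25): sample space restricted to ages 18-25
Y = Fraction(n_both, n_anarchist)  # P(18-25 | anarchist): sample space restricted to anarchists

# Definition of conditional probability: X = Z / P(18-25), Y = Z / P(anarchist)
assert X == Z / Fraction(n_18_25, total)
assert Y == Z / Fraction(n_anarchist, total)

print(float(X), float(Y), float(Z))  # 0.03 0.6 0.006
```

With these made-up numbers, $X$ is small, $Y$ is large, and $Z$ is smaller than both, matching the argument in the text.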

So as small as $X$ may be, we know that $Z = X \cdot P(\text{in the 18-25 age group})$ is necessarily smaller (unless $P(\text{in the 18-25 age group}) = 1$, or $X = 0$).

I hope this clarifies your confusion a bit.

Is this the correct way to calculate the probability of converting for a new user in the 30-35 age group?

Well, your application of Bayes' theorem looks fine, but you could make your life a bit easier, since you own the data. Your target is just a conditional probability, so let's focus on that:

\begin{equation} \begin{split}
P(\text{conversion} | \text{in the 30-35 age group}) &= \frac{P(\text{conversion} \cap \text{in the 30-35 age group})}{P(\text{in the 30-35 age group})} \\
&= \frac{(\text{number of people who converted AND age 30-35})/(\text{total sample size})}{(\text{number of people age 30-35})/(\text{total sample size})} \\
&= \frac{\text{number of people who converted AND age 30-35}}{\text{number of people age 30-35}}
\end{split} \end{equation}

You can use Bayes' theorem as you suggested, but I suspect this latter formulation will be simpler to calculate. It is the same calculation you already use for conversion rates: the conversion rate of the 30-35 segment is exactly this conditional probability.
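With per-user records, the direct formulation amounts to restricting the sample space to users in the segment and taking the fraction who converted. A minimal sketch, assuming hypothetical records of the form `(age, converted)` and treating "30-35" as inclusive on both ends (also an assumption); the `conversion_rate` helper and the sample data are illustrative, not from any library:

```python
from typing import Iterable, Tuple


def conversion_rate(records: Iterable[Tuple[int, bool]], lo: int, hi: int) -> float:
    """Estimate P(conversion | lo <= age <= hi) as
    (users who converted AND are in the age range) / (users in the age range).

    Hypothetical helper for illustration; the age range is inclusive."""
    in_group = 0
    converted_in_group = 0
    for age, converted in records:
        if lo <= age <= hi:  # restrict the sample space to the segment
            in_group += 1
            if converted:
                converted_in_group += 1
    if in_group == 0:
        raise ValueError("no users in the requested age range")
    return converted_in_group / in_group


# Hypothetical sample of (age, converted) records.
users = [(31, True), (33, False), (29, True), (35, False), (40, True), (30, False), (34, True)]

print(conversion_rate(users, 30, 35))  # 0.4
```

Raising on an empty segment is one design choice; returning `None` or NaN would be another, depending on how downstream code handles segments with no users.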

David Marx