Sometimes it helps to picture the goal of such an analysis and what a company can do without a model. Suppose the company that the turnover data belongs to wants to do something about a (possibly) high turnover rate. I can imagine two possible actions:
- Find out what is driving people to leave in general (not enough healthcare? No team spirit?) and fix it.
- Find the employees who are considering leaving and talk to them, to find out what drives them and fix the issues specifically for them.
So why does this matter?
Lift charts are primarily important for the second use case. Imagine what a company can do once it has decided to invest money in talking to employees one-to-one but does not have a model. The only options are to talk to everyone or to a random sample of fixed size. Talking to everyone identifies all potential departers but is far too expensive. Talking only to a random sample identifies just a fraction of the potential departers while still costing a lot of money. In both cases, the cost per leave prevention is quite high.
But when a good model exists, the company can decide to talk only to those employees with the highest probability of leaving (those with the top scores according to the model). More of the potential departers are then identified for the same number of conversations, which lowers the cost per leave prevention.
Take another look at the first two tables here: http://www2.cs.uregina.ca/~dbd/cs831/notes/lift_chart/lift_chart.html. Let's say that "customers" = "employees" and "positive respondents" = "potential departers" (see data below).
If the company decides it can only spend enough money to talk to 10000 employees, it will identify
- $\frac{20000}{100000} \cdot 10000 = 2000$ departers without a model
- $\frac{6000}{10000} \cdot 10000 = 6000$ departers with the model (selecting only the top 10000 according to the model score)
which means
- an improvement by a factor of $\frac{6000}{2000}=3$, which is pictured as the point (10%, 3) in the lift chart.
- that 6000 of the 20000 total departers have been identified, i.e. 30%, which is pictured as the point (10%, 30%) in the gain chart. The baseline here is only 10%, because a random sample of 10000 employees identifies only $\frac{10000 \cdot (20000/100000)}{20000}=\frac{10000}{100000}=0.1$ of all potential departers.
The x-axis in both cases shows the percentage of employees contacted, in this specific example 10%.
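The same arithmetic can be repeated for every row of the table, which gives all the points of both charts. The following is only a minimal sketch: the numbers are the ones from the appendix data, while the names `TOTAL_EMPLOYEES`, `TOTAL_DEPARTERS`, `MODEL_RESULTS` and `gain_and_lift` are made up for this illustration.

```python
# Cumulative data from the appendix: (employees contacted, departers identified).
# All names here are illustrative assumptions, not part of any library.
TOTAL_EMPLOYEES = 100_000
TOTAL_DEPARTERS = 20_000
MODEL_RESULTS = [
    (10_000, 6_000),
    (20_000, 10_000),
    (30_000, 13_000),
    (40_000, 15_800),
    (50_000, 17_000),
    (60_000, 18_000),
    (70_000, 18_800),
    (80_000, 19_400),
    (90_000, 19_800),
    (100_000, 20_000),
]


def gain_and_lift(contacted, identified):
    """Return (fraction contacted, gain, lift) for one cumulative row."""
    # x-axis of both charts: share of all employees contacted
    fraction_contacted = contacted / TOTAL_EMPLOYEES
    # gain: share of all departers found among the contacted employees
    gain = identified / TOTAL_DEPARTERS
    # departers expected in a random sample of the same size
    expected_random = contacted * TOTAL_DEPARTERS / TOTAL_EMPLOYEES
    # lift: how many times better the model is than random selection
    lift = identified / expected_random
    return fraction_contacted, gain, lift


rows = [gain_and_lift(c, i) for c, i in MODEL_RESULTS]

for frac, gain, lift in rows:
    # the gain-chart baseline at each point equals the fraction contacted
    print(f"{frac:>4.0%} contacted: gain {gain:.1%} (baseline {frac:.0%}), lift {lift:.2f}")
```

The first row reproduces the points discussed above, (10%, 30%) in the gain chart and (10%, 3) in the lift chart. In the last row everyone is contacted, so the gain reaches 100% and the lift drops to 1.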
Appendix
Data used to make this question independent of link rot.
Overall Rate

| Total Employees Contacted | Identified Departers |
|---|---|
| 100000 | 20000 |

Effectiveness of the model when employees are contacted in chunks of 10000

| Total Employees Contacted | Identified Departers |
|---|---|
| 10000 | 6000 |
| 20000 | 10000 |
| 30000 | 13000 |
| 40000 | 15800 |
| 50000 | 17000 |
| 60000 | 18000 |
| 70000 | 18800 |
| 80000 | 19400 |
| 90000 | 19800 |
| 100000 | 20000 |