What do Lift and Gain Charts state in the context of an employee turnover model

Question

So I am trying to further understand lift and gain charts as they apply to my employee turnover model (built with CHAID in SPSS Modeler). For my data, this means predicting the number of people who voluntarily leave the company.

I have reviewed the references below and have the basics of interpretation down: what is plotted on the x- and y-axes and the ideal curve you are looking for. I even practiced constructing my own gains and lift charts in Excel.

But all the examples I have seen so far are for direct mail campaigns. Now I want to know what this means for my data. In the case of the gains chart, does it merely mean that if I sample the top 10% of my data I can expect 40% of terms, whereas sampling the top 60% of my data gets 80% of terms? (please assume the 40% and 60% are the values). If so, what significance should I take away from that? I really don't get it in the context of my turnover model.

References:

lift-measure-in-data-mining

what-is-a-lift-chart

http://www2.cs.uregina.ca/~dbd/cs831/notes/lift_chart/lift_chart.html

Why are you using CHAID? To the best of my understanding it is an old tree classification method that predates CART and lacks many of the good statistical properties of CART. — Michael R. Chernick, Jul 27 '12 at 03:21
@Michael: I agree, it is an older method. But I am in a position where I am picking up the pieces of what a previous analyst was using, since he left the company. For now I am just picking up where he left off. Eventually I want to branch out to other methods and even ensembles. @steffen - thanks. — daniellopez46, Jul 30 '12 at 14:30

mlwida · Accepted Answer · 2012-07-27T14:10:00.477

Sometimes it helps to picture the goal of such an analysis and what a company can do without one. Suppose the company that owns the turnover data wants to do something about a (possibly) high turnover rate. I can imagine two possible actions:

1. Find out what is driving people to leave in general and fix it (not enough healthcare? no team spirit?).
2. Find the employees who are considering leaving and talk to them, to find out what drives them and fix the issues specifically for them.

So why does this matter?

Lift charts matter primarily for the second use case. Imagine what a company can do once it has decided to invest money in talking to employees one-on-one but has no model. Its only options are to talk to everyone or to a random sample of fixed size. Talking to everyone identifies all potential departers but is far too expensive. Talking to a random sample identifies only a fraction of the potential departers while still costing a lot of money. In both cases, the cost per prevented departure is quite high.

But when a good model exists, the company can choose to talk only to those who have the highest probability of leaving (those with the top scores according to the model). More of the potential departers are then identified for the same number of conversations, which lowers the cost per prevented departure.

Take another look at the first two tables here: http://www2.cs.uregina.ca/~dbd/cs831/notes/lift_chart/lift_chart.html. Let's say that "customers" = "employees" and "positive respondents" = "potential departers" (see the data below).

If the company decides it can only afford to talk to 10000 employees, it will identify

- $\frac{20000}{100000} \cdot 10000 = 2000$ departers without a model (random selection)
- $\frac{6000}{10000} \cdot 10000 = 6000$ departers with the model (selecting only the top 10000 according to the model score)

which means

- an improvement by a factor of $\frac{6000}{2000}=3$, which is pictured as the point (10%, 3) in the lift chart.
- that 6000 of the 20000 total departers have been identified, i.e. 30%, which is pictured as the point (10%, 30%) in the gain chart. The baseline here is only 10%, because a random sample of 10000 employees identifies only $\frac{10000 \cdot (20000/100000)}{20000}=\frac{10000}{100000}=0.1$ of all potential departers.

The x-axis in both cases shows the percentage of employees contacted, in this specific example 10%.
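The arithmetic above can be checked with a few lines of code. This is only a minimal sketch in Python (my choice of language; the question itself uses SPSS Modeler and Excel), and the function name `lift_and_gain` and its parameter names are hypothetical, introduced just for illustration. The inputs are the figures from the example.

```python
def lift_and_gain(total_employees, total_departers, contacted, identified_with_model):
    """Lift and gain for one contact budget (hypothetical helper, illustration only)."""
    # Departers expected in a random sample of size `contacted` (no model).
    expected_random = total_departers * contacted / total_employees
    # Lift: how many times more departers the model finds than random selection.
    lift = identified_with_model / expected_random
    # Gain: share of all departers identified by contacting the top-scored employees.
    gain = identified_with_model / total_departers
    # Baseline gain: share of all departers a random sample of the same size identifies.
    baseline_gain = contacted / total_employees
    return expected_random, lift, gain, baseline_gain


expected_random, lift, gain, baseline_gain = lift_and_gain(
    total_employees=100_000,
    total_departers=20_000,
    contacted=10_000,
    identified_with_model=6_000,
)
print(f"expected without model: {expected_random:.0f}")
print(f"lift: {lift:.1f}, gain: {gain:.0%}, baseline gain: {baseline_gain:.0%}")
```

The lift is the y-value plotted in the lift chart and the gain is the y-value plotted in the gain chart, both at x = 10% of employees contacted.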

Appendix

Data reproduced here to make this answer independent of link rot.

Overall Rate

Total Employees Contacted    Identified Departers
100000                       20000

Effectiveness of the model when employees are contacted in chunks of 10000

Total Employees Contacted    Identified Departers
10000                        6000
20000                        10000
30000                        13000
40000                        15800
50000                        17000
60000                        18000
70000                        18800
80000                        19400
90000                        19800
100000                       20000
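
As a rough sketch, the full gain and lift curves can be derived from the table above. The code is Python (an assumption on my part, any spreadsheet would do the same), and the variable names are made up for illustration. It treats the last row as the totals, which holds for this data because contacting all 100000 employees identifies all 20000 departers.

```python
# Cumulative counts copied from the table above.
contacted = [10_000, 20_000, 30_000, 40_000, 50_000,
             60_000, 70_000, 80_000, 90_000, 100_000]
identified = [6_000, 10_000, 13_000, 15_800, 17_000,
              18_000, 18_800, 19_400, 19_800, 20_000]

total_employees = contacted[-1]   # everyone contacted in the last row
total_departers = identified[-1]  # hence every departer identified

rows = []
for n, found in zip(contacted, identified):
    share_contacted = n / total_employees  # x-axis of both charts
    gain = found / total_departers         # y-axis of the gain chart
    lift = gain / share_contacted          # y-axis of the lift chart
    rows.append((share_contacted, gain, lift))

print(f"{'contacted':>10} {'gain':>7} {'baseline':>9} {'lift':>5}")
for share_contacted, gain, lift in rows:
    # With random selection the gain equals the share contacted (lift of 1).
    print(f"{share_contacted:>10.0%} {gain:>7.1%} {share_contacted:>9.0%} {lift:>5.2f}")
```

The lift falls toward 1 as more employees are contacted, because at 100% contacted the model and random selection identify exactly the same people.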

edited Jul 27 '12 at 14:10

answered Jul 27 '12 at 08:16

mlwida


English is not my native tongue, and I do not like to use "leavers". What is the correct term here? – mlwida Jul 27 '12 at 08:17
Steffen, "leavers" is understandable but "departers" might be more conventional. The use of "caught," however, is a little jarring, because this word has connotations of someone apprehended for malfeasance: a criminal is "caught" but the subject of a study is "identified." – whuber Jul 27 '12 at 13:17
Steffen, the general term used when describing churn modeling in customer relationship management / marketing analytics is "attriter". This relates to the notion of attrition. I would suspect that term is appropriate in the human resources analytics world, but can't say for certain. – B_Miner Jul 27 '12 at 14:17
@B_Miner Thanks: that's interesting. You won't find "attriter" in [most dictionaries](http://dictionary.cambridge.org/spellcheck/american-english/?q=attriter), but it does [appear on the Internet](http://en.wiktionary.org/wiki/attriter). This must be a highly specialized word used in the industry, as you suspect. It has a neutral connotation because "attrition," from which this word is derived, refers generally to all causes of loss, whereas "leaver" or "departer," for instance, come with connotations of someone taking overt action to go away. – whuber Jul 27 '12 at 14:30
@Whuber, yes it is a specialized term. One also refers to "hard" and "soft" attrition. The former typically being a proactive choice by the customer and the latter being less a choice that the company could influence / intervene to correct. For example, some customers are soft attriters because they move, or die or are removed by the company due to non-payment. In some churn modeling using (typically discrete time) survival analysis, competing risks is used to differentiate these causes of attrition. – B_Miner Jul 27 '12 at 14:46
Sometimes soft attrition is also used to describe a relationship that remains technically "active" but the customer ceases to be engaged (e.g. still has a credit card account but hasn't charged anything in a given period). – B_Miner Jul 27 '12 at 14:49
Thanks for the thoughts on terminology. "leavers" is just what was used by the previous analyst and what had become accepted by management here at my company. However I like the term "departers" more or even better, the more technical term: "Attriter" – daniellopez46 Jul 30 '12 at 14:51
@steffen. Thanks for the thorough answer. It is much appreciated. I plan to accept it. What we currently do is take the model and apply it to the current population to come up with propensity scores for each individual. We then sum up the propensity scores by organization to come up with total potential departers. So even in this case, where we are not explicitly taking action #2 as you stated, we can still find utility in looking at the gains and lift charts, because they also give us confidence in our total estimate. – daniellopez46 Jul 30 '12 at 16:53
In other words, if the performance is good at each or most deciles compared to baseline, then we can expect the total estimate of potential departers to have a certain level of accuracy. Do I more or less have this correct? – daniellopez46 Jul 30 '12 at 16:54
@daniellopez46 Yes, this is correct. Take for example the gain chart: the closer the model curve is to the optimal point (20%, 100%), the better the [accuracy](http://en.wikipedia.org/wiki/Accuracy_and_precision) of the model. – mlwida Jul 30 '12 at 19:10
