
I am trying to mine association rules from my transaction dataset and I have questions regarding the support, confidence and lift of a rule.

Assume we have a rule like {X} -> {Y}.

I know that support is P(XY), confidence is P(XY)/P(X), and lift is P(XY)/(P(X)P(Y)), where lift measures the independence of X and Y (a lift of 1 means they are independent).
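To make the three definitions concrete, here is a minimal Python sketch that computes them by counting transactions. It assumes each transaction is a Python `set` of item names; the function name `rule_metrics` and the four toy baskets are hypothetical, made up for illustration.

```python
def rule_metrics(transactions, x, y):
    """Return (support, confidence, lift) of the rule x -> y.

    transactions: list of sets of items; x, y: sets of items.
    """
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)         # baskets containing X
    n_y = sum(1 for t in transactions if y <= t)         # baskets containing Y
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # baskets containing both
    if n_x == 0 or n_y == 0:
        raise ValueError("X and Y must each occur in at least one transaction")
    support = n_xy / n                                   # P(XY)
    confidence = n_xy / n_x                              # P(XY) / P(X)
    lift = support / ((n_x / n) * (n_y / n))             # P(XY) / (P(X) P(Y))
    return support, confidence, lift


# Hypothetical toy baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(rule_metrics(transactions, {"bread"}, {"milk"}))
```

On these baskets the rule {bread} -> {milk} has support 0.5 and confidence 2/3, yet a lift of 8/9 < 1, because milk is in 75% of all baskets anyway. That is exactly the "high support, high confidence, low lift" situation asked about below.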

However, I just don't know how to interpret rules with these indicators. I have rules with high support, high confidence and low lift. Is that a good rule?

High confidence represents a strong association, and high support represents how convincing that association is. So does high confidence + high support = good rule, meaning we can ignore lift?

If I am going to order/rank my rules and pick, let's say, the best 10 to examine, which indicator should I use as the ranking variable?

BigData

1 Answer


It depends on your task. But usually you want all three to be high.

  • high support: should apply to a large number of cases
  • high confidence: should be correct often
  • high lift: indicates it is not just a coincidence
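One way to act on "all three should be high" when picking the best 10 rules is to filter on all three metrics first and only then sort what survives. A sketch, assuming the rules have already been mined into a hypothetical `Rule` tuple; the threshold values, the metric numbers, and the choice to sort by lift first are illustrative assumptions, not a recommendation from the answer.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support: float
    confidence: float
    lift: float


def top_rules(rules, min_support, min_confidence, min_lift, k=10):
    """Keep rules passing all three thresholds, then rank by lift
    (ties broken by confidence, then support) and return the top k."""
    passing = [
        r for r in rules
        if r.support >= min_support
        and r.confidence >= min_confidence
        and r.lift >= min_lift
    ]
    passing.sort(key=lambda r: (r.lift, r.confidence, r.support), reverse=True)
    return passing[:k]


# Hypothetical metric values, for illustration only.
rules = [
    Rule("day", "rain", 0.25, 0.50, 1.0),
    Rule("diapers", "wipes", 0.04, 0.60, 3.2),
    Rule("bread", "milk", 0.30, 0.70, 1.1),
]
for r in top_rules(rules, min_support=0.01, min_confidence=0.5, min_lift=1.2):
    print(r)
```

Filtering before sorting means a rule with very high support but a lift near 1 never reaches the top 10, however convincing its support and confidence look.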

Consider, e.g., "rain" and "day". Assume we live in a very unfortunate place on the Equator, where it is raining 50% of the time and it is day 50% of the time, and these are independent of each other. I.e., 25% of the time it is raining and it is day.

We then have a support of 25% - that is pretty high for most data sets. We also have a confidence of 50% - that is also pretty good. If 50% of my visitors bought a product I recommend, I would be a billionaire. But the lift is just 1, i.e. no improvement.
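The rain/day numbers can be checked directly. A minimal sketch for the rule {day} -> {rain}, using only the probabilities stated above:

```python
p_rain = 0.5
p_day = 0.5
p_rain_and_day = p_rain * p_day             # independent events: 0.25

support = p_rain_and_day                    # P(day, rain)
confidence = p_rain_and_day / p_day         # P(rain | day)
lift = p_rain_and_day / (p_day * p_rain)    # confidence / P(rain)

print(support, confidence, lift)            # 0.25 0.5 1.0
```

Both support and confidence look respectable, but lift is exactly 1 because knowing it is day tells you nothing about rain.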

Beware that on other data sets you won't get anywhere near 25% support. Consider a supermarket with diverse products: what percentage of customers do you think buy toilet paper?

Has QUIT--Anony-Mousse
  • Since confidence is the conditional probability that a customer who buys X will also buy Y, can I say lift considers the "unconditional" probability of Y? In other words, if many people bought X and Y together, the confidence will be high; however, if many customers also bought Y (but not together with X), the lift will be low. – BigData Aug 13 '16 at 20:58
  • And so a rule with high support and high confidence could still be a bad rule. – BigData Aug 13 '16 at 21:00
  • Yes, that is why people use lift or one of 20+ other metrics. Lift normalizes the confidence by what it would be under the independence assumption. A lift of 1.0 means Y is as likely as it is without the precondition. A lift < 1 indicates a negative correlation (assume that in the above example the confidence were just 40%: that would still look high, but the likelihood of rain would actually have decreased compared to the unconditional 50%). So you usually want a lift > 1.5, but there are many other measures. – Has QUIT--Anony-Mousse Aug 13 '16 at 21:03
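The point made in these comments is that lift is the confidence divided by the unconditional probability of the consequent, lift = P(Y|X)/P(Y), which is the same as P(XY)/(P(X)P(Y)). A small sketch of that identity; the helper names and the P(Y) values 0.2 and 0.9 are hypothetical, and the 0.4/0.5 case is the comment's modified rain example.

```python
def lift_from_confidence(confidence, p_y):
    """Lift of X -> Y as confidence normalized by P(Y).

    Equivalent to P(XY) / (P(X) P(Y)), since confidence = P(XY) / P(X).
    """
    return confidence / p_y


def interpret_lift(lift):
    """Coarse reading of a lift value (exact comparison, for illustration)."""
    if lift > 1:
        return "positive association"
    if lift < 1:
        return "negative association"
    return "independent"


# Same confidence (0.8), but Y is either rare or bought by almost everyone
# (hypothetical probabilities).
for p_y in (0.2, 0.9):
    lift = lift_from_confidence(0.8, p_y)
    print(p_y, round(lift, 3), interpret_lift(lift))

# The comment's variant of the rain example: P(rain | day) = 0.4, P(rain) = 0.5.
lift = lift_from_confidence(0.4, 0.5)
print(round(lift, 3), interpret_lift(lift))
```

With identical confidence, the rule goes from a lift of 4 to below 1 once Y is bought by almost everyone, which is why high support plus high confidence alone can still describe a bad rule.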