Association rule lift ratio converging to 0.5, is it not independent?

Question

As far as I know, lift is a measure used in association rules to see whether there is a positive association between two items (instances?).

I've done a simple (possibly wrong) simulation of the lift measure using the 'arules' package in R.

My intention was to create two independent vectors and check whether their lift values were close to 1.

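library(arules)    # transactions, crossTable
library(reshape2)  # melt
library(dplyr)     # %>%, select

# two independent draws over the same 100 values
a <- sample(1:100, 100000, replace = TRUE)
b <- sample(1:100, 100000, replace = TRUE)

data <- data.frame(a, b)
data$id <- 1:nrow(data)

# long format: one row per (id, value); each id becomes one transaction
trans <- melt(data, id = "id") %>% select(-variable)
trans <- as(split(trans[, "value"], trans[, "id"]), "transactions")

# pairwise lift between all items
lift <- crossTable(trans, measure = "lift")
hist(lift)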

I thought the result would be distributed around 1, but it was actually distributed around 0.5. I've repeated this many times with different values and keep getting similar results.

What am I doing wrong here?

Why is the lift value distributed around 0.5?

Please help me out!

score 0 · Accepted Answer · answered Nov 26 '18 at 19:33

This is a very good demonstration of one important thing! People often think that association rules express positive statistical dependencies, since in statistics the word 'association' refers to dependence between discrete variables. However, traditional association rules per se (without additional filters) do not select positive associations: you will also find negative associations and independence relations. They merely present frequently co-occurring attributes.
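As a rough illustration of why the question's data lands near 0.5 rather than 1: assuming every transaction holds exactly the two drawn values, both in one shared item space of 100 items, the items are not independent within a transaction, because one item taking a slot leaves only one slot for any other. Under that assumption the expected lift for a pair of distinct items works out to about 0.505. The sketch below uses base R only; names such as `inc` and `expected_lift` are made up for this example, and the sample size is reduced to keep it light.

```r
# Assumption: each transaction contains the values of `a` and `b`,
# both mapped into the same item space 1..n_items (as in the question).
n_items <- 100

# P(item k is in a transaction): it is drawn as `a` or as `b`
p_item <- 1 - (1 - 1 / n_items)^2
# P(items j and k are both in a transaction, j != k): (a=j, b=k) or (a=k, b=j)
p_pair <- 2 / n_items^2
expected_lift <- p_pair / p_item^2

# Empirical check without arules, using a 0/1 incidence matrix
set.seed(42)
n_trans <- 20000
a <- sample(1:n_items, n_trans, replace = TRUE)
b <- sample(1:n_items, n_trans, replace = TRUE)

inc <- matrix(0, nrow = n_trans, ncol = n_items)
inc[cbind(seq_len(n_trans), a)] <- 1
inc[cbind(seq_len(n_trans), b)] <- 1

support  <- colMeans(inc)             # per-item support
joint    <- crossprod(inc) / n_trans  # pairwise joint support
lift_mat <- joint / outer(support, support)

# Average lift over all pairs of distinct items
observed_lift <- mean(lift_mat[upper.tri(lift_mat)])
print(c(expected = expected_lift, observed = observed_lift))
```

So the lift values near 0.5 reflect a genuine negative association created by how the transactions were built, not a flaw in the lift measure.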

If you would like to find real positive associations, there are two ways. 1) Use the tool you have, but add filters and select only rules with lift > 1. You may also want to test statistical significance, since some associations deviate from independence only by chance. Note that you need to use the lowest possible minimum frequency threshold to find all (or the most significant) associations, and the computation may then become infeasible. 2) Use programs that search directly for positive dependencies and may also do the statistical testing for you. They are more efficient and can work even without any minimum frequency constraint. See e.g. the question "Interpreting association rules correctly?"
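Option 1 could look roughly like the following with 'arules'. This is only a sketch: the toy data (items `A`, `B`, `C`, with `A` and `B` deliberately made to co-occur), the support and confidence thresholds, and the Bonferroni correction are all assumptions chosen for illustration, not recommendations.

```r
library(arules)

# Hypothetical toy data: B depends positively on A, C is independent of both
set.seed(1)
n <- 2000
A <- runif(n) < 0.3
B <- ifelse(A, runif(n) < 0.8, runif(n) < 0.1)
C <- runif(n) < 0.3
trans <- as(cbind(A = A, B = B, C = C), "transactions")

# Mine rules with deliberately low thresholds (illustrative values)
rules <- apriori(trans,
                 parameter = list(supp = 0.001, conf = 0.01, minlen = 2),
                 control = list(verbose = FALSE))

# Filter 1: keep only rules with a positive association
pos <- subset(rules, subset = lift > 1)

# Filter 2: Fisher's exact test p-values, corrected for multiple testing
pvals <- interestMeasure(pos, measure = "fishersExactTest",
                         transactions = trans)
sig <- pos[p.adjust(pvals, method = "bonferroni") < 0.05]

inspect(sort(sig, by = "lift"))
```

On real data the minimum support would typically have to be much lower, which is where the feasibility concern mentioned above comes in.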
