
I am working on an application where I have to identify associations or correlations between different sets of items. For example, if a person buys shoes at a store, will he or she also buy socks? So I need to find the association between shoes and socks based on legacy (historical) transaction data.
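To make the shoes/socks question concrete, the two standard measures of an association rule such as {shoes} -> {socks} are support (how often the items appear together) and confidence (an estimate of P(socks | shoes)). The following is a minimal sketch that assumes each transaction is a Python set of item names. The helper names `count`, `support`, and `confidence` and the toy transactions are hypothetical and chosen for illustration only, not real sales data.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): count(A and C) / count(A)."""
    both = count(transactions, set(antecedent) | set(consequent))
    return both / count(transactions, antecedent)

# Hypothetical toy transactions (one set of items per basket), for illustration only.
transactions = [
    {"shoes", "socks"},
    {"shoes", "socks", "laces"},
    {"shoes"},
    {"socks"},
    {"shirt"},
]

print(support(transactions, {"shoes", "socks"}))       # 0.4: 2 of 5 baskets
print(confidence(transactions, {"shoes"}, {"socks"}))  # 0.666...: 2 of the 3 shoe baskets
```

Association rule miners such as Apriori or FP-growth exist to find all itemsets with high enough support without checking every possible combination by hand like this.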

I know that Apriori is a well-known algorithm for association rule mining. What I want to know is whether there is any other algorithm that is much more efficient than Apriori for association rule mining. By efficiency I mean ease of implementation, correctness, and speed.

gung - Reinstate Monica
Jason

3 Answers


Obviously, every algorithm is the best one. Just read the papers...

Seriously, there is no one-size-fits-all.

Depending on your data set's characteristics and your parameters, one algorithm or another may be best. Sometimes APRIORI works really well because it is quite simple, so your implementation may be very efficient. FP-growth is a rather complex algorithm, but it is also clever. If you are a very good programmer, FP-growth may be the way to go.
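To show why APRIORI counts as "quite simple", here is a minimal, unoptimized sketch of its level-wise structure: count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune candidates that have an infrequent subset, and count the survivors with one pass over the data. It assumes transactions are sets of item names and uses an absolute count threshold `min_count` (a hypothetical parameter name). Real implementations add data structures such as hash trees to speed up candidate counting; this sketch omits them.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Candidate generation: join pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: every (k-1)-subset of a candidate must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data, for illustration only.
transactions = [
    {"shoes", "socks"},
    {"shoes", "socks", "laces"},
    {"shoes"},
    {"socks"},
    {"shirt"},
]
result = apriori(transactions, min_count=2)
print(result[frozenset({"shoes", "socks"})])  # 2
```

The repeated candidate joins and full-data counting passes are where Apriori spends its time on large or dense data sets.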

Has QUIT--Anony-Mousse
Can you provide an implementation example or a package in Python that finds association rules using FP-growth? – Jason Nov 24 '14 at 04:55

FP-growth is much more efficient than Apriori since it does not require a candidate generation phase. Original paper on FP-growth: http://dl.acm.org/citation.cfm?id=335372


You mention that you would like to find associations/correlations. Do you mean statistical dependencies? If so, be careful with basic Apriori: it finds only frequent co-occurrences, which is not necessarily what you want. There are, however, association rule mining algorithms that also test for statistical dependence. See the question "Interpreting association rules correctly?"
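A quick way to see why frequent co-occurrence is not the same as dependence: a rule can have high confidence simply because the consequent is common in every basket. The following is a minimal sketch, assuming you already have the counts for a rule A -> C. It computes confidence, lift = P(A and C) / (P(A) P(C)), and a one-sided Fisher's exact test p-value for positive dependence using the hypergeometric distribution. The function name `rule_stats` and all the counts are hypothetical, and Fisher's test is just one common choice, not necessarily the test used by the algorithms discussed in the linked question.

```python
from math import comb

def rule_stats(n, n_a, n_c, n_ac):
    """Return (confidence, lift, p_value) for the rule A -> C.

    n: number of transactions; n_a, n_c: transactions containing A, C;
    n_ac: transactions containing both A and C.
    """
    confidence = n_ac / n_a
    lift = (n_ac * n) / (n_a * n_c)
    # One-sided Fisher's exact test: P(X >= n_ac), X ~ Hypergeometric(population n,
    # n_c "successes", n_a draws). This is the chance of at least this much
    # co-occurrence if A and C were independent given the observed margins.
    upper = min(n_a, n_c)
    tail = sum(comb(n_c, x) * comb(n - n_c, n_a - x) for x in range(n_ac, upper + 1))
    p_value = tail / comb(n, n_a)
    return confidence, lift, p_value

# Hypothetical counts: socks are in 90 of 100 baskets, so "shoes -> socks" has
# confidence 0.9 even though buying shoes does not raise the chance of buying socks.
print(rule_stats(n=100, n_a=20, n_c=90, n_ac=18))  # confidence 0.9, lift 1.0, large p-value
# Hypothetical counts where socks are rare overall but common together with shoes.
print(rule_stats(n=100, n_a=20, n_c=20, n_ac=18))  # lift 4.5, very small p-value
```

Basic Apriori would report both rules as equally confident; only the second indicates an actual dependence between shoes and socks.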

whamalai