TY - GEN
T1 - A sub-linear scalable MapReduce-based apriori algorithm
AU - Chalumporn, Gantaphon
AU - Kijsanayothin, Phongphun
AU - Hewett, Rattikorn
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/11/20
Y1 - 2019/11/20
N2 - Association Rule Mining is one of the most popular data analytic algorithms where the well-known Apriori algorithm is its core. Like most machine learning and data mining algorithms, Apriori algorithm is designed for in-memory data. One natural way to cope with this limitation and emerging Bigdata challenges is to adapt the algorithm to parallel and distributed computing infrastructures, particularly a widely used Map-Reduce model. Much research has developed a variety of MapReduce-based Apriori algorithms. However, most have focused on either improving performance over that of the original Apriori algorithm or mechanisms MapReduce infrastructure. This paper presents yet another MapReduce-based Apriori algorithm. Unlike most traditional MapReduce-based Apriori that mimics incremental level-wise computation of the original Apriori, our MapReduce-based algorithm opportunistically exploits the map's keys for non-level-wise parallelism to fully benefit of parallel processing to gain efficiency. The paper describes our proposed approach and shows an empirical evaluation of its performance compared with that of the representative traditional MapReduce-based algorithm. The results show significant improvement with an average reduction of the execution time of about 70%, over 50,000-200,000 transactions on one to three machines, with 10% of support threshold. In fact, the execution time of the proposed MapReduce-based Apriori algorithm is empirically shown to scale sub-linearly (better than linear) in the number of transactions.
AB - Association Rule Mining is one of the most popular data analytic algorithms where the well-known Apriori algorithm is its core. Like most machine learning and data mining algorithms, Apriori algorithm is designed for in-memory data. One natural way to cope with this limitation and emerging Bigdata challenges is to adapt the algorithm to parallel and distributed computing infrastructures, particularly a widely used Map-Reduce model. Much research has developed a variety of MapReduce-based Apriori algorithms. However, most have focused on either improving performance over that of the original Apriori algorithm or mechanisms MapReduce infrastructure. This paper presents yet another MapReduce-based Apriori algorithm. Unlike most traditional MapReduce-based Apriori that mimics incremental level-wise computation of the original Apriori, our MapReduce-based algorithm opportunistically exploits the map's keys for non-level-wise parallelism to fully benefit of parallel processing to gain efficiency. The paper describes our proposed approach and shows an empirical evaluation of its performance compared with that of the representative traditional MapReduce-based algorithm. The results show significant improvement with an average reduction of the execution time of about 70%, over 50,000-200,000 transactions on one to three machines, with 10% of support threshold. In fact, the execution time of the proposed MapReduce-based Apriori algorithm is empirically shown to scale sub-linearly (better than linear) in the number of transactions.
KW - Association rules mining
KW - Big data analytics algorithms
KW - MapReduce-based Apriori
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=85079169545&partnerID=8YFLogxK
U2 - 10.1145/3372454.3372463
DO - 10.1145/3372454.3372463
M3 - Conference contribution
AN - SCOPUS:85079169545
T3 - ACM International Conference Proceeding Series
SP - 6
EP - 11
BT - ICBDR 2019 - Proceedings of the 2019 3rd International Conference on Big Data Research
PB - Association for Computing Machinery
T2 - 3rd International Conference on Big Data Research, ICBDR 2019
Y2 - 20 November 2019 through 21 November 2019
ER -