A sub-linear scalable MapReduce-based apriori algorithm

Gantaphon Chalumporn, Phongphun Kijsanayothin, Rattikorn Hewett

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Association rule mining is one of the most popular data analytics techniques, with the well-known Apriori algorithm at its core. Like most machine learning and data mining algorithms, Apriori is designed for in-memory data. One natural way to cope with this limitation and with emerging Big Data challenges is to adapt the algorithm to parallel and distributed computing infrastructures, particularly the widely used MapReduce model. Much research has produced a variety of MapReduce-based Apriori algorithms. However, most of it has focused either on improving performance over that of the original Apriori algorithm or on the mechanisms of the MapReduce infrastructure. This paper presents yet another MapReduce-based Apriori algorithm. Unlike most traditional MapReduce-based Apriori algorithms, which mimic the incremental level-wise computation of the original Apriori, our algorithm opportunistically exploits the map's keys for non-level-wise parallelism, so that it benefits fully from parallel processing and gains efficiency. The paper describes the proposed approach and presents an empirical evaluation of its performance against a representative traditional MapReduce-based algorithm. The results show a significant improvement, with an average reduction in execution time of about 70% over 50,000 to 200,000 transactions on one to three machines with a support threshold of 10%. Moreover, the execution time of the proposed algorithm is empirically shown to scale sub-linearly (better than linearly) with the number of transactions.
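The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of the general idea it names: using itemsets as map keys so that itemsets of every size are counted in one map and reduce pass, instead of one pass per level. It is not the authors' implementation. The function names and the `max_size` cap on itemset length are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_transaction(transaction, max_size):
    """Map phase: emit (itemset, 1) for every itemset of size 1..max_size.

    The itemset itself is the key, so all sizes are emitted together
    rather than one level at a time. max_size is a hypothetical cap that
    keeps the number of emitted keys manageable.
    """
    items = sorted(set(transaction))
    for size in range(1, min(max_size, len(items)) + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_counts(pairs, min_count):
    """Reduce phase: sum the counts per key and keep the frequent itemsets."""
    totals = Counter()
    for itemset, count in pairs:
        totals[itemset] += count
    return {itemset: n for itemset, n in totals.items() if n >= min_count}


def frequent_itemsets(transactions, min_support, max_size=3):
    """Run the map and reduce phases in-process on a list of transactions."""
    min_count = min_support * len(transactions)
    pairs = (pair for t in transactions for pair in map_transaction(t, max_size))
    return reduce_counts(pairs, min_count)


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
    for itemset, count in sorted(frequent_itemsets(demo, 0.5).items()):
        print(itemset, count)
```

In a real MapReduce deployment the map calls would run in parallel over partitions of the transaction file and the framework would group the pairs by key before the reduce step. Emitting every subset grows quickly with transaction length, which is why this sketch caps the itemset size.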

Original language: English
Title of host publication: ICBDR 2019 - Proceedings of the 2019 3rd International Conference on Big Data Research
Publisher: Association for Computing Machinery
Pages: 6-11
Number of pages: 6
ISBN (Electronic): 9781450372015
State: Published - Nov 20 2019
Event: 3rd International Conference on Big Data Research, ICBDR 2019 - Cergy-Pontoise, France
Duration: Nov 20 2019 → Nov 21 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 3rd International Conference on Big Data Research, ICBDR 2019
Country: France
City: Cergy-Pontoise
Period: 11/20/19 → 11/21/19

Keywords

  • Association rules mining
  • Big data analytics algorithms
  • MapReduce-based Apriori
  • Parallel computing
