A sub-linear scalable MapReduce-based apriori algorithm

Gantaphon Chalumporn; Phongphun Kijsanayothin; Rattikorn Hewett

doi:10.1145/3372454.3372463

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Association rule mining is one of the most popular data analytics techniques, and the well-known Apriori algorithm is at its core. Like most machine learning and data mining algorithms, the Apriori algorithm is designed for in-memory data. One natural way to cope with this limitation and with emerging big data challenges is to adapt the algorithm to parallel and distributed computing infrastructures, particularly the widely used MapReduce model. Much research has developed a variety of MapReduce-based Apriori algorithms. However, most have focused either on improving performance over that of the original Apriori algorithm or on mechanisms of the MapReduce infrastructure. This paper presents yet another MapReduce-based Apriori algorithm. Unlike most traditional MapReduce-based Apriori algorithms, which mimic the incremental level-wise computation of the original Apriori, our MapReduce-based algorithm opportunistically exploits the map's keys for non-level-wise parallelism, taking fuller advantage of parallel processing to gain efficiency. The paper describes our proposed approach and presents an empirical evaluation of its performance compared with that of a representative traditional MapReduce-based algorithm. The results show a significant improvement: an average reduction in execution time of about 70% over 50,000 to 200,000 transactions on one to three machines, with a support threshold of 10%. In fact, the execution time of the proposed MapReduce-based Apriori algorithm is empirically shown to scale sub-linearly (better than linear) in the number of transactions.
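
For orientation, the level-wise baseline that the abstract contrasts against can be sketched in a few lines of Python. This is a minimal single-process sketch of classic Apriori with the counting step written in a map/reduce style; it is not the authors' non-level-wise algorithm, whose details the abstract does not give. The function names (count_candidates, apriori) and the toy transactions are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def count_candidates(transactions, candidates):
    """Map: emit (itemset, 1) for each transaction containing it. Reduce: sum."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate itemset is a subset of the transaction
                counts[c] += 1
    return counts


def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)
    items = {item for t in transactions for item in t}
    candidates = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the data per level: the level-wise bottleneck.
        counts = count_candidates(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, n in sorted(apriori(toy, 0.5).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), n)
```

In this toy run with a support threshold of 0.5, every single item and every pair is frequent, while {a, b, c} appears in only 2 of 5 transactions and is dropped. Each level needs its own pass over the data, which is the sequential dependency that a non-level-wise scheme would aim to avoid.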

Original language	English
Title of host publication	ICBDR 2019 - Proceedings of the 2019 3rd International Conference on Big Data Research
Publisher	Association for Computing Machinery
Pages	6-11
Number of pages	6
ISBN (Electronic)	9781450372015
DOIs	https://doi.org/10.1145/3372454.3372463
State	Published - Nov 20 2019
Event	3rd International Conference on Big Data Research, ICBDR 2019 - Cergy-Pontoise, France
Duration	Nov 20 2019 → Nov 21 2019

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	3rd International Conference on Big Data Research, ICBDR 2019
Country/Territory	France
City	Cergy-Pontoise
Period	11/20/19 → 11/21/19

Keywords

Association rules mining
Big data analytics algorithms
MapReduce-based Apriori
Parallel computing

Access to Document

10.1145/3372454.3372463

Cite this

@inproceedings{d6557d88443d4f5485c3dd12fcf51e8b,

title = "A sub-linear scalable MapReduce-based apriori algorithm",

abstract = "Association rule mining is one of the most popular data analytics techniques, and the well-known Apriori algorithm is at its core. Like most machine learning and data mining algorithms, the Apriori algorithm is designed for in-memory data. One natural way to cope with this limitation and with emerging big data challenges is to adapt the algorithm to parallel and distributed computing infrastructures, particularly the widely used MapReduce model. Much research has developed a variety of MapReduce-based Apriori algorithms. However, most have focused either on improving performance over that of the original Apriori algorithm or on mechanisms of the MapReduce infrastructure. This paper presents yet another MapReduce-based Apriori algorithm. Unlike most traditional MapReduce-based Apriori algorithms, which mimic the incremental level-wise computation of the original Apriori, our MapReduce-based algorithm opportunistically exploits the map's keys for non-level-wise parallelism, taking fuller advantage of parallel processing to gain efficiency. The paper describes our proposed approach and presents an empirical evaluation of its performance compared with that of a representative traditional MapReduce-based algorithm. The results show a significant improvement: an average reduction in execution time of about 70% over 50,000 to 200,000 transactions on one to three machines, with a support threshold of 10%. In fact, the execution time of the proposed MapReduce-based Apriori algorithm is empirically shown to scale sub-linearly (better than linear) in the number of transactions.",

keywords = "Association rules mining, Big data analytics algorithms, MapReduce-based Apriori, Parallel computing",

author = "Gantaphon Chalumporn and Phongphun Kijsanayothin and Rattikorn Hewett",

note = "Publisher Copyright: {\textcopyright} 2019 Association for Computing Machinery.; 3rd International Conference on Big Data Research, ICBDR 2019 ; Conference date: 20-11-2019 Through 21-11-2019",

year = "2019",

month = nov,

day = "20",

doi = "10.1145/3372454.3372463",

language = "English",

series = "ACM International Conference Proceeding Series",

publisher = "Association for Computing Machinery",

pages = "6--11",

booktitle = "ICBDR 2019 - Proceedings of the 2019 3rd International Conference on Big Data Research",

}

Chalumporn, G, Kijsanayothin, P & Hewett, R 2019, A sub-linear scalable MapReduce-based apriori algorithm. in ICBDR 2019 - Proceedings of the 2019 3rd International Conference on Big Data Research. ACM International Conference Proceeding Series, Association for Computing Machinery, pp. 6-11, 3rd International Conference on Big Data Research, ICBDR 2019, Cergy-Pontoise, France, 11/20/19. https://doi.org/10.1145/3372454.3372463

A sub-linear scalable MapReduce-based apriori algorithm. / Chalumporn, Gantaphon; Kijsanayothin, Phongphun; Hewett, Rattikorn.
ICBDR 2019 - Proceedings of the 2019 3rd International Conference on Big Data Research. Association for Computing Machinery, 2019. p. 6-11 (ACM International Conference Proceeding Series).

TY - GEN

T1 - A sub-linear scalable MapReduce-based apriori algorithm

AU - Chalumporn, Gantaphon

AU - Kijsanayothin, Phongphun

AU - Hewett, Rattikorn

PY - 2019/11/20

Y1 - 2019/11/20

AB - Association rule mining is one of the most popular data analytics techniques, and the well-known Apriori algorithm is at its core. Like most machine learning and data mining algorithms, the Apriori algorithm is designed for in-memory data. One natural way to cope with this limitation and with emerging big data challenges is to adapt the algorithm to parallel and distributed computing infrastructures, particularly the widely used MapReduce model. Much research has developed a variety of MapReduce-based Apriori algorithms. However, most have focused either on improving performance over that of the original Apriori algorithm or on mechanisms of the MapReduce infrastructure. This paper presents yet another MapReduce-based Apriori algorithm. Unlike most traditional MapReduce-based Apriori algorithms, which mimic the incremental level-wise computation of the original Apriori, our MapReduce-based algorithm opportunistically exploits the map's keys for non-level-wise parallelism, taking fuller advantage of parallel processing to gain efficiency. The paper describes our proposed approach and presents an empirical evaluation of its performance compared with that of a representative traditional MapReduce-based algorithm. The results show a significant improvement: an average reduction in execution time of about 70% over 50,000 to 200,000 transactions on one to three machines, with a support threshold of 10%. In fact, the execution time of the proposed MapReduce-based Apriori algorithm is empirically shown to scale sub-linearly (better than linear) in the number of transactions.

KW - Association rules mining

KW - Big data analytics algorithms

KW - MapReduce-based Apriori

KW - Parallel computing

UR - http://www.scopus.com/inward/record.url?scp=85079169545&partnerID=8YFLogxK

U2 - 10.1145/3372454.3372463

DO - 10.1145/3372454.3372463

M3 - Conference contribution

AN - SCOPUS:85079169545

T3 - ACM International Conference Proceeding Series

SP - 6

EP - 11

BT - ICBDR 2019 - Proceedings of the 2019 3rd International Conference on Big Data Research

PB - Association for Computing Machinery

T2 - 3rd International Conference on Big Data Research, ICBDR 2019

Y2 - 20 November 2019 through 21 November 2019

ER -
