On using MapReduce to scale algorithms for Big Data analytics: a case study

Phongphun Kijsanayothin, Gantaphon Chalumporn, Rattikorn Hewett

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Introduction: Many data analytics algorithms were originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into “Big algorithms” for large-scale data. MapReduce, a programming paradigm that enables parallel and distributed processing of massive data on large clusters of machines, has contributed to advances in many Big Data analytics algorithms. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions perform poorly, it may be because certain classes of algorithms are not amenable to the MapReduce model, and that one should instead seek a fundamentally different approach to designing a MapReduce-based solution.

Case description: This paper presents a case study of scaling a popular association rule-mining algorithm into a “Big algorithm”, focusing on the development of the Apriori algorithm in the MapReduce model.

Discussion and evaluation: Formal and empirical illustrations are used to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared to the state-of-the-art performer, with an average 7% increase in performance on datasets ranging from 10,000 to 120,000 transactions.

Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative, non-naive MapReduce-based “Big algorithms”.
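The contrast between dependent iterations and the MapReduce model can be illustrated with a toy, single-machine simulation. This is a minimal sketch, not the authors' proposed algorithm, whose details are not given in this abstract. It assumes plain Python functions standing in for the map and reduce phases (no Hadoop API is used), and the names `map_phase`, `reduce_phase`, `levelwise_apriori`, `single_pass_counts`, and the parameter `max_k` are hypothetical. The level-wise version mirrors the classic Apriori loop: each round's candidates are built from the previous round's output, so on a cluster each level would be a separate MapReduce job that cannot start until the last one finishes. The single-pass variant is shown only as one simple way to remove that dependency.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, candidates):
    # Simulated mapper: emit (itemset, 1) for each candidate contained in a transaction.
    for t in transactions:
        items = set(t)
        for c in candidates:
            if set(c) <= items:
                yield c, 1


def reduce_phase(pairs, min_support):
    # Simulated reducer: sum counts per itemset and keep those meeting min_support.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return {k: v for k, v in counts.items() if v >= min_support}


def generate_candidates(frequent_prev, k):
    # Apriori join/prune: a k-itemset is a candidate only if all its (k-1)-subsets are frequent.
    items = sorted({i for s in frequent_prev for i in s})
    return [
        c for c in combinations(items, k)
        if all(sub in frequent_prev for sub in combinations(c, k - 1))
    ]


def levelwise_apriori(transactions, min_support):
    # Each loop iteration is one "job" whose input depends on the previous job's output.
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    frequent, k, jobs = {}, 1, 0
    while candidates:
        level = reduce_phase(map_phase(transactions, candidates), min_support)
        jobs += 1
        if not level:
            break
        frequent.update(level)
        k += 1
        candidates = generate_candidates(level, k)
    return frequent, jobs


def single_pass_counts(transactions, min_support, max_k):
    # Hypothetical non-iterative variant: one round that emits every subset up to max_k.
    def mapper():
        for t in transactions:
            items = sorted(set(t))
            for k in range(1, max_k + 1):
                for c in combinations(items, k):
                    yield c, 1
    return reduce_phase(mapper(), min_support)


if __name__ == "__main__":
    data = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    freq, jobs = levelwise_apriori(data, min_support=3)
    print(jobs)  # 3 dependent rounds on this toy data
    print(single_pass_counts(data, min_support=3, max_k=3) == freq)  # True
```

On this toy data both functions return the same eight frequent itemsets, but the level-wise version needs three dependent rounds. The single-pass variant avoids the chain of jobs at the cost of emitting every subset of size up to `max_k` for each transaction, which grows combinatorially with transaction length; a practical non-iterative design has to manage that trade-off.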

Original language: English
Article number: 105
Journal: Journal of Big Data
Volume: 6
Issue number: 1
DOIs
State: Published - Dec 1 2019

Keywords

  • Association rules mining
  • Big Data analytics algorithms
  • MapReduce
  • Parallel computing

