Collective Computing for Scientific Big Data Analysis

Jialin Liu, Yong Chen, Surendra Byna

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations

Abstract

Big science discovery requires an efficient computing framework on high performance computing (HPC) architectures. Traditional scientific data analysis relies on the Message Passing Interface (MPI) and MPI-IO to achieve fast computation and reduce I/O bottlenecks. Within MPI-IO, two-phase collective I/O is commonly used to reduce data movement by optimizing non-contiguous I/O patterns. However, the inherent constraints of collective I/O prevent it from being flexibly combined with computation, and current HPC systems lack an efficient non-blocking I/O-computing framework. In this work, we propose Collective Computing, a framework that breaks the constraints of two-phase collective I/O and provides an efficient non-blocking computing paradigm with runtime support. The fundamental idea is to move the analysis stage forward and insert the computation into the two-phase I/O, so that the data read in the first I/O phase can be computed in place and the second (shuffle) phase is minimized with a reduce operation. We motivate this idea by profiling I/O and CPU usage. With both theoretical analysis and evaluation on a real application and benchmarks, we show that Collective Computing can achieve a 2.5x speedup and is promising for big scientific data analysis.
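
The idea of computing in place during the first I/O phase and then reducing, instead of shuffling raw data, can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses no MPI, the function names are hypothetical, the access pattern (element i is requested by process i mod P) and the analysis (a sum) are chosen only for illustration, and "moved" simply counts values sent in the second phase.

```python
from typing import List, Sequence, Tuple


def owner(index: int, nprocs: int) -> int:
    """Assumed non-contiguous pattern: element i is requested by process i % nprocs."""
    return index % nprocs


def file_domains(n: int, naggs: int) -> List[range]:
    """Split the file into contiguous domains, one per aggregator."""
    chunk = (n + naggs - 1) // naggs
    return [range(a * chunk, min((a + 1) * chunk, n)) for a in range(naggs)]


def two_phase_then_compute(
    data: Sequence[int], nprocs: int, naggs: int
) -> Tuple[List[int], int]:
    """Baseline: contiguous read, shuffle every raw element, then compute sums."""
    received: List[List[int]] = [[] for _ in range(nprocs)]
    moved = 0
    for domain in file_domains(len(data), naggs):
        for i in domain:  # phase 1: aggregator reads its contiguous domain
            received[owner(i, nprocs)].append(data[i])  # phase 2: shuffle
            moved += 1
    return [sum(buf) for buf in received], moved


def collective_computing(
    data: Sequence[int], nprocs: int, naggs: int
) -> Tuple[List[int], int]:
    """Sketch: compute partial sums on each aggregator, then reduce them."""
    results = [0] * nprocs
    moved = 0
    for domain in file_domains(len(data), naggs):
        partial = [0] * nprocs
        for i in domain:  # phase 1: read and compute in place
            partial[owner(i, nprocs)] += data[i]
        for p in range(nprocs):  # phase 2: send one partial result per process
            results[p] += partial[p]
            moved += 1
    return results, moved


if __name__ == "__main__":
    data = list(range(16))
    print(two_phase_then_compute(data, nprocs=4, naggs=2))
    print(collective_computing(data, nprocs=4, naggs=2))
```

Both functions return the same per-process sums. In this toy setting the baseline moves one value per file element, while the second variant moves one partial result per aggregator and process, so the saving grows with the amount of data each aggregator reads. The sketch only applies to analyses that can be expressed as a reduction.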

Original language: English
Title of host publication: Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 129-137
Number of pages: 9
ISBN (Electronic): 9781467375894
DOI: https://doi.org/10.1109/ICPPW.2015.22
State: Published - Dec 8 2015
Event: 44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015 - Beijing, China
Duration: Sep 1 2015 to Sep 4 2015

Publication series

Name: Proceedings of the International Conference on Parallel Processing Workshops
Volume: 2015-January
ISSN (Print): 1530-2016

Conference

Conference: 44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015
Country: China
City: Beijing
Period: 09/1/15 to 09/4/15

Keywords

  • Big data
  • Collective computing
  • Map reduce

Cite this

Liu, J., Chen, Y., & Byna, S. (2015). Collective Computing for Scientific Big Data Analysis. In Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015 (pp. 129-137). [7349904] (Proceedings of the International Conference on Parallel Processing Workshops; Vol. 2015-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPPW.2015.22