Locality-driven high-level I/O aggregation for processing scientific datasets

Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

Scientific I/O libraries such as PnetCDF, ADIOS, and HDF5 are commonly used to facilitate array-based scientific dataset processing. The underlying physical data layout, however, is usually hidden from the upper layer's logical access. This mismatch can lead to poor I/O performance. In this research, we observed performance degradation under concurrent sub-array accesses, where overlaps among calls that access sub-arrays led to high contention on storage servers because of the logical-physical mismatch. We propose a locality-driven high-level I/O aggregation approach to address these issues. By designing a logical-physical mapping scheme, we use the scientific dataset's structured format and the file system's data distribution to resolve the mismatch, so that I/O can be carried out in a locality-driven fashion. The proposed approach is effective and complements existing I/O strategies, such as independent I/O and collective I/O. Experimental results confirm the performance improvement over existing I/O strategies. The proposed locality-driven high-level I/O aggregation approach holds promise for efficiently processing scientific datasets, which is critical in the era of data-intensive and big data computing.
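To make the idea of a logical-physical mapping concrete, the sketch below maps a 2D sub-array request to the storage servers that would hold its bytes, then groups the byte ranges by server. It is only an illustration under stated assumptions, not the scheme from the paper: it assumes a row-major file with no header, fixed-size round-robin striping across servers, and hypothetical function names (`subarray_byte_ranges`, `servers_for_range`, `group_by_server`).

```python
from collections import defaultdict


def subarray_byte_ranges(shape, start, count, elem_size):
    """Yield (offset, length) byte ranges for a 2D sub-array.

    Assumption: the array is stored row-major with no file header.
    """
    _, cols = shape
    r0, c0 = start
    nr, nc = count
    for r in range(r0, r0 + nr):
        offset = (r * cols + c0) * elem_size
        yield offset, nc * elem_size


def servers_for_range(offset, length, stripe_size, num_servers):
    """Split one byte range at stripe boundaries.

    Assumption: stripes are assigned to servers round-robin.
    Yields (server, offset, length) for each piece.
    """
    end = offset + length
    while offset < end:
        stripe_index = offset // stripe_size
        stripe_end = (stripe_index + 1) * stripe_size
        piece = min(end, stripe_end) - offset
        yield stripe_index % num_servers, offset, piece
        offset += piece


def group_by_server(shape, start, count, elem_size, stripe_size, num_servers):
    """Group the byte ranges of a sub-array request by the server holding them."""
    plan = defaultdict(list)
    for offset, length in subarray_byte_ranges(shape, start, count, elem_size):
        for server, off, ln in servers_for_range(
            offset, length, stripe_size, num_servers
        ):
            plan[server].append((off, ln))
    return dict(plan)


if __name__ == "__main__":
    # 4 x 8 array of 8-byte elements, 64-byte stripes, 2 servers (all hypothetical).
    print(group_by_server((4, 8), (1, 2), (2, 4), 8, 64, 2))
```

With a per-server plan like this, an aggregation layer could in principle assign each server's ranges to a single reader so that concurrent sub-array calls do not all contend for the same servers. How the paper's approach actually forms and schedules such groups is not described in the abstract.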

Original language: English
Title of host publication: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
Pages: 103-111
Number of pages: 9
State: Published - 2013
Event: 2013 IEEE International Conference on Big Data, Big Data 2013 - Santa Clara, CA, United States
Duration: Oct 6 2013 to Oct 9 2013

Publication series

Name: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013

Conference

Conference: 2013 IEEE International Conference on Big Data, Big Data 2013
Country/Territory: United States
City: Santa Clara, CA
Period: 10/6/13 to 10/9/13

Keywords

  • Big data
  • I/O aggregation
  • collective I/O
  • data intensive computing
  • high performance computing
  • scientific I/O library
