CHAIO: Enabling HPC applications on data-intensive file systems

Hui Jin, Jiayu Ji, Xian He Sun, Yong Chen, Rajeev Thakur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations


The computing paradigm of "HPC in the Cloud" has attracted surging interest in recent years owing to its cost-efficiency, flexibility, and scalability. Clouds are built on top of distributed file systems such as the Google File System (GFS). The ability to run HPC applications on top of data-intensive file systems is a critical catalyst in promoting Clouds for HPC. However, the semantic gap between data-intensive file systems and HPC imposes numerous challenges. For example, N-1 (N to 1), in which N processes access a single shared file, is a widely used data access pattern for HPC applications such as checkpointing, but it performs poorly on data-intensive file systems. In this study, we propose the CHunk-Aware I/O (CHAIO) strategy to enable efficient N-1 data access on data-intensive distributed file systems. CHAIO reorganizes I/O requests to favor data-intensive file systems and avoid possible access contention. It balances the workload distribution and promotes data locality. We have tested the CHAIO design on the Kosmos file system (KFS). Experimental results show that CHAIO achieves a more than two-fold improvement in I/O bandwidth for both write and read operations. Experiments in a large-scale environment confirm the potential of CHAIO for small and irregular requests. The aggregator selection algorithm works well to balance the workload distribution. CHAIO is a critical and necessary step toward enabling HPC in the Cloud.
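The chunk-aware idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation or their aggregator selection algorithm, which the abstract does not specify. It is a minimal illustration under stated assumptions: a fixed chunk size (64 MiB is assumed here), requests given as hypothetical `(rank, offset, length)` tuples against one shared file, and a simple greedy rule that assigns each chunk to the least-loaded process that already holds data for it. All function and variable names are invented for illustration.

```python
from collections import defaultdict

# Assumed chunk size (64 MiB); the real value depends on the file system configuration.
CHUNK_SIZE = 64 * 1024 * 1024


def split_by_chunk(rank, offset, length, chunk_size=CHUNK_SIZE):
    """Yield (chunk_id, rank, offset, length) pieces that never cross a chunk boundary."""
    end = offset + length
    while offset < end:
        chunk_id = offset // chunk_size
        piece_end = min(end, (chunk_id + 1) * chunk_size)
        yield chunk_id, rank, offset, piece_end - offset
        offset = piece_end


def plan_aggregators(requests, chunk_size=CHUNK_SIZE):
    """Group N-1 requests by chunk and pick one aggregator rank per chunk.

    requests: iterable of (rank, offset, length) against a single shared file.
    Returns {chunk_id: (aggregator_rank, pieces sorted by offset)}.
    """
    by_chunk = defaultdict(list)
    for rank, offset, length in requests:
        for chunk_id, r, off, ln in split_by_chunk(rank, offset, length, chunk_size):
            by_chunk[chunk_id].append((r, off, ln))

    load = defaultdict(int)  # number of chunks already assigned to each rank
    plan = {}
    for chunk_id in sorted(by_chunk):
        pieces = by_chunk[chunk_id]
        bytes_by_rank = defaultdict(int)
        for r, _, ln in pieces:
            bytes_by_rank[r] += ln
        # Hypothetical greedy rule: least-loaded rank first (balance),
        # then the rank holding the most bytes of this chunk (locality),
        # then the lowest rank id (determinism).
        aggregator = min(bytes_by_rank, key=lambda r: (load[r], -bytes_by_rank[r], r))
        load[aggregator] += 1
        plan[chunk_id] = (aggregator, sorted(pieces, key=lambda p: p[1]))
    return plan
```

With this plan, only one process would issue I/O to each chunk, so several processes no longer contend for the same chunk; the other processes would first forward their pieces to that chunk's aggregator. How CHAIO actually exchanges data and selects aggregators is described in the paper itself, not in this sketch.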

Original language: English
Title of host publication: Proceedings - 41st International Conference on Parallel Processing, ICPP 2012
Number of pages: 10
State: Published - 2012
Event: 41st International Conference on Parallel Processing, ICPP 2012 - Pittsburgh, PA, United States
Duration: Sep 10 2012 → Sep 13 2012

Publication series

Name: Proceedings of the International Conference on Parallel Processing
ISSN (Print): 0190-3918


Conference: 41st International Conference on Parallel Processing, ICPP 2012
Country/Territory: United States
City: Pittsburgh, PA


  • MapReduce
  • data-intensive
  • distributed file system
  • high-performance computing

