A cost-intelligent application-specific data layout scheme for parallel file systems

Huaiming Song, Yanlong Yin, Yong Chen, Xian He Sun

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

44 Scopus citations

Abstract

I/O data access is a recognized performance bottleneck in high-end computing. Several commercial and research parallel file systems have been developed in recent years to ease this bottleneck. These advanced file systems perform well for some applications but may perform poorly for others, and they have not reached their full potential in mitigating the I/O-wall problem. Data access is application dependent. Based on the principle of application-specific optimization, this study proposes a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model that estimates the data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of a given application and to choose an appropriate layout policy for it. A complex application may exhibit several different data access patterns, and averaging these patterns may not be the best solution for complex applications that lack a dominant pattern. For such applications, we further propose a hybrid data replication strategy in which a file can have replicas with different layout policies, so that each access is served by the best-performing layout. Theoretical analysis and experimental testing have been conducted to verify the proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and that the application-specific data layout approach achieves up to 74% performance improvement for data-intensive applications.

Original language: English
Title of host publication: HPDC'11 - Proceedings of the 20th International Symposium on High Performance Distributed Computing
Pages: 37-48
Number of pages: 12
State: Published - 2011
Event: 20th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC'11 - San Jose, CA, United States
Duration: Jun 8 2011 to Jun 11 2011

Publication series

Name: Proceedings of the IEEE International Symposium on High Performance Distributed Computing
ISSN (Print): 1082-8907

Conference

Conference: 20th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC'11
Country: United States
City: San Jose, CA
Period: 06/08/11 to 06/11/11

Keywords

  • data layout
  • data-access performance modeling
  • data-intensive
  • parallel file systems
