A generic framework for testing parallel file systems

Jinrui Cao, Simeng Wang, Dong Dai, Mai Zheng, Yong Chen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

8 Scopus citations

Abstract

Large-scale parallel file systems are of prime importance today. However, despite of the importance, their failure-recovery capability is much less studied compared with local storage systems. Recent studies on local storage systems have exposed various vulnerabilities that could lead to data loss under failure events, which raise the concern for parallel file systems built on top of them.This paper proposes a generic framework for testing the failure handling of large-scale parallel file systems. The framework captures all disk I/O commands on all storage nodes of the target system to emulate realistic failure states, and checks if the target system can recover to a consistent state without incurring data loss. We have built a prototype for the Lustre file system. Our preliminary results show that the framework is able to uncover the internal I/O behavior of Lustre under different workloads and failure conditions, which provides a solid foundation for further analyzing the failure recovery of parallel file systems.

Original languageEnglish
Title of host publicationProceedings of PDSW-DISCS 2016
Subtitle of host publication1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages49-54
Number of pages6
ISBN (Electronic)9781509052165
DOIs
StatePublished - Jan 30 2017
Event1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, PDSW-DISCS 2016 - Salt Lake City, United States
Duration: Nov 14 2016 → …

Publication series

NameProceedings of PDSW-DISCS 2016: 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, PDSW-DISCS 2016
Country/TerritoryUnited States
CitySalt Lake City
Period11/14/16 → …

Fingerprint

Dive into the research topics of 'A generic framework for testing parallel file systems'. Together they form a unique fingerprint.

Cite this