Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model

Dong Dai, Yong Chen, Philip Carns, John Jenkins, Wei Zhang, Robert Ross

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


High-performance computing (HPC) systems generate huge amounts of metadata about different entities such as jobs, users, and files. Existing systems can efficiently record and manage part of these metadata, mainly the POSIX metadata of data files (e.g., file size, name, and permissions mode). But another important set of metadata, referred to as rich metadata in this study, is mostly missing in current HPC systems. Rich metadata record not only a wider range of entities (e.g., running processes and jobs) but also more complex relationships between them. Such metadata are critical for supporting many advanced data management functions, such as identifying the data sources and parameters behind a given result, auditing data usage, or understanding in detail how inputs are transformed into outputs. To manage the rich metadata generated in HPC systems uniformly and efficiently, we propose to use a graph model. We identify the key challenges of implementing a graph-based rich metadata management system for HPC and present GraphMeta, a graph-based rich metadata management system designed and optimized for HPC platforms, to tackle these challenges. Extensive evaluations on both synthetic and real HPC metadata workloads show its advantages in both performance and scalability compared with existing solutions.
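
To make the idea of a graph model for rich metadata concrete, the following is a minimal illustrative sketch, not GraphMeta's actual design or API. It assumes a toy in-memory property graph in which users, jobs, and files are vertices and their relationships are labeled edges. The class `MetadataGraph`, its methods, the vertex identifiers, and the edge labels (`ran`, `read_by`, `wrote`) are all hypothetical names chosen for illustration. The sketch shows how "identifying the data sources and parameters behind a given result" can be expressed as a backward traversal from a result file.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Toy in-memory property graph (hypothetical; not GraphMeta's design)."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> attribute dict
        self.in_edges = defaultdict(list)   # dst id -> [(src id, label)]

    def add_vertex(self, vid, kind, **attrs):
        # Store the entity type alongside any extra attributes.
        self.vertices[vid] = {"kind": kind, **attrs}

    def add_edge(self, src, dst, label):
        # Edges are indexed by destination so provenance walks go backward.
        self.in_edges[dst].append((src, label))

    def ancestors(self, vid):
        """Breadth-first walk over incoming edges: everything vid depends on."""
        seen, queue = set(), deque([vid])
        while queue:
            cur = queue.popleft()
            for src, _label in self.in_edges[cur]:
                if src not in seen:
                    seen.add(src)
                    queue.append(src)
        return seen


# Hypothetical example data: one user runs one job that turns an input
# file into a result file.
g = MetadataGraph()
g.add_vertex("user:alice", "user")
g.add_vertex("job:42", "job", params={"steps": 1000})
g.add_vertex("file:input.dat", "file", size=1024)
g.add_vertex("file:result.dat", "file", size=2048)
g.add_edge("user:alice", "job:42", "ran")
g.add_edge("file:input.dat", "job:42", "read_by")
g.add_edge("job:42", "file:result.dat", "wrote")

# Which entities stand behind the result file?
sources = g.ancestors("file:result.dat")
```

Here `sources` contains the job, the user who ran it, and the input file, and the job's parameters can be read from `g.vertices["job:42"]["params"]`. A system for real HPC workloads would additionally need to partition such a graph across servers, which this single-process sketch does not attempt.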

Original language: English
Article number: 8580412
Pages (from-to): 1613-1627
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Issue number: 7
State: Published - Jul 1 2019


  • Data models
  • graph partitioning
  • high performance computing
  • metadata

