GraphTrek: Asynchronous graph traversal for property graph-based metadata management

Dong Dai, Philip Carns, Robert B. Ross, John Jenkins, Kyle Blauer, Yong Chen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

Property graphs are a promising data model for rich metadata management in high-performance computing (HPC) systems because of their ability to represent not only metadata attributes but also the relationships between them. A property graph can be used to record the relationships between users, jobs, and data, for example, with unique annotations for each entity. This high-volume, power-law distributed use case is a natural fit for an out-of-core distributed property graph database. Such a system must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Large-scale property graph traversals are particularly challenging for distributed graph databases, however. Most existing graph databases implement a 'level-synchronous' breadth-first search algorithm that relies on global synchronization in each traversal step. This traversal model performs well in many problem domains, but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers. We define stragglers as abnormally slow steps (or servers) in a graph traversal that lead to low overall throughput for synchronous traversal algorithms. The straggler problem can be mitigated by the use of asynchronous traversal algorithms. Asynchronous traversal has been successfully demonstrated in graph processing frameworks, but such systems require the graph to be loaded into a separate batch-processing framework. In this work, we propose GraphTrek, a general asynchronous graph traversal engine working with graph databases for processing rich metadata management in their native format. We also outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed.

Original languageEnglish
Title of host publicationProceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages284-293
Number of pages10
ISBN (Electronic)9781467365987
DOIs
StatePublished - Oct 26 2015
EventIEEE International Conference on Cluster Computing, CLUSTER 2015 - Chicago, United States
Duration: Sep 8 2015Sep 11 2015

Publication series

NameProceedings - IEEE International Conference on Cluster Computing, ICCC
Volume2015-October
ISSN (Print)1552-5244

Conference

ConferenceIEEE International Conference on Cluster Computing, CLUSTER 2015
Country/TerritoryUnited States
CityChicago
Period09/8/1509/11/15

Keywords

  • Graph Traversal
  • Parallel File Systems
  • Property Graph
  • Property Graph Databases
  • Rich Metadata Management

Fingerprint

Dive into the research topics of 'GraphTrek: Asynchronous graph traversal for property graph-based metadata management'. Together they form a unique fingerprint.

Cite this