DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

7 Scopus citations

Abstract

Affix-based search is a fundamental functionality for storage systems. It allows users to find desired datasets, where attributes of a dataset match an affix. While building inverted index to facilitate efficient affix based keyword search is a common practice for standalone databases and for desktop file systems, building local indexes or adopting indexing techniques used in a standalone data store is insufficient for highperformance computing (HPC) systems due to the massive amount of data and distributed nature of the storage devices within a system. In this paper, we propose Distributed Adaptive Radix Tree (DART), to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach is scalable in achieving efficient affix-based search and alleviating imbalanced keyword distribution and excessive requests on keywords at scale. Our evaluation at different scales shows that, comparing with the "full string hashing" use case of the most popular distributed indexing technique - Distributed Hash Table (DHT), DART achieves up to 55× better throughput with prefix search and with suffix search, while achieving comparable throughput with exact and infix searches. Also, comparing to the "initial hashing" use case of DHT, DART maintains a balanced keyword distribution on distributed nodes and alleviates excessive query workload against popular keywords.

Original languageEnglish
Title of host publicationProceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781450359863
DOIs
StatePublished - Nov 1 2018
Event27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018 - Limassol, Cyprus
Duration: Nov 1 2018Nov 4 2018

Publication series

NameParallel Architectures and Compilation Techniques - Conference Proceedings, PACT
ISSN (Print)1089-795X

Conference

Conference27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
Country/TerritoryCyprus
CityLimassol
Period11/1/1811/4/18

Keywords

  • Distributed affix search
  • Distributed inverted index
  • Distributed search

Fingerprint

Dive into the research topics of 'DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems'. Together they form a unique fingerprint.

Cite this