TY - GEN
T1 - DART
T2 - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
AU - Zhang, Wei
AU - Tang, Houjun
AU - Byna, Suren
AU - Chen, Yong
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/11/1
Y1 - 2018/11/1
N2 - Affix-based search is a fundamental functionality for storage systems. It allows users to find desired datasets, where attributes of a dataset match an affix. While building inverted index to facilitate efficient affix based keyword search is a common practice for standalone databases and for desktop file systems, building local indexes or adopting indexing techniques used in a standalone data store is insufficient for highperformance computing (HPC) systems due to the massive amount of data and distributed nature of the storage devices within a system. In this paper, we propose Distributed Adaptive Radix Tree (DART), to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach is scalable in achieving efficient affix-based search and alleviating imbalanced keyword distribution and excessive requests on keywords at scale. Our evaluation at different scales shows that, comparing with the "full string hashing" use case of the most popular distributed indexing technique - Distributed Hash Table (DHT), DART achieves up to 55× better throughput with prefix search and with suffix search, while achieving comparable throughput with exact and infix searches. Also, comparing to the "initial hashing" use case of DHT, DART maintains a balanced keyword distribution on distributed nodes and alleviates excessive query workload against popular keywords.
AB - Affix-based search is a fundamental functionality for storage systems. It allows users to find desired datasets, where attributes of a dataset match an affix. While building inverted index to facilitate efficient affix based keyword search is a common practice for standalone databases and for desktop file systems, building local indexes or adopting indexing techniques used in a standalone data store is insufficient for highperformance computing (HPC) systems due to the massive amount of data and distributed nature of the storage devices within a system. In this paper, we propose Distributed Adaptive Radix Tree (DART), to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach is scalable in achieving efficient affix-based search and alleviating imbalanced keyword distribution and excessive requests on keywords at scale. Our evaluation at different scales shows that, comparing with the "full string hashing" use case of the most popular distributed indexing technique - Distributed Hash Table (DHT), DART achieves up to 55× better throughput with prefix search and with suffix search, while achieving comparable throughput with exact and infix searches. Also, comparing to the "initial hashing" use case of DHT, DART maintains a balanced keyword distribution on distributed nodes and alleviates excessive query workload against popular keywords.
KW - Distributed affix search
KW - Distributed inverted index
KW - Distributed search
UR - http://www.scopus.com/inward/record.url?scp=85061546318&partnerID=8YFLogxK
U2 - 10.1145/3243176.3243207
DO - 10.1145/3243176.3243207
M3 - Conference contribution
AN - SCOPUS:85061546318
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
BT - Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 November 2018 through 4 November 2018
ER -