DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems

Wei Zhang; Houjun Tang; Suren Byna; Yong Chen

doi:10.1145/3243176.3243207

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Affix-based search is a fundamental functionality for storage systems. It allows users to find desired datasets whose attributes match an affix. While building an inverted index to facilitate efficient affix-based keyword search is common practice for standalone databases and desktop file systems, building local indexes or adopting indexing techniques used in a standalone data store is insufficient for high-performance computing (HPC) systems due to the massive amount of data and the distributed nature of the storage devices within a system. In this paper, we propose the Distributed Adaptive Radix Tree (DART) to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach scales to achieve efficient affix-based search while alleviating imbalanced keyword distribution and excessive requests on keywords at scale. Our evaluation at different scales shows that, compared with the "full string hashing" use case of the most popular distributed indexing technique, the Distributed Hash Table (DHT), DART achieves up to 55× better throughput with prefix search and with suffix search, while achieving comparable throughput with exact and infix searches. Also, compared with the "initial hashing" use case of DHT, DART maintains a balanced keyword distribution on distributed nodes and alleviates excessive query workload against popular keywords.

Original language	English
Title of host publication	Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781450359863
DOIs	https://doi.org/10.1145/3243176.3243207
State	Published - Nov 1 2018
Event	27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018 - Limassol, Cyprus; Duration: Nov 1 2018 → Nov 4 2018

Publication series

Name	Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
ISSN (Print)	1089-795X

Conference

Conference	27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
Country/Territory	Cyprus
City	Limassol
Period	11/1/18 → 11/4/18

Keywords

Distributed affix search
Distributed inverted index
Distributed search

Access to Document

10.1145/3243176.3243207

Cite this

Zhang, W., Tang, H., Byna, S., & Chen, Y. (2018). DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems. In Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018, Article a24 (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3243176.3243207

Zhang, Wei; Tang, Houjun; Byna, Suren et al. / DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems. Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018. Institute of Electrical and Electronics Engineers Inc., 2018. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

@inproceedings{a6304d58709e4586b640c54e85bd521b,
title = "DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems",
abstract = "Affix-based search is a fundamental functionality for storage systems. It allows users to find desired datasets whose attributes match an affix. While building an inverted index to facilitate efficient affix-based keyword search is common practice for standalone databases and desktop file systems, building local indexes or adopting indexing techniques used in a standalone data store is insufficient for high-performance computing (HPC) systems due to the massive amount of data and the distributed nature of the storage devices within a system. In this paper, we propose the Distributed Adaptive Radix Tree (DART) to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach scales to achieve efficient affix-based search while alleviating imbalanced keyword distribution and excessive requests on keywords at scale. Our evaluation at different scales shows that, compared with the {"}full string hashing{"} use case of the most popular distributed indexing technique, the Distributed Hash Table (DHT), DART achieves up to 55× better throughput with prefix search and with suffix search, while achieving comparable throughput with exact and infix searches. Also, compared with the {"}initial hashing{"} use case of DHT, DART maintains a balanced keyword distribution on distributed nodes and alleviates excessive query workload against popular keywords.",
keywords = "Distributed affix search, Distributed inverted index, Distributed search",
author = "Wei Zhang and Houjun Tang and Suren Byna and Yong Chen",
note = "Publisher Copyright: {\textcopyright} 2018 Association for Computing Machinery.; 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018 ; Conference date: 01-11-2018 Through 04-11-2018",
year = "2018",
month = nov,
day = "1",
doi = "10.1145/3243176.3243207",
language = "English",
series = "Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018",
}

Zhang, W, Tang, H, Byna, S & Chen, Y 2018, DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems. in Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018, a24, Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, Institute of Electrical and Electronics Engineers Inc., 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018, Limassol, Cyprus, 11/1/18. https://doi.org/10.1145/3243176.3243207

DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems. / Zhang, Wei; Tang, Houjun; Byna, Suren et al.
Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018. Institute of Electrical and Electronics Engineers Inc., 2018. a24 (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

TY - GEN
T1 - DART
T2 - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
AU - Zhang, Wei
AU - Tang, Houjun
AU - Byna, Suren
AU - Chen, Yong
PY - 2018/11/1
Y1 - 2018/11/1
AB - Affix-based search is a fundamental functionality for storage systems. It allows users to find desired datasets whose attributes match an affix. While building an inverted index to facilitate efficient affix-based keyword search is common practice for standalone databases and desktop file systems, building local indexes or adopting indexing techniques used in a standalone data store is insufficient for high-performance computing (HPC) systems due to the massive amount of data and the distributed nature of the storage devices within a system. In this paper, we propose the Distributed Adaptive Radix Tree (DART) to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach scales to achieve efficient affix-based search while alleviating imbalanced keyword distribution and excessive requests on keywords at scale. Our evaluation at different scales shows that, compared with the "full string hashing" use case of the most popular distributed indexing technique, the Distributed Hash Table (DHT), DART achieves up to 55× better throughput with prefix search and with suffix search, while achieving comparable throughput with exact and infix searches. Also, compared with the "initial hashing" use case of DHT, DART maintains a balanced keyword distribution on distributed nodes and alleviates excessive query workload against popular keywords.
KW - Distributed affix search
KW - Distributed inverted index
KW - Distributed search
UR - http://www.scopus.com/inward/record.url?scp=85061546318&partnerID=8YFLogxK
U2 - 10.1145/3243176.3243207
DO - 10.1145/3243176.3243207
M3 - Conference contribution
AN - SCOPUS:85061546318
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
BT - Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 November 2018 through 4 November 2018
ER -

Zhang W, Tang H, Byna S, Chen Y. DART: Distributed adaptive radix tree for efficient affix-based keyword search on HPC systems. In Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018. Institute of Electrical and Electronics Engineers Inc. 2018. a24. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT). doi: 10.1145/3243176.3243207
