SUORA: A scalable and uniform data distribution algorithm for heterogeneous storage systems

Jiang Zhou; Wei Xie; Jason Noble; Kace Echo; Yong Chen

doi:10.1109/NAS.2016.7549423

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

The data scale in many data centers is growing explosively with emerging applications and usages of big data technologies. Data distribution is a key issue in large-scale distributed storage systems to place petabytes of data or even beyond, among tens or hundreds of thousands of storage devices. In the meantime, heterogeneous storage systems, such as those having devices with hard disk drives (HDDs) and storage class memories (SCMs), have become increasingly popular for massive data storage due to balanced performance, capacity, and cost. Current data distribution algorithms can achieve efficient, scalable, and balanced mapping, but do not distinguish different characteristics of heterogeneous devices well. This paper presents a novel data distribution algorithm called SUORA (Scalable and Uniform storage via Optimally-adaptive and Random number Addressing), to take full advantage of heterogeneous devices. SUORA is a pseudo-random algorithm that uniformly distributes data across a hybrid and tiered storage cluster. It divides heterogeneous devices, maps them onto different buckets, and assigns them to various segments in each bucket. A pseudo-random and deterministic number sequence is generated to map data among segments and devices. Data movement is performed to achieve better read throughput while maintaining load balance according to data hotness and bucket threshold. By accounting for the distinct characteristics of heterogeneous storage devices, the SUORA algorithm achieves a highly efficient adaptive data distribution for data centers and heterogeneous storage systems.
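
The abstract's idea of mapping data to devices through a deterministic pseudo-random number can be illustrated with a minimal Python sketch. This is not the SUORA algorithm itself, whose bucket layout, segment assignment, and hotness-driven data movement are specified only in the paper. It is a generic weighted placement under stated assumptions: the function name `place`, the `(device, weight)` segment list, and the use of SHA-256 as the number source are all hypothetical choices made for illustration.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def place(key: str, segments: list[tuple[str, int]]) -> str:
    """Map a data key to a device name (illustrative only, not SUORA).

    `segments` is a hypothetical list of (device_name, weight) pairs;
    a device with a larger weight owns a larger share of the number
    line and therefore receives proportionally more keys.
    """
    names = [name for name, _ in segments]
    # Cumulative upper bounds of each segment, e.g. weights 1, 3 -> [1, 4].
    bounds = list(accumulate(weight for _, weight in segments))
    # Deterministic pseudo-random number derived from the key alone,
    # so any client computes the same placement without a lookup table.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % bounds[-1]
    # The first segment whose upper bound exceeds `point` owns the key.
    return names[bisect_right(bounds, point)]


if __name__ == "__main__":
    cluster = [("scm-0", 1), ("hdd-0", 3)]  # assumed 1:3 weight ratio
    print(place("object-42", cluster))
```

Because the placement depends only on the key and the segment list, repeated lookups return the same device, and the share of keys per device approaches the weight ratio as the number of keys grows.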

Original language	English
Title of host publication	2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781509033157
DOIs	https://doi.org/10.1109/NAS.2016.7549423
State	Published - Aug 23 2016
Event	11th IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Long Beach, United States; Duration: Aug 8 2016 → Aug 10 2016

Publication series

Name	2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings

Conference

Conference	11th IEEE International Conference on Networking Architecture and Storage, NAS 2016
Country/Territory	United States
City	Long Beach
Period	08/08/16 → 08/10/16

Keywords

Data centers
Data distribution algorithm
Data management
Data placement
Heterogeneous storage

Access to Document

10.1109/NAS.2016.7549423

Cite this

Zhou, J., Xie, W., Noble, J., Echo, K., & Chen, Y. (2016). SUORA: A scalable and uniform data distribution algorithm for heterogeneous storage systems. In 2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings, Article 7549423 (2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/NAS.2016.7549423

Zhou, Jiang ; Xie, Wei ; Noble, Jason et al. / SUORA : A scalable and uniform data distribution algorithm for heterogeneous storage systems. 2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2016. (2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings).

@inproceedings{a9770742f2e44d4b978f09acad9b2910,

title = "SUORA: A scalable and uniform data distribution algorithm for heterogeneous storage systems",

abstract = "The data scale in many data centers is growing explosively with emerging applications and usages of big data technologies. Data distribution is a key issue in large-scale distributed storage systems to place petabytes of data or even beyond, among tens or hundreds of thousands of storage devices. In the meantime, heterogeneous storage systems, such as those having devices with hard disk drives (HDDs) and storage class memories (SCMs), have become increasingly popular for massive data storage due to balanced performance, capacity, and cost. Current data distribution algorithms can achieve efficient, scalable, and balanced mapping, but do not distinguish different characteristics of heterogeneous devices well. This paper presents a novel data distribution algorithm called SUORA (Scalable and Uniform storage via Optimally-adaptive and Random number Addressing), to take full advantage of heterogeneous devices. SUORA is a pseudo-random algorithm that uniformly distributes data across a hybrid and tiered storage cluster. It divides heterogeneous devices, maps them onto different buckets, and assigns them to various segments in each bucket. A pseudo-random and deterministic number sequence is generated to map data among segments and devices. Data movement is performed to achieve better read throughput while maintaining load balance according to data hotness and bucket threshold. By accounting for the distinct characteristics of heterogeneous storage devices, the SUORA algorithm achieves a highly efficient adaptive data distribution for data centers and heterogeneous storage systems.",

keywords = "Data centers, Data distribution algorithm, Data management, Data placement, Heterogeneous storage",

author = "Jiang Zhou and Wei Xie and Jason Noble and Kace Echo and Yong Chen",

note = "Publisher Copyright: {\textcopyright} 2016 IEEE.; 11th IEEE International Conference on Networking Architecture and Storage, NAS 2016 ; Conference date: 08-08-2016 Through 10-08-2016",

year = "2016",

month = aug,

day = "23",

doi = "10.1109/NAS.2016.7549423",

language = "English",

series = "2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings",

}

Zhou, J, Xie, W, Noble, J, Echo, K & Chen, Y 2016, SUORA: A scalable and uniform data distribution algorithm for heterogeneous storage systems. in 2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings, 7549423, 2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings, Institute of Electrical and Electronics Engineers Inc., 11th IEEE International Conference on Networking Architecture and Storage, NAS 2016, Long Beach, United States, 08/08/16. https://doi.org/10.1109/NAS.2016.7549423

SUORA: A scalable and uniform data distribution algorithm for heterogeneous storage systems. / Zhou, Jiang; Xie, Wei; Noble, Jason et al.
2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2016. 7549423 (2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings).

TY - GEN

T1 - SUORA: A scalable and uniform data distribution algorithm for heterogeneous storage systems

T2 - 11th IEEE International Conference on Networking Architecture and Storage, NAS 2016

AU - Zhou, Jiang

AU - Xie, Wei

AU - Noble, Jason

AU - Echo, Kace

AU - Chen, Yong

PY - 2016/8/23

Y1 - 2016/8/23

N2 - The data scale in many data centers is growing explosively with emerging applications and usages of big data technologies. Data distribution is a key issue in large-scale distributed storage systems to place petabytes of data or even beyond, among tens or hundreds of thousands of storage devices. In the meantime, heterogeneous storage systems, such as those having devices with hard disk drives (HDDs) and storage class memories (SCMs), have become increasingly popular for massive data storage due to balanced performance, capacity, and cost. Current data distribution algorithms can achieve efficient, scalable, and balanced mapping, but do not distinguish different characteristics of heterogeneous devices well. This paper presents a novel data distribution algorithm called SUORA (Scalable and Uniform storage via Optimally-adaptive and Random number Addressing), to take full advantage of heterogeneous devices. SUORA is a pseudo-random algorithm that uniformly distributes data across a hybrid and tiered storage cluster. It divides heterogeneous devices, maps them onto different buckets, and assigns them to various segments in each bucket. A pseudo-random and deterministic number sequence is generated to map data among segments and devices. Data movement is performed to achieve better read throughput while maintaining load balance according to data hotness and bucket threshold. By accounting for the distinct characteristics of heterogeneous storage devices, the SUORA algorithm achieves a highly efficient adaptive data distribution for data centers and heterogeneous storage systems.

AB - The data scale in many data centers is growing explosively with emerging applications and usages of big data technologies. Data distribution is a key issue in large-scale distributed storage systems to place petabytes of data or even beyond, among tens or hundreds of thousands of storage devices. In the meantime, heterogeneous storage systems, such as those having devices with hard disk drives (HDDs) and storage class memories (SCMs), have become increasingly popular for massive data storage due to balanced performance, capacity, and cost. Current data distribution algorithms can achieve efficient, scalable, and balanced mapping, but do not distinguish different characteristics of heterogeneous devices well. This paper presents a novel data distribution algorithm called SUORA (Scalable and Uniform storage via Optimally-adaptive and Random number Addressing), to take full advantage of heterogeneous devices. SUORA is a pseudo-random algorithm that uniformly distributes data across a hybrid and tiered storage cluster. It divides heterogeneous devices, maps them onto different buckets, and assigns them to various segments in each bucket. A pseudo-random and deterministic number sequence is generated to map data among segments and devices. Data movement is performed to achieve better read throughput while maintaining load balance according to data hotness and bucket threshold. By accounting for the distinct characteristics of heterogeneous storage devices, the SUORA algorithm achieves a highly efficient adaptive data distribution for data centers and heterogeneous storage systems.

KW - Data centers

KW - Data distribution algorithm

KW - Data management

KW - Data placement

KW - Heterogeneous storage

UR - http://www.scopus.com/inward/record.url?scp=84988369386&partnerID=8YFLogxK

U2 - 10.1109/NAS.2016.7549423

DO - 10.1109/NAS.2016.7549423

M3 - Conference contribution

AN - SCOPUS:84988369386

T3 - 2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings

BT - 2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 8 August 2016 through 10 August 2016

ER -

Zhou J, Xie W, Noble J, Echo K, Chen Y. SUORA: A scalable and uniform data distribution algorithm for heterogeneous storage systems. In 2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2016. 7549423. (2016 IEEE International Conference on Networking Architecture and Storage, NAS 2016 - Proceedings). doi: 10.1109/NAS.2016.7549423
