TY - JOUR
T1 - PRS
T2 - A Pattern-Directed Replication Scheme for Heterogeneous Object-Based Storage
AU - Zhou, Jiang
AU - Chen, Yong
AU - Xie, Wei
AU - Dai, Dong
AU - He, Shuibing
AU - Wang, Weiping
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their valuable feedback. This research is supported in part by the National Science Foundation under Grants CNS-1338078, CNS-1362134, CCF-1409946, CCF-1718336, OAC-1835892, and CNS-1817094. This research is also supported in part by the Beijing Municipal Science and Technology Project under Grant Z191100007119002, by the National Science Foundation of China under Grant No. 61572377, by the Natural Science Foundation of Hubei Province of China under Grant No. 2017CFC889, and by the Fundamental Research Funds for the Central Universities under Grant No. 2018QNA5015.
Publisher Copyright:
© 1968-2012 IEEE.
PY - 2020/4/1
Y1 - 2020/4/1
N2 - Data replication is a key technique for achieving high data availability, reliability, and optimized performance in distributed storage systems. In recent years, with the emergence of new storage devices, heterogeneous object-based storage systems, such as those that mix hard disk drives, solid state drives, and other non-volatile memory devices, have become increasingly attractive because they combine the merits of different storage devices. However, existing data replication schemes do not yet adequately consider the distinct characteristics of heterogeneous storage devices, which can lead to suboptimal performance. This article introduces a new data replication scheme called the Pattern-directed Replication Scheme (PRS) to achieve efficient data replication for heterogeneous storage systems. Unlike traditional schemes, the PRS selectively replicates data objects and distributes replicas to various storage devices based on their characteristics. It aggregates objects that have I/O correlation into object groups by calculating object distance, and it replicates the grouped objects according to the application's identified data access pattern. In addition, the PRS uses a pseudo-random algorithm to optimize replica placement by considering the performance and capacity of the storage devices. We have evaluated the pattern-directed replication scheme with extensive tests in Sheepdog, a typical object-based storage system. The experimental results confirm that it is a highly efficient replication scheme for heterogeneous storage systems. For instance, read performance improved by 105 percent to nearly 10x compared with existing replication schemes.
KW - Data replication
KW - access pattern
KW - data distribution
KW - heterogeneous storage
KW - object-based storage
UR - http://www.scopus.com/inward/record.url?scp=85082169432&partnerID=8YFLogxK
U2 - 10.1109/TC.2019.2954089
DO - 10.1109/TC.2019.2954089
M3 - Article
AN - SCOPUS:85082169432
SN - 0018-9340
VL - 69
SP - 591
EP - 605
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 4
M1 - 8906026
ER -