TY - JOUR
T1 - LoomIO
T2 - Object-Level Coordination in Distributed File Systems
AU - Hua, Yusheng
AU - Shi, Xuanhua
AU - He, Kang
AU - Jin, Hai
AU - Xie, Wei
AU - He, Ligang
AU - Chen, Yong
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2022/8/1
Y1 - 2022/8/1
N2 - Device-level interference is recognized as a major cause of performance degradation in distributed file systems. Although approaches that mitigate interference through coordination at the application, middleware, and server levels have shown beneficial results in previous studies, we find that their effectiveness is largely reduced because I/O requests are re-arranged by the underlying object file systems. In this study, we prove that object-level coordination is critical and often the key to addressing the interference issue, as the scheduling of object requests determines the device-level accesses and thus the actual I/O bandwidth and latency. This article proposes an object-level coordination system, LoomIO, which uses an OBOP (One-Broadcast-One-Propagate) method and a time-limited coordination process to deliver a highly efficient coordination service. Specifically, LoomIO enables object requests to reach an optimized scheduling decision within a few milliseconds and largely mitigates device-level interference. We have implemented a LoomIO prototype and integrated it into the Ceph file system. The evaluation results show that LoomIO achieved considerable improvements in resource utilization (by up to 35%), in I/O throughput (by up to 31%), and in 99th percentile latency (by up to 54%) compared to the K-optimal method, which uses the same scheduling algorithm as LoomIO but lacks coordination support.
AB - Device-level interference is recognized as a major cause of performance degradation in distributed file systems. Although approaches that mitigate interference through coordination at the application, middleware, and server levels have shown beneficial results in previous studies, we find that their effectiveness is largely reduced because I/O requests are re-arranged by the underlying object file systems. In this study, we prove that object-level coordination is critical and often the key to addressing the interference issue, as the scheduling of object requests determines the device-level accesses and thus the actual I/O bandwidth and latency. This article proposes an object-level coordination system, LoomIO, which uses an OBOP (One-Broadcast-One-Propagate) method and a time-limited coordination process to deliver a highly efficient coordination service. Specifically, LoomIO enables object requests to reach an optimized scheduling decision within a few milliseconds and largely mitigates device-level interference. We have implemented a LoomIO prototype and integrated it into the Ceph file system. The evaluation results show that LoomIO achieved considerable improvements in resource utilization (by up to 35%), in I/O throughput (by up to 31%), and in 99th percentile latency (by up to 54%) compared to the K-optimal method, which uses the same scheduling algorithm as LoomIO but lacks coordination support.
KW - Distributed object file system
KW - Erasure-coding
KW - I/O coordination
KW - Performance
UR - http://www.scopus.com/inward/record.url?scp=85121725248&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2021.3126260
DO - 10.1109/TPDS.2021.3126260
M3 - Article
AN - SCOPUS:85121725248
VL - 33
SP - 1799
EP - 1810
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
SN - 1045-9219
IS - 8
ER -