TY - GEN
T1 - Data deduplication in a hybrid architecture for improving write performance
AU - Chen, Chao
AU - Bastnagel, Jonathan
AU - Chen, Yong
PY - 2013
Y1 - 2013
N2 - Big Data computing provides a promising new opportunity for scientific discoveries and innovations. However, it also poses a significant challenge to the high-end computing community. An effective I/O solution is urgently required to support big data applications running on high-end computing systems. In this study, we propose a new approach, DDiHA (Data Deduplication in Hybrid Architecture), to improve the write performance of write-intensive big data applications. The DDiHA approach uses data deduplication to reduce the size of data volumes before they are transferred and written to storage. A hybrid architecture is introduced to facilitate data deduplication. Both a theoretical study and prototype verification were conducted to evaluate the DDiHA approach. The initial results show that, given the same compute resources, the DDiHA system outperformed the conventional architecture, even though it introduces additional computational workload from data deduplication. The DDiHA approach reduces the amount of data transferred across the network and improves I/O system performance. It shows promising potential for write-intensive big data applications.
AB - Big Data computing provides a promising new opportunity for scientific discoveries and innovations. However, it also poses a significant challenge to the high-end computing community. An effective I/O solution is urgently required to support big data applications running on high-end computing systems. In this study, we propose a new approach, DDiHA (Data Deduplication in Hybrid Architecture), to improve the write performance of write-intensive big data applications. The DDiHA approach uses data deduplication to reduce the size of data volumes before they are transferred and written to storage. A hybrid architecture is introduced to facilitate data deduplication. Both a theoretical study and prototype verification were conducted to evaluate the DDiHA approach. The initial results show that, given the same compute resources, the DDiHA system outperformed the conventional architecture, even though it introduces additional computational workload from data deduplication. The DDiHA approach reduces the amount of data transferred across the network and improves I/O system performance. It shows promising potential for write-intensive big data applications.
KW - big data
KW - data deduplication
KW - data-intensive computing
KW - high-end computing
KW - storage
UR - http://www.scopus.com/inward/record.url?scp=84879810196&partnerID=8YFLogxK
U2 - 10.1145/2481425.2481435
DO - 10.1145/2481425.2481435
M3 - Conference contribution
AN - SCOPUS:84879810196
SN - 9781450321464
T3 - Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013 - In Conjunction with ICS 2013
BT - Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013 - In Conjunction with ICS 2013
T2 - 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013 - In Conjunction with ICS 2013
Y2 - 10 June 2013 through 10 June 2013
ER -