Data deduplication in a hybrid architecture for improving write performance

Chao Chen, Jonathan Bastnagel, Yong Chen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution, peer-reviewed)

Abstract

Big Data computing offers promising new opportunities for scientific discovery and innovation. However, it also poses a significant challenge to the high-end computing community: an effective I/O solution is urgently needed to support big data applications running on high-end computing systems. In this study, we propose a new approach, DDiHA (Data Deduplication in Hybrid Architecture), to improve write performance for write-intensive big data applications. DDiHA uses data deduplication to reduce the volume of data before it is transferred and written to storage, and it introduces a hybrid architecture to facilitate that deduplication. We evaluated DDiHA through both a theoretical study and prototype verification. Initial results show that, given the same compute resources, the DDiHA system outperformed the conventional architecture even though deduplication adds computational workload. By reducing the amount of data transferred across the network, DDiHA improves I/O system performance and shows promising potential for write-intensive big data applications.
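The general idea of deduplicating data before transfer can be illustrated with a minimal Python sketch. It assumes fixed-size chunking and SHA-256 fingerprints. The abstract does not say which chunking or fingerprinting scheme DDiHA uses, or where in the hybrid architecture deduplication runs, and the functions `deduplicate` and `reconstruct` are hypothetical names. Only the first copy of each distinct chunk would need to be transferred and written, together with an ordered list of fingerprints (a "recipe") from which the original stream can be rebuilt.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Hypothetical helper: split data into fixed-size chunks (an assumption,
    not necessarily DDiHA's scheme) and keep only the first copy of each.

    Returns (unique_chunks, recipe): unique_chunks maps a SHA-256 fingerprint
    to the chunk bytes; recipe is the ordered list of fingerprints needed to
    rebuild the original data.
    """
    unique_chunks = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in unique_chunks:
            unique_chunks[fp] = chunk  # first occurrence: must be transferred
        recipe.append(fp)              # duplicates are recorded only by fingerprint
    return unique_chunks, recipe


def reconstruct(unique_chunks, recipe) -> bytes:
    """Rebuild the original byte stream from unique chunks and the recipe."""
    return b"".join(unique_chunks[fp] for fp in recipe)


if __name__ == "__main__":
    block_a = b"A" * 4096
    block_b = b"B" * 4096
    data = block_a * 3 + block_b + block_a  # 5 chunks, only 2 distinct
    chunks, recipe = deduplicate(data)
    sent = sum(len(c) for c in chunks.values())
    print(len(data), sent)  # 20480 8192
    assert reconstruct(chunks, recipe) == data
```

Here five 4 KiB chunks contain only two distinct blocks, so 8192 of 20480 bytes would be sent, plus the small fingerprint recipe. The hashing loop is the kind of extra computation the abstract refers to; whether it pays off depends on how much redundancy the data contains relative to the available network and storage bandwidth.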

Original language: English
Title of host publication: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013 - In Conjunction with ICS 2013
State: Published - 2013
Event: 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013 - In Conjunction with ICS 2013 - Eugene, OR, United States
Duration: Jun 10 2013 - Jun 10 2013

Publication series

Name: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013 - In Conjunction with ICS 2013

Conference

Conference: 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2013 - In Conjunction with ICS 2013
Country: United States
City: Eugene, OR
Period: 06/10/13 - 06/10/13

Keywords

  • big data
  • data deduplication
  • data-intensive computing
  • high-end computing
  • storage
