Vectorizing disk blocks for efficient storage system via deep learning

Dong Dai, Forrest Sheng Bao, Jiang Zhou, Xuanhua Shi, Yong Chen

Research output: Contribution to journal › Article


Abstract

Efficient storage systems depend on intelligent management of data units, i.e., disk blocks at the local file system level. Block correlations represent the semantic patterns in storage systems. These correlations can be exploited for data caching, pre-fetching, layout optimization, I/O scheduling, etc., to ultimately realize an efficient storage system. In this paper, we introduce Block2Vec, a deep learning based strategy to mine block correlations in storage systems. The core idea of Block2Vec is twofold. First, it proposes a new way to abstract blocks: they are represented as multi-dimensional vectors instead of traditional block IDs. In this way, we are able to capture the similarity between blocks through the distances between their vectors. Second, based on this vector representation of blocks, it trains a deep neural network to learn the best vector assignment for each block. We leverage recent advances in word embedding techniques from natural language processing to train the neural network efficiently. To demonstrate the effectiveness of Block2Vec, we design a demonstrative block prediction algorithm based on the mined correlations. Empirical comparison based on simulations of real system traces shows that Block2Vec is capable of mining block-level correlations efficiently and accurately. This research and trial show that the deep learning strategy is a promising direction for optimizing storage system performance.
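The core idea above — embed block IDs from an access trace as vectors, then find correlated blocks by vector distance — can be sketched as follows. This is a simplified stand-in, not the paper's method: raw co-occurrence context vectors replace the trained Word2Vec skip-gram network, and the trace, function names, and window size are all hypothetical.

```python
import numpy as np

def block_vectors(trace, window=2):
    """Embed block IDs from an access trace as context vectors.

    Simplified stand-in for Block2Vec's Word2Vec training: each block's
    vector counts how often every other block appears within `window`
    positions of it in the trace, so blocks used in similar contexts
    end up with similar vectors.
    """
    blocks = sorted(set(trace))
    index = {b: i for i, b in enumerate(blocks)}
    vecs = np.zeros((len(blocks), len(blocks)))
    for i, b in enumerate(trace):
        for j in range(max(0, i - window), min(len(trace), i + window + 1)):
            if j != i:
                vecs[index[b], index[trace[j]]] += 1.0
    return blocks, vecs

def predict_correlated(block, blocks, vecs, k=1):
    """Return the k blocks whose vectors are closest (cosine) to `block`."""
    i = blocks.index(block)
    norms = np.linalg.norm(vecs, axis=1) + 1e-12
    sims = vecs @ vecs[i] / (norms * norms[i])
    sims[i] = -1.0  # exclude the query block itself
    return [blocks[j] for j in np.argsort(-sims)[:k]]

# Hypothetical trace: blocks 9 and 12 each occur between a 3-7 pair,
# i.e., in the same access context, so their vectors come out closest.
trace = [3, 7, 12, 3, 7, 9, 3, 7, 12, 3, 7]
blocks, vecs = block_vectors(trace)
print(predict_correlated(12, blocks, vecs))  # → [9]
```

A prediction algorithm like the paper's demonstrative one could then pre-fetch the nearest-neighbor blocks of each accessed block; the actual system trains embeddings with a neural network rather than counting co-occurrences.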

Original language: English
Pages (from-to): 75-90
Number of pages: 16
Journal: Parallel Computing
Volume: 82
DOIs
State: Published - Feb 2019

Keywords

  • Block correlation
  • Deep learning
  • Intelligent storage
  • Natural language processing
  • Storage system
  • Word2Vec