Segmented In-Advance Data Analytics for Fast Scientific Discovery

Jialin Liu, Yong Chen

Research output: Contribution to journal > Article > peer-review

Abstract

Scientific discovery usually involves data generation, data preprocessing, data storage, and data analysis. As data volumes exceed a few terabytes (TB) in a single simulation run, data movement, which occurs in every cycle of scientific discovery, remains the bottleneck in most scientific big data applications. Much research has been conducted on reducing data movement. Among existing efforts, and based on our previous research, reusing analysis results shows significant potential for optimizing data movement between analysis operations. In this work, we propose the Segmented In-Advance (SIA) data analytics approach for optimizing data movement, and we provide a cloud-based, elastic, distributed in-memory database to manage the intermediate analysis results. The fundamental idea of SIA is to analyze the history of operations and predict the analytics operations likely to be of interest in the future. The predicted analysis operation is performed in advance on a more finely segmented dataset, and the segmented results are distributed in an in-memory key-value store for future reuse. The evaluation shows that the SIA approach achieves a 1.2X-6.1X speedup and that the in-memory distributed data store scales well. The proposed approach is a promising data movement reduction solution for scientific big data applications and fast scientific discovery.
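
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a frequency-based predictor, a fixed segment size, and a plain Python dict standing in for the distributed in-memory key-value store. All names (`SEGMENT_SIZE`, `predict_next_op`, `precompute`, `query`) are hypothetical.

```python
from collections import Counter

SEGMENT_SIZE = 4  # hypothetical segment length, chosen for illustration

# Operations whose per-segment results can be combined with the same function.
OPS = {"sum": sum, "max": max, "min": min}


def predict_next_op(history):
    """Assumed predictor: pick the most frequent operation in the history."""
    return Counter(history).most_common(1)[0][0]


def precompute(data, op, store):
    """Run `op` in advance on each segment and keep the result in `store`."""
    for start in range(0, len(data), SEGMENT_SIZE):
        segment = data[start:start + SEGMENT_SIZE]
        store[(op, start // SEGMENT_SIZE)] = OPS[op](segment)


def query(data, op, lo, hi, store):
    """Evaluate `op` over data[lo:hi], reusing stored segment results.

    Returns the result and the number of segments served from the store.
    """
    partials, hits, i = [], 0, lo
    while i < hi:
        seg_id, offset = divmod(i, SEGMENT_SIZE)
        seg_end = (seg_id + 1) * SEGMENT_SIZE
        if offset == 0 and seg_end <= hi and (op, seg_id) in store:
            # Whole segment is covered: reuse the stored result.
            partials.append(store[(op, seg_id)])
            hits += 1
            i = seg_end
        else:
            # Partial or missing segment: read the raw data.
            end = min(seg_end, hi)
            partials.append(OPS[op](data[i:end]))
            i = end
    return OPS[op](partials), hits


data = list(range(16))
history = ["sum", "max", "sum", "sum", "min"]
store = {}  # stand-in for an in-memory key-value store

predicted = predict_next_op(history)
precompute(data, predicted, store)
result, hits = query(data, predicted, 2, 14, store)
print(predicted, result, hits)
```

In this sketch, a later query touches raw data only for segments it covers partially. Fully covered segments are answered from the stored results, which is where the reduction in data movement would come from.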

Original language: English
Article number: 7431946
Pages (from-to): 432-442
Number of pages: 11
Journal: IEEE Transactions on Cloud Computing
Volume: 8
Issue number: 2
State: Published - Apr 1 2020

Keywords

  • Big data
  • Data intensive computing
  • Scientific computing
  • Segmented in-advance data analytics
