In-advance data analytics for reducing time to discovery

Jialin Liu, Yin Lu, Yong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

A scientific workflow involves data generation, data analysis, and knowledge discovery. As the data volume exceeds a few terabytes (TB) in a single simulation run, the data movement between these stages becomes a bottleneck in most scientific big data applications. Our previous work shows that reusing analysis results has significant potential to reduce overlapping data movement between compute nodes and storage nodes. In this work, we propose a new in-advance data analytics method to augment result reuse. The fundamental idea of the method and its prototype system is to predict potentially useful analytics operations by studying users' analysis patterns. The predicted analysis operations are proactively performed on existing data, and the analysis results are stored in an in-memory database for reuse. The evaluation shows that the method and its prototype system achieve a 1.2X-6.1X speedup in I/O performance with 50% data overlap and a 10%-100% operation recommendation hit rate. The proposed method offers a promising new data reduction solution for big data applications.
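The predict, precompute, and reuse loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: the class name `InAdvanceAnalyzer`, the three toy operations, the "most frequent follower" predictor, and the plain dictionary standing in for the in-memory database are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical analysis operations; the real system's operations are not
# listed in the abstract, so these are placeholders.
OPERATIONS = {"sum": sum, "max": max, "min": min}


class InAdvanceAnalyzer:
    """Toy model of in-advance analytics with result reuse (assumed design)."""

    def __init__(self):
        # transitions[a][b] counts how often operation b followed operation a.
        self.transitions = defaultdict(Counter)
        self.last_op = None
        # A dict stands in for the in-memory database of analysis results.
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def predict_next(self, op):
        """Return the operation that most often followed `op`, or None."""
        followers = self.transitions.get(op)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def request(self, op, dataset_id, data):
        """Serve a user request, reusing a stored result when one exists."""
        key = (op, dataset_id)
        if key in self.cache:
            self.hits += 1
            result = self.cache[key]
        else:
            self.misses += 1
            result = OPERATIONS[op](data)
            self.cache[key] = result

        # Learn the user's analysis pattern from consecutive requests.
        if self.last_op is not None:
            self.transitions[self.last_op][op] += 1
        self.last_op = op

        # Proactively run the predicted next operation on the same data.
        predicted = self.predict_next(op)
        if predicted is not None:
            predicted_key = (predicted, dataset_id)
            if predicted_key not in self.cache:
                self.cache[predicted_key] = OPERATIONS[predicted](data)
        return result
```

In this sketch, once a user has asked for `sum` followed by `max` on one dataset, a later `sum` request on another dataset triggers an in-advance `max`, so the follow-up `max` request is served from the stored results instead of being recomputed.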

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Editors: Wo Chang, Jun Huan, Nick Cercone, Saumyadipta Pyne, Vasant Honavar, Jimmy Lin, Xiaohua Tony Hu, Charu Aggarwal, Bamshad Mobasher, Jian Pei, Raghunath Nambiar
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 329-334
Number of pages: 6
ISBN (Electronic): 9781479956654
State: Published - 2014
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington, United States
Duration: Oct 27 2014 to Oct 30 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Conference

Conference: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Country: United States
City: Washington
Period: 10/27/14 to 10/30/14

Keywords

  • Big data
  • Data intensive computing
  • In-advance data analytics
  • Scientific computing

