Fast data analysis with integrated statistical metadata in scientific datasets

Jialin Liu, Yong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scientific data formats and libraries, such as HDF5 and PnetCDF, are widely used in scientific applications. They provide essential support for data analysis in scientific discovery and innovation. In this research, we present an approach to accelerate data analysis, namely Fast Analysis with Statistical Metadata (FASM), which combines data subsetting with the integration of a small amount of statistics into datasets. We discuss how FASM can improve data analysis performance. It is currently evaluated with PnetCDF on synthetic and real data, but it can also be implemented in other libraries. FASM can potentially lead to a new dataset design and can have an impact on data analysis.
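The general idea of subsetting data and keeping a small amount of statistics alongside it can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use the PnetCDF API; the block size, the choice of statistics (minimum, maximum, sum, count), and the names `BlockStats`, `build_stats`, `global_mean`, `global_max`, and `blocks_above` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockStats:
    """Hypothetical per-subset statistics stored next to the raw data."""
    start: int      # index of the first element in the subset
    count: int      # number of elements in the subset
    minimum: float
    maximum: float
    total: float    # sum of the elements, enough to derive a mean


def build_stats(values: Sequence[float], block_size: int) -> List[BlockStats]:
    """Split the data into fixed-size subsets and summarize each one once."""
    stats = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        stats.append(
            BlockStats(start, len(block), min(block), max(block), sum(block))
        )
    return stats


def global_mean(stats: List[BlockStats]) -> float:
    """Answer a mean query from the metadata alone, without the raw data."""
    return sum(s.total for s in stats) / sum(s.count for s in stats)


def global_max(stats: List[BlockStats]) -> float:
    """Answer a maximum query from the metadata alone."""
    return max(s.maximum for s in stats)


def blocks_above(stats: List[BlockStats], threshold: float) -> List[int]:
    """Return start offsets of subsets that may contain values above threshold.

    Subsets whose stored maximum is at or below the threshold can be skipped,
    so only the remaining subsets would need to be read from storage.
    """
    return [s.start for s in stats if s.maximum > threshold]


values = [1.0, 5.0, 2.0, 8.0, 3.0, 4.0, 7.0]
stats = build_stats(values, block_size=3)
```

In this sketch, aggregate queries touch only the per-subset summaries, and selective queries use the summaries to decide which subsets are worth reading. How FASM actually lays out and uses its statistics is described in the paper itself.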

Original language: English
Title of host publication: Proceedings - 41st International Conference on Parallel Processing Workshops, ICPPW 2012
Pages: 602-603
Number of pages: 2
DOI: https://doi.org/10.1109/ICPPW.2012.89
State: Published - Dec 20 2012
Event: 41st International Conference on Parallel Processing Workshops, ICPPW 2012 - Pittsburgh, PA, United States
Duration: Sep 10 2012 to Sep 13 2012

Publication series

Name: Proceedings of the International Conference on Parallel Processing Workshops
ISSN (Print): 1530-2016

Conference

Conference: 41st International Conference on Parallel Processing Workshops, ICPPW 2012
Country: United States
City: Pittsburgh, PA
Period: 09/10/12 to 09/13/12

Keywords

  • FASM
  • big data
  • data intensive computing
  • high performance computing
  • statistical techniques

Cite this

Liu, J., & Chen, Y. (2012). Fast data analysis with integrated statistical metadata in scientific datasets. In Proceedings - 41st International Conference on Parallel Processing Workshops, ICPPW 2012 (pp. 602-603). [6337537] (Proceedings of the International Conference on Parallel Processing Workshops). https://doi.org/10.1109/ICPPW.2012.89