LaHiIO: Accelerating Persistent Big Data Machine Learning via Latency Hiding IOs

Ahmad O. Aseeri, Yu Zhuang, Mohammed Saeed Alkatheiri, Bipana Thapaliya

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The increasing use of big datasets by analytics applications for higher predictive power leads to higher processing overhead, and the overhead becomes more substantial when datasets are larger than memory capacity. In this paper, we focus on reducing I/O overhead for big data machine learning procedures, including both unsupervised and supervised learning. Since the volume of I/O data is, in general, not reducible in well-developed applications, our approach to I/O overhead reduction is to overlap I/O operations with computation, so that while an application is performing an I/O operation, other useful computation proceeds at the same time. To this end, we develop an I/O latency-hiding (LaHiIO) strategy and an easy-to-use API that enables it. The API is a wrapper around existing asynchronous I/O operations that hides features unlikely to be needed by general data analytics applications and keeps only those necessary for computation-I/O overlapping. By doing so, we aim to increase the use of computation-I/O overlapping in big data applications by a broad range of developers, who may be physicists, chemists, biologists, or engineers, but not necessarily system programming experts. We apply the LaHiIO strategy to clustering and neural network procedures, the common choices for unsupervised and supervised learning, and obtain significant performance improvements of about 10% to 150%, indicating the effectiveness of the LaHiIO strategy and its user-friendly API for big data machine learning applications.
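The overlapping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the LaHiIO API, which the abstract does not specify; the function names `read_chunk` and `process_with_overlap` are hypothetical, and a background thread stands in for the asynchronous I/O operations the paper wraps. While one chunk is being processed, the read of the next chunk is already in flight.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def read_chunk(stream, size):
    """Blocking read of up to `size` bytes; returns b"" at end of data."""
    return stream.read(size)


def process_with_overlap(stream, size, compute):
    """Apply `compute` to each chunk while the next chunk is read in the background.

    Hypothetical helper for illustration only; not the paper's API.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_chunk, stream, size)  # start the first read
        while True:
            chunk = pending.result()  # blocks only if the read is not done yet
            if not chunk:
                break
            pending = pool.submit(read_chunk, stream, size)  # prefetch next chunk
            results.append(compute(chunk))  # computation overlaps the prefetch
    return results


# An in-memory stream stands in for a dataset larger than memory.
data = io.BytesIO(bytes(range(10)))
partial_sums = process_with_overlap(data, 4, sum)
print(partial_sums)  # [6, 22, 17]
```

Only one read is outstanding at any time, so the stream is never accessed concurrently; the benefit in practice would depend on how the read time compares with the per-chunk computation time.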

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2063-2070
Number of pages: 8
ISBN (Electronic): 9781538650356
State: Published - Jan 22 2019
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018 to Dec 13 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States
City: Seattle
Period: 12/10/18 to 12/13/18

Keywords

  • Big Data
  • Computation-I/O Overlapping
  • Machine Learning
  • Non-blocking I/O
