TY - GEN
T1 - Provenance-based object storage prediction scheme for scientific big data applications
AU - Dai, Dong
AU - Chen, Yong
AU - Kimpe, Dries
AU - Ross, Rob
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/1/7
Y1 - 2015/1/7
N2 - Object storage has been increasingly adopted in high-performance computing for scientific big data applications. With object storage, applications usually use object IDs, queries, or collections to identify the data instead of using files. Since the object store changes the way data is accessed in applications, it introduces new challenges for I/O prediction, which used to work based on interfile or intrafile pattern detection. The key challenge is that the inputs of object-based applications are no longer expressed as static file names: they become much more dynamic and unstable, hidden inside application logic. Traditional prediction strategies do not work well in such conditions. In this paper, we introduce the use of provenance information, which was collected for data management in high-performance computing systems, in order to build an accurate coarse-grained (object-level) input prediction. The prediction results can be preloaded into a burst buffer to accelerate future reads. To the best of our knowledge, this study is the first to use provenance information in object stores to predict application inputs. Evaluation results confirm the effectiveness and accuracy of our provenance-based prediction and show that the proposed prediction system is feasible for real-world deployment.
AB - Object storage has been increasingly adopted in high-performance computing for scientific big data applications. With object storage, applications usually use object IDs, queries, or collections to identify the data instead of using files. Since the object store changes the way data is accessed in applications, it introduces new challenges for I/O prediction, which used to work based on interfile or intrafile pattern detection. The key challenge is that the inputs of object-based applications are no longer expressed as static file names: they become much more dynamic and unstable, hidden inside application logic. Traditional prediction strategies do not work well in such conditions. In this paper, we introduce the use of provenance information, which was collected for data management in high-performance computing systems, in order to build an accurate coarse-grained (object-level) input prediction. The prediction results can be preloaded into a burst buffer to accelerate future reads. To the best of our knowledge, this study is the first to use provenance information in object stores to predict application inputs. Evaluation results confirm the effectiveness and accuracy of our provenance-based prediction and show that the proposed prediction system is feasible for real-world deployment.
UR - http://www.scopus.com/inward/record.url?scp=84921796877&partnerID=8YFLogxK
U2 - 10.1109/BigData.2014.7004242
DO - 10.1109/BigData.2014.7004242
M3 - Conference contribution
AN - SCOPUS:84921796877
T3 - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
SP - 271
EP - 280
BT - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
A2 - Chang, Wo
A2 - Huan, Jun
A2 - Cercone, Nick
A2 - Pyne, Saumyadipta
A2 - Honavar, Vasant
A2 - Lin, Jimmy
A2 - Hu, Xiaohua Tony
A2 - Aggarwal, Charu
A2 - Mobasher, Bamshad
A2 - Pei, Jian
A2 - Nambiar, Raghunath
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 October 2014 through 30 October 2014
ER -