TY - JOUR
T1 - Automated Performance Modeling of HPC Applications Using Machine Learning
AU - Sun, Jingwei
AU - Sun, Guangzhong
AU - Zhan, Shiyan
AU - Zhang, Jiepeng
AU - Chen, Yong
N1 - Publisher Copyright:
© 1968-2012 IEEE.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/5/1
Y1 - 2020/5/1
N2 - Automated performance modeling and performance prediction of parallel programs are highly valuable in many use cases, such as in guiding task management and job scheduling, offering insights of application behaviors, and assisting resource requirement estimation. The performance of parallel programs is affected by numerous factors, including but not limited to hardware, applications, algorithms, and input parameters, thus an accurate performance prediction is often a challenging and daunting task. In this article, we focus on automatically predicting the execution time of parallel programs (more specifically, MPI programs) with different inputs, at different scales, and without domain knowledge. We model the correlation between the execution time and domain-independent runtime features. These features include values of variables, counters of branches, loops, and MPI communications. Through automatically instrumenting an MPI program, each execution of the program will output a feature vector and its corresponding execution time. After collecting data from executions with different inputs, a random forest machine learning approach is used to build an empirical performance model, which can predict the execution time of the program given a new input. A transfer learning method is used to reuse an existing performance model and improve the prediction accuracy on a new platform that lacks historical execution data. Our experiments and analyses of three parallel applications, Graph500, GalaxSee, and SMG2000, on three different systems confirm that our method performs well, with less than 20 percent prediction error on average.
AB - Automated performance modeling and performance prediction of parallel programs are highly valuable in many use cases, such as in guiding task management and job scheduling, offering insights of application behaviors, and assisting resource requirement estimation. The performance of parallel programs is affected by numerous factors, including but not limited to hardware, applications, algorithms, and input parameters, thus an accurate performance prediction is often a challenging and daunting task. In this article, we focus on automatically predicting the execution time of parallel programs (more specifically, MPI programs) with different inputs, at different scales, and without domain knowledge. We model the correlation between the execution time and domain-independent runtime features. These features include values of variables, counters of branches, loops, and MPI communications. Through automatically instrumenting an MPI program, each execution of the program will output a feature vector and its corresponding execution time. After collecting data from executions with different inputs, a random forest machine learning approach is used to build an empirical performance model, which can predict the execution time of the program given a new input. A transfer learning method is used to reuse an existing performance model and improve the prediction accuracy on a new platform that lacks historical execution data. Our experiments and analyses of three parallel applications, Graph500, GalaxSee, and SMG2000, on three different systems confirm that our method performs well, with less than 20 percent prediction error on average.
KW - Parallel computing
KW - machine learning
KW - model transferring
KW - performance modeling
UR - http://www.scopus.com/inward/record.url?scp=85083321915&partnerID=8YFLogxK
U2 - 10.1109/TC.2020.2964767
DO - 10.1109/TC.2020.2964767
M3 - Article
AN - SCOPUS:85083321915
VL - 69
SP - 749
EP - 763
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
SN - 0018-9340
IS - 5
M1 - 8956059
ER -