TY - JOUR
T1 - Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores
AU - Wang, Y.X.
AU - Zhang, L.
AU - Liu, W.
AU - Cheng, X.
AU - Zhuang, Yu
AU - Chronopoulos, A. T.
N1 - Funding Information:
We would like to thank NSCC-Guangzhou for providing access to the Tianhe-2 supercomputer as well as their technical guidance. This work was funded by the National Natural Science Foundation of China (NSFC) under grant no. 61379056.
Publisher Copyright:
© 2018 Elsevier Ltd
PY - 2018/9
Y1 - 2018/9
AB - For computational fluid dynamics (CFD) applications with a large number of grid points or cells, parallel computing is a common and efficient strategy for reducing computational time. Achieving the best performance on modern supercomputers, especially those with heterogeneous computing resources such as hybrid CPU+GPU or CPU+Intel Xeon Phi (MIC) coprocessor systems, remains a great challenge. In this study, an in-house parallel CFD code capable of simulating three-dimensional structured-grid applications is developed and tested. Several methods of parallelization, performance optimization and code tuning are proposed for both the CPU-only homogeneous system and the heterogeneous system. These methods are based on identifying the potential parallelism of applications, balancing the workload among all kinds of computing devices, tuning the multithreaded code for better performance within a compute node with hundreds of CPU/MIC cores, and optimizing communication between nodes, between cores, and between CPUs and MICs. Several benchmark cases from model and/or industrial CFD applications are tested on the Tianhe-1A and Tianhe-2 supercomputers to evaluate performance. Among these CFD cases, the maximum number of grid cells reaches 780 billion. The tuned solver successfully scales up to half of the entire Tianhe-2 supercomputer system, with over 1.376 million heterogeneous cores. The test results and performance analysis are discussed in detail.
M3 - Article
SP - 226
EP - 236
JO - Computers and Fluids
JF - Computers and Fluids
ER -