Pipelining computation and optimization strategies for scaling GROMACS on the Sunway many-core processor

Yang Yu, Hong An, Junshi Chen, Weihao Liang, Qingqing Xu, Yong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

The widening gap between plentiful computing elements and limited memory bandwidth makes it increasingly difficult, and sometimes infeasible, for the HPC community to port applications to many-core processor architectures. The Sunway many-core processor SW26010, used to build the Sunway TaihuLight system, contains 260 heterogeneous cores divided into 4 core groups (CGs). Each CG includes one Management Processing Element (MPE) core and 64 Computing Processing Element (CPE) cores. In this paper, we refactor GROMACS, an important molecular dynamics (MD) application, on the Sunway TaihuLight system. By rewriting the compute-intensive kernel of GROMACS, we exploit parallelism suited to the CPE cluster and implement pipelined computation between the MPE and the CPE cluster. Optimization strategies, including efficient use of the scratchpad, a software-emulated cache, and a hybrid parallel algorithm, are adopted to address the memory bandwidth limitation. Compared with the original ported version using only the MPE, the refactored version using the MPE and 64 CPEs achieves a 16x speedup for the compute-intensive kernel. For a simulation of a molecule with 3 million atoms, we have so far scaled to 798,720 cores. Moreover, we analyze the adaptability of our mapping and optimization strategies for addressing the memory bandwidth limitation when refactoring a real-world application on the Sunway heterogeneous many-core processor system.
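
The core counts quoted in the abstract can be cross-checked with a little arithmetic. The following is a minimal sketch, not taken from the paper: it assumes the reported 798,720 cores count both MPE and CPE cores and that whole processors are used. The resulting processor and core-group counts are derived here and are not stated in the abstract; all function and constant names are illustrative.

```python
# Core-count arithmetic for the SW26010 figures quoted in the abstract.
# Assumption (not from the source): the reported total counts every core
# (MPE and CPE) of each processor, and only whole processors are used.

CORE_GROUPS_PER_PROCESSOR = 4   # from the abstract
MPES_PER_CORE_GROUP = 1         # from the abstract
CPES_PER_CORE_GROUP = 64        # from the abstract
REPORTED_CORES = 798_720        # largest scale reported in the abstract


def cores_per_core_group() -> int:
    """One MPE plus 64 CPEs."""
    return MPES_PER_CORE_GROUP + CPES_PER_CORE_GROUP


def cores_per_processor() -> int:
    """Four core groups per SW26010 processor."""
    return CORE_GROUPS_PER_PROCESSOR * cores_per_core_group()


def implied_units(total_cores: int) -> tuple[int, int]:
    """Return (processors, core groups) implied by a total core count."""
    processors, remainder = divmod(total_cores, cores_per_processor())
    if remainder:
        raise ValueError("core count is not a whole number of processors")
    return processors, processors * CORE_GROUPS_PER_PROCESSOR


if __name__ == "__main__":
    print(cores_per_processor())          # 4 * (1 + 64)
    print(implied_units(REPORTED_CORES))  # (processors, core groups)
```

Under these assumptions, 4 × (1 + 64) reproduces the 260 cores per processor given in the abstract, and 798,720 cores divides evenly into whole processors.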

Original language: English
Title of host publication: Algorithms and Architectures for Parallel Processing - 17th International Conference, ICA3PP 2017, Proceedings
Editors: Shadi Ibrahim, Zheng Yan, Kim-Kwang Raymond Choo, Witold Pedrycz
Publisher: Springer-Verlag
Pages: 18-32
Number of pages: 15
ISBN (Print): 9783319654812
State: Published - 2017
Event: 17th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2017 - Helsinki, Finland
Duration: Aug 21 2017 to Aug 23 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10393 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2017
Country: Finland
City: Helsinki
Period: 08/21/17 to 08/23/17

Keywords

  • Adaptability
  • Bandwidth competition
  • GROMACS
  • Parallel model
  • Performance optimization
  • Sunway TaihuLight system
