TY - GEN
T1 - OpenMP Memkind
T2 - 46th International Conference on Parallel Processing Workshops, ICPPW 2017
AU - Wang, Xi
AU - Leidel, John D.
AU - Chen, Yong
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/9/5
Y1 - 2017/9/5
N2 - Recently, CPUs and graphics processors have been increasing their degree of on-chip parallelism to combat the slowdown in traditional Moore's Law scaling. As a result, these processors have a growing appetite for faster, higher-bandwidth memory devices. Component manufacturers have resorted to disparate or hierarchical fast memory device architectures such as shared local memory (SLM), scratchpad memory (SPM), and high-bandwidth memory (HBM) to provide sufficient bandwidth. Following this trend, physical memory locality is gradually becoming a performance feature that users would like to manage explicitly. Inspired by this idea, this research creates a heterogeneous memory interface based on a new declarative data storage directive, 'memkind', for the OpenMP parallel programming specification, allowing physical memory locality to be managed explicitly. Our approach is implemented as an OpenMP directive to avoid allocating data inside parallel regions, thus avoiding performance degradation due to sequential operating system routines. We demonstrate our approach as an extension to the LLVM OpenMP implementation, which allows it to be rapidly ported to any LLVM-supported architecture target. Our contributions in this work are a detailed design analysis of the memkind directive and a detailed implementation in the LLVM compiler infrastructure. We demonstrate the efficacy of our approach using a synthetic benchmark application that records execution performance and memory allocation efficiency.
AB - Recently, CPUs and graphics processors have been increasing their degree of on-chip parallelism to combat the slowdown in traditional Moore's Law scaling. As a result, these processors have a growing appetite for faster, higher-bandwidth memory devices. Component manufacturers have resorted to disparate or hierarchical fast memory device architectures such as shared local memory (SLM), scratchpad memory (SPM), and high-bandwidth memory (HBM) to provide sufficient bandwidth. Following this trend, physical memory locality is gradually becoming a performance feature that users would like to manage explicitly. Inspired by this idea, this research creates a heterogeneous memory interface based on a new declarative data storage directive, 'memkind', for the OpenMP parallel programming specification, allowing physical memory locality to be managed explicitly. Our approach is implemented as an OpenMP directive to avoid allocating data inside parallel regions, thus avoiding performance degradation due to sequential operating system routines. We demonstrate our approach as an extension to the LLVM OpenMP implementation, which allows it to be rapidly ported to any LLVM-supported architecture target. Our contributions in this work are a detailed design analysis of the memkind directive and a detailed implementation in the LLVM compiler infrastructure. We demonstrate the efficacy of our approach using a synthetic benchmark application that records execution performance and memory allocation efficiency.
KW - Heterogeneous Memory
KW - LLVM
KW - Memkind
KW - OpenMP
UR - http://www.scopus.com/inward/record.url?scp=85030650483&partnerID=8YFLogxK
U2 - 10.1109/ICPPW.2017.40
DO - 10.1109/ICPPW.2017.40
M3 - Conference contribution
AN - SCOPUS:85030650483
T3 - Proceedings of the International Conference on Parallel Processing Workshops
SP - 220
EP - 227
BT - Proceedings - 46th International Conference on Parallel Processing Workshops, ICPPW 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 August 2017
ER -