HMC-Sim-2.0: A co-design infrastructure for exploring custom memory cube operations

John D. Leidel, Yong Chen

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


The recent advent of stacked memory devices has led to a resurgence of research associated with the fundamental memory hierarchy and associated memory pipeline. The bandwidth advantages provided by stacked logic and DRAM devices have inspired research associated with eliminating the bandwidth bottlenecks associated with many applications in high performance computing. These augmented memory subsystems stand to change the landscape of high performance computing algorithm optimization. In addition to the two aforementioned focus areas, a third area of research is emerging to explore augmenting the stacked memory logic layer with additional operations. The first generation of Hybrid Memory Cube (HMC) devices provided rudimentary atomic memory operations. The Gen2 Hybrid Memory Cube devices provide more expressive atomic memory operations that include primitive integer arithmetic operations. Despite the inclusion of more expressive arithmetic operations, many users have expressed interest in more complex and potentially orthogonal custom memory cube, or CMC, operations in future revisions of the Hybrid Memory Cube specification. This work presents recent development associated with the HMC-Sim Hybrid Memory Cube simulation framework that provides users a powerful infrastructure to experiment with and research augmented CMC operations within the current Gen2 Hybrid Memory Cube device infrastructure. The goal of this approach is to provide computer architects the ability to experiment with augmentations to future memory devices in the scope of co-designing the future of scalable high performance computing instruments. We provide an overview of extending the original HMC-Sim simulation infrastructure to include support for CMC operations without requiring users to modify the core simulation code base. In addition, we provide a sample series of CMC operations that implement near-memory mutexes and demonstrate their efficacy using central locking and barrier synchronization algorithms traditionally found in parallel programming models and runtime libraries.
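As a rough illustration of the near-memory mutex idea, the following C sketch models a lock word that lives in memory and is updated by lock and unlock requests that the logic layer would execute one at a time. Everything here is an assumption made for illustration: the names `cmc_mutex_t`, `cmc_lock`, `cmc_unlock`, and `central_lock_demo`, the layout of the lock word, and the return conventions are hypothetical and are not taken from HMC-Sim or the HMC specification. The simulated "threads" are serviced round-robin in a single host thread, so this shows the central locking pattern only, not real concurrency or timing.

```c
#include <stdint.h>

/* Hypothetical lock word as it might sit in cube memory (assumption). */
typedef struct {
    uint64_t owner;   /* id of the current holder; meaningful only when locked != 0 */
    uint64_t locked;  /* 0 = free, 1 = held */
} cmc_mutex_t;

/* Models one lock request executed atomically in the logic layer.
 * Returns 1 if the caller acquired the lock, 0 if it was already held. */
static int cmc_lock(cmc_mutex_t *m, uint64_t tid) {
    if (m->locked) {
        return 0;
    }
    m->locked = 1;
    m->owner = tid;
    return 1;
}

/* Models one unlock request. Only the current owner may release.
 * Returns 1 on success, 0 if the lock is free or held by another id. */
static int cmc_unlock(cmc_mutex_t *m, uint64_t tid) {
    if (!m->locked || m->owner != tid) {
        return 0;
    }
    m->locked = 0;
    return 1;
}

/* Central locking sketch: every simulated thread contends for one lock
 * word, increments a shared counter while holding it, then releases it.
 * Returns the final counter value. */
static uint64_t central_lock_demo(unsigned nthreads, unsigned iters) {
    cmc_mutex_t m = {0, 0};
    uint64_t counter = 0;
    for (unsigned i = 0; i < iters; i++) {
        for (uint64_t tid = 1; tid <= nthreads; tid++) {
            while (!cmc_lock(&m, tid)) {
                /* a real client would re-send the lock request here */
            }
            counter++;
            cmc_unlock(&m, tid);
        }
    }
    return counter;
}
```

Calling `central_lock_demo(4, 10)` from a `main` function performs 40 protected increments. A barrier could be layered on the same two operations by using the protected counter to track arrivals.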

Original language: English
Pages (from-to): 77-88
Number of pages: 12
Journal: Parallel Computing
State: Published - Oct 2017


  • 3D stacking
  • Barrier synchronization
  • Hybrid memory cube
  • Simulation


