TY - GEN
T1 - Communication Avoiding Power Scaling
AU - Leidel, John
AU - Chen, Yong
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/12/8
Y1 - 2015/12/8
N2 - Recent system on chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned Open BLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.
AB - Recent system on chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned Open BLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.
KW - High performance computing
KW - Multithreading
KW - Parallel algorithms
KW - Parallel programming
KW - Performance analysis
UR - http://www.scopus.com/inward/record.url?scp=84954543665&partnerID=8YFLogxK
U2 - 10.1109/ICPPW.2015.26
DO - 10.1109/ICPPW.2015.26
M3 - Conference contribution
AN - SCOPUS:84954543665
T3 - Proceedings of the International Conference on Parallel Processing Workshops
SP - 166
EP - 174
BT - Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015
Y2 - 1 September 2015 through 4 September 2015
ER -