Communication Avoiding Power Scaling

John Leidel; Yong Chen

doi:10.1109/ICPPW.2015.26

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent system on chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned Open BLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.

Original language	English
Title of host publication	Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	166-174
Number of pages	9
ISBN (Electronic)	9781467375894
DOIs	https://doi.org/10.1109/ICPPW.2015.26
State	Published - Dec 8 2015
Event	44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015 - Beijing, China Duration: Sep 1 2015 → Sep 4 2015

Publication series

Name	Proceedings of the International Conference on Parallel Processing Workshops
Volume	2015-January
ISSN (Print)	1530-2016

Conference

Conference	44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015
Country/Territory	China
City	Beijing
Period	09/1/15 → 09/4/15

Keywords

High performance computing
Multithreading
Parallel algorithms
Parallel programming
Performance analysis

Access to Document

10.1109/ICPPW.2015.26

Cite this

Leidel, J., & Chen, Y. (2015). Communication Avoiding Power Scaling. In Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015 (pp. 166-174). Article 7349908 (Proceedings of the International Conference on Parallel Processing Workshops; Vol. 2015-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPPW.2015.26

@inproceedings{f32b5428b26e43fc88ef25739a76367f,
title = "Communication Avoiding Power Scaling",
abstract = "Recent system on chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned Open BLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.",
keywords = "High performance computing, Multithreading, Parallel algorithms, Parallel programming, Performance analysis",
author = "John Leidel and Yong Chen",
note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; 44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015 ; Conference date: 01-09-2015 Through 04-09-2015",
year = "2015",
month = dec,
day = "8",
doi = "10.1109/ICPPW.2015.26",
language = "English",
series = "Proceedings of the International Conference on Parallel Processing Workshops",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "166--174",
booktitle = "Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015",
}

Leidel, J & Chen, Y 2015, Communication Avoiding Power Scaling. in Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015., 7349908, Proceedings of the International Conference on Parallel Processing Workshops, vol. 2015-January, Institute of Electrical and Electronics Engineers Inc., pp. 166-174, 44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015, Beijing, China, 09/1/15. https://doi.org/10.1109/ICPPW.2015.26

Communication Avoiding Power Scaling. / Leidel, John; Chen, Yong.
Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 166-174 7349908 (Proceedings of the International Conference on Parallel Processing Workshops; Vol. 2015-January).

TY - GEN
T1 - Communication Avoiding Power Scaling
AU - Leidel, John
AU - Chen, Yong
PY - 2015/12/8
Y1 - 2015/12/8
N2 - Recent system on chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned Open BLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.
AB - Recent system on chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned Open BLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.
KW - High performance computing
KW - Multithreading
KW - Parallel algorithms
KW - Parallel programming
KW - Performance analysis
UR - http://www.scopus.com/inward/record.url?scp=84954543665&partnerID=8YFLogxK
U2 - 10.1109/ICPPW.2015.26
DO - 10.1109/ICPPW.2015.26
M3 - Conference contribution
AN - SCOPUS:84954543665
T3 - Proceedings of the International Conference on Parallel Processing Workshops
SP - 166
EP - 174
BT - Proceedings - 2015 International Conference on Parallel Processing Workshops, The 44th Annual Conference, ICPPW 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th Annual Conference of the International Conference on Parallel Processing Workshops, ICPPW 2015
Y2 - 1 September 2015 through 4 September 2015
ER -
