HiperView: real-time monitoring of dynamic behaviors of high-performance computing centers

Tommy Dang, Ngan Nguyen, Yong Chen

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents HiperView, a visual analytics framework for monitoring and characterizing the health status of high-performance computing systems in real time through a RESTful interface. The primary objectives of this visual analytics system are: (1) to provide a graphical interface for tracking the real-time health statistics of a large number of data center hosts, (2) to help users visually analyze unusual behavior in a series of events that may be temporally and spatially correlated, and (3) to assist in preliminary troubleshooting and maintenance with a visual layout that reflects the actual physical locations. Two use cases were analyzed in detail to assess the effectiveness of HiperView on a medium-scale, Redfish-enabled production high-performance computing system with a total of 10 racks and 467 hosts. The visualization apparatus has been shown to offer the necessary support for system automation and control. Our framework’s visual components and interfaces are designed with the potential to handle a larger-scale data center of thousands of hosts with hundreds of health services per host.

Original language: English
Pages (from-to): 11807-11826
Number of pages: 20
Journal: Journal of Supercomputing
Volume: 77
Issue number: 10
State: Accepted/In press - 2021

Keywords

  • Baseboard Management Controller (BMC)
  • Boxplots
  • Data center
  • HPC visualization
  • Heatmap
  • High-performance computing
  • Multidimensional data visualization
  • Nagios Core
  • RESTful API
  • Radar charts
  • Redfish
  • Scatterplot
  • Time-series data analysis
  • Visual features
