Real-time Sound Visualization via Multidimensional Clustering and Projections

Nhat Le, Ngan V.T. Nguyen, Tommy Dang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Sound plays a vital role in every aspect of human life: it is one of the primary kinds of sensory information that our auditory system collects, and it allows us to perceive the world. Sound clustering and visualization is the process of collecting and analyzing audio samples; that process is a prerequisite of sound classification, which is the core of automatic speech recognition, virtual assistants, and text-to-speech applications. Nevertheless, understanding how to recognize and properly interpret complex, high-dimensional audio data is the most significant challenge in sound clustering and visualization. This paper proposes a web-based platform to visualize and cluster similar sound samples of musical notes and human speech in real time. To visualize high-dimensional data such as audio, Mel-Frequency Cepstral Coefficients (MFCCs), which were originally developed to represent the sounds made by the human vocal tract, are extracted. Then t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique designed for high-dimensional datasets, is applied. This paper focuses on both data clustering and high-dimensional visualization methods to present the clustering results in the most meaningful way and to uncover potentially interesting behavioral patterns of musical notes played by different instruments.
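The MFCC extraction step named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names (`mfcc`, `mel_filterbank`, `tone`), the frame length, hop size, filter count, and the synthetic sine tones are all assumptions chosen for illustration, and the t-SNE projection step is not shown.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising edge of the triangle
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge of the triangle
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank

def dct2(values, n_coeffs):
    """First n_coeffs terms of an unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=12, n_coeffs=13):
    """Return one MFCC vector per frame (parameter values are illustrative)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        energies = [sum(f * p for f, p in zip(filt, spec)) for filt in bank]
        # Small offset avoids log(0) for filters that receive no energy.
        log_energies = [math.log(e + 1e-10) for e in energies]
        features.append(dct2(log_energies, n_coeffs))
    return features

def tone(freq, sample_rate=8000, n=1024):
    """Synthetic sine tone standing in for a recorded note (assumption)."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

low = mfcc(tone(220.0), 8000)
high = mfcc(tone(1760.0), 8000)
print(len(low), len(low[0]))
```

Each frame yields one coefficient vector; in a pipeline like the one described, such vectors would then be passed to a dimensionality reduction method such as t-SNE for plotting. In practice an audio library with an FFT-based MFCC routine would replace the naive DFT used here.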

Original language: English
Title of host publication: IAIT 2021 - 12th International Conference on Advances in Information Technology
Subtitle of host publication: Intelligence and Innovation for Digital Business and Society
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450390125
State: Published - Jun 29 2021
Event: 12th International Conference on Advances in Information Technology: Intelligence and Innovation for Digital Business and Society, IAIT 2021 - Virtual, Online, Thailand
Duration: Jun 29 2021 - Jul 1 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 12th International Conference on Advances in Information Technology: Intelligence and Innovation for Digital Business and Society, IAIT 2021
Country/Territory: Thailand
City: Virtual, Online
Period: 06/29/21 - 07/1/21

Keywords

  • Human Speech Recognition
  • Mel-Frequency Cepstral Coefficients
  • Multivariate Clustering
  • Principal Component Analysis
  • Sound visualization
  • t-distributed Stochastic Neighbor Embedding
