TY - GEN
T1 - Real-time Sound Visualization via Multidimensional Clustering and Projections
AU - Le, Nhat
AU - Nguyen, Ngan V.T.
AU - Dang, Tommy
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/6/29
Y1 - 2021/6/29
N2 - Sound plays a vital role in every aspect of human life, as it is one of the primary forms of sensory information that our auditory system collects, allowing us to perceive the world. Sound clustering and visualization is the process of collecting and analyzing audio samples; that process is a prerequisite of sound classification, which is the core of automatic speech recognition, virtual assistants, and text-to-speech applications. Nevertheless, understanding how to recognize and properly interpret complex, high-dimensional audio data is the most significant challenge in sound clustering and visualization. This paper proposes a web-based platform to visualize and cluster similar sound samples of musical notes and human speech in real time. To visualize high-dimensional data such as audio, Mel-Frequency Cepstral Coefficients (MFCCs), which were originally developed to represent the sounds made by the human vocal tract, are extracted. Then, t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique designed for high-dimensional datasets, is applied. This paper focuses on both data clustering and high-dimensional visualization methods to present the clustering results in the most meaningful way and uncover potentially interesting behavioral patterns of musical notes played by different instruments.
AB - Sound plays a vital role in every aspect of human life, as it is one of the primary forms of sensory information that our auditory system collects, allowing us to perceive the world. Sound clustering and visualization is the process of collecting and analyzing audio samples; that process is a prerequisite of sound classification, which is the core of automatic speech recognition, virtual assistants, and text-to-speech applications. Nevertheless, understanding how to recognize and properly interpret complex, high-dimensional audio data is the most significant challenge in sound clustering and visualization. This paper proposes a web-based platform to visualize and cluster similar sound samples of musical notes and human speech in real time. To visualize high-dimensional data such as audio, Mel-Frequency Cepstral Coefficients (MFCCs), which were originally developed to represent the sounds made by the human vocal tract, are extracted. Then, t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique designed for high-dimensional datasets, is applied. This paper focuses on both data clustering and high-dimensional visualization methods to present the clustering results in the most meaningful way and uncover potentially interesting behavioral patterns of musical notes played by different instruments.
KW - Human Speech Recognition
KW - Mel-Frequency Cepstral Coefficients
KW - Multivariate Clustering
KW - Principal Component Analysis
KW - Sound visualization
KW - t-distributed Stochastic Neighbor Embedding
UR - http://www.scopus.com/inward/record.url?scp=85112147857&partnerID=8YFLogxK
U2 - 10.1145/3468784.3471604
DO - 10.1145/3468784.3471604
M3 - Conference contribution
AN - SCOPUS:85112147857
T3 - ACM International Conference Proceeding Series
BT - IAIT 2021 - 12th International Conference on Advances in Information Technology
PB - Association for Computing Machinery
T2 - 12th International Conference on Advances in Information Technology: Intelligence and Innovation for Digital Business and Society, IAIT 2021
Y2 - 29 June 2021 through 1 July 2021
ER -