An optimal model with a lower bound of recall for imbalanced speech emotion recognition

Xusheng Ai, Victor S. Sheng, Wei Fang, Charles X. Ling

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

In an early complaint warning system, a common problem is the scarcity of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that achieves a recall above a lower bound together with a maximum F1 score. The work has three parts: 1) a variant of the F1 score (TF1 score) that takes both a recall above a lower bound and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and its training process, designed to find an optimal model with a maximum TF1 score; 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that with either a BCE loss function or a focal loss function, the training process can find a model with a recall above a high threshold and a maximum F1 score. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
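The abstract does not give the exact definition of the TF1 score, so the following is only a minimal sketch under a stated assumption: the score equals the ordinary F1 score when recall meets the lower bound and is zero otherwise. The function names, the hard zero cutoff, and the default focal-loss parameters are illustrative choices, not the authors' formulation. The focal loss shown is the standard binary form from the general literature, -alpha_t * (1 - p_t)^gamma * log(p_t), which reduces to a scaled BCE loss when gamma is 0.

```python
import math


def f1_and_recall(y_true, y_pred):
    """Return (F1, recall) for binary labels, with 1 as the positive (angry) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0, recall
    return 2.0 * precision * recall / (precision + recall), recall


def tf1_score(y_true, y_pred, recall_lower_bound):
    """ASSUMED form of a thresholded F1: F1 if recall >= bound, else 0.

    The paper's actual TF1 definition may differ; this is a hypothetical reading.
    """
    f1, recall = f1_and_recall(y_true, y_pred)
    return f1 if recall >= recall_lower_bound else 0.0


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss for one sample.

    p is the predicted probability of the positive class, y is 0 or 1.
    gamma and alpha defaults are common illustrative values, not the paper's.
    """
    p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Imbalanced toy example: 3 angry samples out of 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score_loose = tf1_score(y_true, y_pred, recall_lower_bound=0.6)
score_strict = tf1_score(y_true, y_pred, recall_lower_bound=0.8)
```

In this toy example recall is 2/3, so the sketch returns the plain F1 score for a bound of 0.6 and zero for a bound of 0.8. Under this reading, model selection by maximum TF1 discards any model whose recall falls below the bound.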

Original language: English
Pages (from-to): 24281-24301
Number of pages: 21
Journal: Multimedia Tools and Applications
Volume: 79
Issue number: 33-34
State: Published - Sep 1 2020

Keywords

  • Convolutional neural network
  • Deep neural network
  • Imbalance
  • Speech emotion recognition
