TY - JOUR
T1 - An optimal model with a lower bound of recall for imbalanced speech emotion recognition
AU - Ai, Xusheng
AU - Sheng, Victor S.
AU - Fang, Wei
AU - Ling, Charles X.
N1 - Funding Information:
This research was partially supported by the National Natural Science Foundation of China under grant No. 61472267, No. 61702351, the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant No. 17KJB520036, Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant No. SZS201609, Suzhou Science and Technology Plan Project under grant No. SYG201903.
Funding Information:
This study was funded by Natural Science Foundation of China (grant number: 61472267, 61702351), Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number: 17KJB520036), and Foundation of Key Laboratory in Science and Technology Development Project of Suzhou (grant number: SZS201609), Suzhou Science and Technology Plan Project (grant number: SYG201903).
Publisher Copyright:
© 2020, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/9/1
Y1 - 2020/9/1
N2 - In an early complaint warning system, we encounter a common problem: a lack of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (TF1 score) that takes both the lower bound on recall and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF1 score; 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a BCE loss function or a focal loss function, the training process can find a model whose recall exceeds a high threshold and whose F1 score is maximal. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
AB - In an early complaint warning system, we encounter a common problem: a lack of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (TF1 score) that takes both the lower bound on recall and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF1 score; 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a BCE loss function or a focal loss function, the training process can find a model whose recall exceeds a high threshold and whose F1 score is maximal. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
KW - Convolutional neural network
KW - Deep neural network
KW - Imbalance
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85086728950&partnerID=8YFLogxK
U2 - 10.1007/s11042-020-09155-3
DO - 10.1007/s11042-020-09155-3
M3 - Article
AN - SCOPUS:85086728950
SN - 1380-7501
VL - 79
SP - 24281
EP - 24301
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 33-34
ER -