TY - JOUR
T1 - An optimal model with a lower bound of recall for imbalanced speech emotion recognition
AU - Ai, Xusheng
AU - Sheng, Victor S.
AU - Fang, Wei
AU - Ling, Charles X.
N1 - Publisher Copyright: © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/9/1
Y1 - 2020/9/1
N2 - In an early complaint warning system, we encounter a common problem: the lack of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (TF1 score) takes both the recall lower bound and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and its training process are designed to find an optimal model with the maximum TF1 score; 3) a performance comparison of different methods is conducted on the IEMOCAP and Emo-DB databases. Extensive experiments show that when a BCE loss function or a focal loss function is used, the training process can find a model with recall above a high threshold and a maximum F1 score. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
AB - In an early complaint warning system, we encounter a common problem: the lack of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (TF1 score) takes both the recall lower bound and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and its training process are designed to find an optimal model with the maximum TF1 score; 3) a performance comparison of different methods is conducted on the IEMOCAP and Emo-DB databases. Extensive experiments show that when a BCE loss function or a focal loss function is used, the training process can find a model with recall above a high threshold and a maximum F1 score. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
KW - Convolutional neural network
KW - Deep neural network
KW - Imbalance
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85086728950&partnerID=8YFLogxK
U2 - 10.1007/s11042-020-09155-3
DO - 10.1007/s11042-020-09155-3
M3 - Article
AN - SCOPUS:85086728950
SN - 1380-7501
VL - 79
SP - 24281
EP - 24301
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 33-34
ER -