An optimal model with a lower bound of recall for imbalanced speech emotion recognition

Xusheng Ai; Victor S. Sheng; Wei Fang; Charles X. Ling

doi:10.1007/s11042-020-09155-3

Computer Science

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

In an early complaint warning system, we encounter a common problem: a shortage of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F₁ score. The work has three parts: 1) a variant of the F₁ score (the TF₁ score) that takes both the recall lower bound and the F₁ score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF₁ score; and 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a binary cross-entropy (BCE) loss function or a focal loss function, the training process can find a model whose recall exceeds a high threshold and whose F₁ score is the maximum. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
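The abstract does not give the exact definition of the TF₁ score, so the following Python sketch rests on one plausible reading, labeled here as an assumption: TF₁ equals F₁ when recall meets the lower bound and 0 otherwise. The function names, the example labels, and the 0.8 lower bound are hypothetical. The focal loss shown is the standard binary form with commonly used defaults (gamma = 2, alpha = 0.25), not parameters reported in this record.

```python
import math


def recall_and_f1(y_true, y_pred):
    """Return (recall, F1) for the positive class (label 1, e.g. anger)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def tf1_score(y_true, y_pred, recall_lower_bound):
    """ASSUMED definition: F1 if recall >= lower bound, otherwise 0."""
    recall, f1 = recall_and_f1(y_true, y_pred)
    return f1 if recall >= recall_lower_bound else 0.0


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss for one example.

    p is the predicted probability of class 1 and y is the true label.
    With gamma = 0 and alpha = 0.5 this reduces to half the BCE loss.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical imbalanced validation set: 2 angry samples out of 8.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
# Predictions from two hypothetical checkpoints.
checkpoints = {
    "A": [1, 0, 0, 0, 0, 0, 0, 0],  # precise, but misses one angry sample
    "B": [1, 1, 1, 0, 0, 0, 0, 0],  # finds both, with one false alarm
}
# Model selection: keep the checkpoint with the highest TF1.
best = max(checkpoints, key=lambda k: tf1_score(y_true, checkpoints[k], 0.8))
print(best, tf1_score(y_true, checkpoints[best], 0.8))
```

Under this assumed definition, checkpoint A is rejected because its recall (0.5) falls below the bound, even though its precision is perfect, so selection favors B.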

Original language	English
Pages (from-to)	24281-24301
Number of pages	21
Journal	Multimedia Tools and Applications
Volume	79
Issue number	33-34
DOIs	https://doi.org/10.1007/s11042-020-09155-3
State	Published - Sep 1 2020

Keywords

Convolutional neural network
Deep neural network
Imbalance
Speech emotion recognition

Cite this

@article{c2c0bb7b6b59410895f5040d1aa4137d,

title = "An optimal model with a lower bound of recall for imbalanced speech emotion recognition",

abstract = "In an early complaint warning system, we encounter a common problem: a shortage of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (the TF1 score) that takes both the recall lower bound and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF1 score; and 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a binary cross-entropy (BCE) loss function or a focal loss function, the training process can find a model whose recall exceeds a high threshold and whose F1 score is the maximum. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.",

keywords = "Convolutional neural network, Deep neural network, Imbalance, Speech emotion recognition",

author = "Xusheng Ai and Sheng, {Victor S.} and Wei Fang and Ling, {Charles X.}",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2020",

month = sep,

day = "1",

doi = "10.1007/s11042-020-09155-3",

language = "English",

volume = "79",

pages = "24281--24301",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

number = "33-34",

}

TY - JOUR

T1 - An optimal model with a lower bound of recall for imbalanced speech emotion recognition

AU - Ai, Xusheng

AU - Sheng, Victor S.

AU - Fang, Wei

AU - Ling, Charles X.

PY - 2020/9/1

Y1 - 2020/9/1

N2 - In an early complaint warning system, we encounter a common problem: a shortage of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (the TF1 score) that takes both the recall lower bound and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF1 score; and 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a binary cross-entropy (BCE) loss function or a focal loss function, the training process can find a model whose recall exceeds a high threshold and whose F1 score is the maximum. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.

KW - Convolutional neural network

KW - Deep neural network

KW - Imbalance

KW - Speech emotion recognition

UR - http://www.scopus.com/inward/record.url?scp=85086728950&partnerID=8YFLogxK

U2 - 10.1007/s11042-020-09155-3

DO - 10.1007/s11042-020-09155-3

M3 - Article

AN - SCOPUS:85086728950

SN - 1380-7501

VL - 79

SP - 24281

EP - 24301

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 33-34

ER -
