TY - GEN
T1 - Label noise correction methods
AU - Nicholson, Bryce
AU - Zhang, Jing
AU - Sheng, Victor S.
AU - Wang, Zhiheng
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/12/2
Y1 - 2015/12/2
N2 - Correcting label noise is an important task, yet it is rarely addressed in the literature, largely because a robust label correction algorithm is difficult to develop. We propose two algorithms to correct label noise. The first, Self-Training Correction (STC), uses self-training to re-label noisy instances. The second, Cluster-based Correction (CC), is a clustering-based method that groups instances together to infer their ground-truth labels. We also adapt an algorithm from previous work: Polishing, a consensus-based method that consults an ensemble of classifiers to change the values of attributes and labels. We simplify Polishing so that it alters only the labels of instances, and call the result Polishing Labels (PL). We experimentally compare our two novel methods with PL by examining their improvements in label quality, model quality, and AUC on binary and multi-class data sets. We conclude that only CC significantly and consistently improves all three measures; STC and PL improve them in some cases, but less reliably. Hence, Cluster-based Correction is the best of the three methods.
UR - http://www.scopus.com/inward/record.url?scp=84962869219&partnerID=8YFLogxK
U2 - 10.1109/DSAA.2015.7344791
DO - 10.1109/DSAA.2015.7344791
M3 - Conference contribution
AN - SCOPUS:84962869219
T3 - Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
BT - Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
A2 - Pasi, Gabriella
A2 - Kwok, James
A2 - Zaiane, Osmar
A2 - Gallinari, Patrick
A2 - Gaussier, Eric
A2 - Cao, Longbing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
Y2 - 19 October 2015 through 21 October 2015
ER -