Label noise correction methods

Bryce Nicholson; Jing Zhang; Victor S. Sheng; Zhiheng Wang

doi:10.1109/DSAA.2015.7344791

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

24 Scopus citations

Abstract

The important task of correcting label noise is addressed infrequently in literature. The difficulty of developing a robust label correction algorithm leads to this silence concerning label correction. To break the silence, we propose two algorithms to correct label noise. One utilizes self-training to re-label noise, called Self-Training Correction (STC). Another is a clustering-based method, which groups instances together to infer their ground-truth labels, called Cluster-based Correction (CC). We also adapt an algorithm from previous work, a consensus-based method called Polishing that consults with an ensemble of classifiers to change the values of attributes and labels. We simplify Polishing such that it only alters labels of instances, and call it Polishing Labels (PL). We experimentally compare our novel methods with Polishing Labels by examining their improvements on the label qualities, model qualities, and AUC metrics of binary and multi-class data sets, and ultimately conclude that only CC can significantly improve label qualities, model qualities, and AUC metrics consistently. STC and PL can improve these metrics in some cases, but not as reliably. Hence, our Cluster-based Correction method is the best.
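The abstract describes Cluster-based Correction only at a high level: instances are grouped together and their ground-truth labels are inferred from the groups. As a rough illustration of that general idea, and not the authors' CC algorithm, the sketch below clusters points with plain k-means and replaces each label with the majority label of its cluster. The function names, the choice of k-means, the first-k-points initialization, and the majority-vote rule are all assumptions made for this example.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_majority_relabel(points, labels, k):
    """Hypothetical helper: relabel every instance with its cluster's majority label."""
    assign = kmeans(points, k)
    majority = {}
    for c in set(assign):
        votes = Counter(lab for lab, a in zip(labels, assign) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    return [majority[a] for a in assign]


# Two well-separated groups, each with one flipped (noisy) label.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
noisy = ["a", "a", "b", "a", "b", "b", "a", "b"]
print(cluster_majority_relabel(points, noisy, k=2))
# prints ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
```

A majority vote of this kind only helps when clusters align with the true classes and noisy labels are a minority within each cluster; the paper's own method may differ substantially.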

Original language	English
Title of host publication	Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
Editors	Gabriella Pasi, James Kwok, Osmar Zaiane, Patrick Gallinari, Eric Gaussier, Longbing Cao
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781467382731
DOIs	https://doi.org/10.1109/DSAA.2015.7344791
State	Published - Dec 2 2015
Event	IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015 - Paris, France. Duration: Oct 19 2015 → Oct 21 2015

Publication series

Name	Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015

Conference

Conference	IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
Country/Territory	France
City	Paris
Period	10/19/15 → 10/21/15

Cite this

Nicholson, B., Zhang, J., Sheng, V. S., & Wang, Z. (2015). Label noise correction methods. In G. Pasi, J. Kwok, O. Zaiane, P. Gallinari, E. Gaussier, & L. Cao (Eds.), Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Article 7344791 (Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DSAA.2015.7344791

Nicholson, Bryce ; Zhang, Jing ; Sheng, Victor S. et al. / Label noise correction methods. Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015. editor / Gabriella Pasi ; James Kwok ; Osmar Zaiane ; Patrick Gallinari ; Eric Gaussier ; Longbing Cao. Institute of Electrical and Electronics Engineers Inc., 2015. (Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015).

@inproceedings{e2401e8a8000419e9b70585826b6ff1e,

title = "Label noise correction methods",

abstract = "The important task of correcting label noise is addressed infrequently in literature. The difficulty of developing a robust label correction algorithm leads to this silence concerning label correction. To break the silence, we propose two algorithms to correct label noise. One utilizes self-training to re-label noise, called Self-Training Correction (STC). Another is a clustering-based method, which groups instances together to infer their ground-truth labels, called Cluster-based Correction (CC). We also adapt an algorithm from previous work, a consensus-based method called Polishing that consults with an ensemble of classifiers to change the values of attributes and labels. We simplify Polishing such that it only alters labels of instances, and call it Polishing Labels (PL). We experimentally compare our novel methods with Polishing Labels by examining their improvements on the label qualities, model qualities, and AUC metrics of binary and multi-class data sets, and ultimately conclude that only CC can significantly improve label qualities, model qualities, and AUC metrics consistently. STC and PL can improve these metrics in some cases, but not as reliably. Hence, our Cluster-based Correction method is the best.",

author = "Bryce Nicholson and Jing Zhang and Sheng, {Victor S.} and Zhiheng Wang",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015 ; Conference date: 19-10-2015 Through 21-10-2015",

year = "2015",

month = dec,

day = "2",

doi = "10.1109/DSAA.2015.7344791",

language = "English",

series = "Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

editor = "Gabriella Pasi and James Kwok and Osmar Zaiane and Patrick Gallinari and Eric Gaussier and Longbing Cao",

booktitle = "Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015",

}

Nicholson, B, Zhang, J, Sheng, VS & Wang, Z 2015, Label noise correction methods. in G Pasi, J Kwok, O Zaiane, P Gallinari, E Gaussier & L Cao (eds), Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, 7344791, Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Institute of Electrical and Electronics Engineers Inc., IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, 10/19/15. https://doi.org/10.1109/DSAA.2015.7344791

Label noise correction methods. / Nicholson, Bryce; Zhang, Jing; Sheng, Victor S. et al.
Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015. ed. / Gabriella Pasi; James Kwok; Osmar Zaiane; Patrick Gallinari; Eric Gaussier; Longbing Cao. Institute of Electrical and Electronics Engineers Inc., 2015. 7344791 (Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015).

TY - GEN

T1 - Label noise correction methods

AU - Nicholson, Bryce

AU - Zhang, Jing

AU - Sheng, Victor S.

AU - Wang, Zhiheng

PY - 2015/12/2

Y1 - 2015/12/2

N2 - The important task of correcting label noise is addressed infrequently in literature. The difficulty of developing a robust label correction algorithm leads to this silence concerning label correction. To break the silence, we propose two algorithms to correct label noise. One utilizes self-training to re-label noise, called Self-Training Correction (STC). Another is a clustering-based method, which groups instances together to infer their ground-truth labels, called Cluster-based Correction (CC). We also adapt an algorithm from previous work, a consensus-based method called Polishing that consults with an ensemble of classifiers to change the values of attributes and labels. We simplify Polishing such that it only alters labels of instances, and call it Polishing Labels (PL). We experimentally compare our novel methods with Polishing Labels by examining their improvements on the label qualities, model qualities, and AUC metrics of binary and multi-class data sets, and ultimately conclude that only CC can significantly improve label qualities, model qualities, and AUC metrics consistently. STC and PL can improve these metrics in some cases, but not as reliably. Hence, our Cluster-based Correction method is the best.

AB - The important task of correcting label noise is addressed infrequently in literature. The difficulty of developing a robust label correction algorithm leads to this silence concerning label correction. To break the silence, we propose two algorithms to correct label noise. One utilizes self-training to re-label noise, called Self-Training Correction (STC). Another is a clustering-based method, which groups instances together to infer their ground-truth labels, called Cluster-based Correction (CC). We also adapt an algorithm from previous work, a consensus-based method called Polishing that consults with an ensemble of classifiers to change the values of attributes and labels. We simplify Polishing such that it only alters labels of instances, and call it Polishing Labels (PL). We experimentally compare our novel methods with Polishing Labels by examining their improvements on the label qualities, model qualities, and AUC metrics of binary and multi-class data sets, and ultimately conclude that only CC can significantly improve label qualities, model qualities, and AUC metrics consistently. STC and PL can improve these metrics in some cases, but not as reliably. Hence, our Cluster-based Correction method is the best.

UR - http://www.scopus.com/inward/record.url?scp=84962869219&partnerID=8YFLogxK

U2 - 10.1109/DSAA.2015.7344791

DO - 10.1109/DSAA.2015.7344791

M3 - Conference contribution

AN - SCOPUS:84962869219

T3 - Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015

BT - Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015

A2 - Pasi, Gabriella

A2 - Kwok, James

A2 - Zaiane, Osmar

A2 - Gallinari, Patrick

A2 - Gaussier, Eric

A2 - Cao, Longbing

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015

Y2 - 19 October 2015 through 21 October 2015

ER -

Nicholson B, Zhang J, Sheng VS, Wang Z. Label noise correction methods. In Pasi G, Kwok J, Zaiane O, Gallinari P, Gaussier E, Cao L, editors, Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015. Institute of Electrical and Electronics Engineers Inc. 2015. 7344791. (Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015). doi: 10.1109/DSAA.2015.7344791
