Label noise correction methods

Bryce Nicholson, Jing Zhang, Victor S. Sheng, Zhiheng Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution (peer-review)

24 Scopus citations

Abstract

The important task of correcting label noise is addressed infrequently in the literature. The difficulty of developing a robust label correction algorithm has led to this silence on label correction. To break the silence, we propose two algorithms to correct label noise. The first, called Self-Training Correction (STC), uses self-training to relabel noisy instances. The second, called Cluster-based Correction (CC), is a clustering-based method that groups instances together to infer their ground-truth labels. We also adapt an algorithm from previous work: Polishing, a consensus-based method that consults an ensemble of classifiers to change the values of attributes and labels. We simplify Polishing so that it alters only the labels of instances, and call the result Polishing Labels (PL). We experimentally compare our novel methods with Polishing Labels by examining their improvements in label quality, model quality, and AUC on binary and multi-class data sets, and conclude that only CC consistently and significantly improves label quality, model quality, and AUC. STC and PL improve these metrics in some cases, but not as reliably. Hence, of the three methods evaluated, Cluster-based Correction performs best.
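The abstract does not spell out how STC works, so the following is only one plausible reading of "self-training to relabel noisy instances", written as a dependency-free Python sketch. It assumes (1) a leave-one-out k-nearest-neighbour filter that flags an instance as suspect when its label disagrees with the majority of its neighbours, and (2) a self-training loop in which a k-NN model fit on the unflagged instances relabels the suspect it is most confident about, then adds it to the trusted set. The names `self_training_correction` and `knn_vote`, the filter, and `k = 3` are hypothetical choices of this sketch, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_vote(x: Point, pts: Sequence[Point], labels: Sequence[int], k: int) -> Tuple[int, float]:
    """Majority label among the k nearest neighbours of x, plus its vote share."""
    nearest = sorted(range(len(pts)), key=lambda j: sq_dist(x, pts[j]))[:k]
    label, count = Counter(labels[j] for j in nearest).most_common(1)[0]
    return label, count / len(nearest)


def self_training_correction(X: List[Point], y: List[int], k: int = 3) -> List[int]:
    """Hypothetical STC sketch: flag suspect labels, then relabel them by self-training."""
    n = len(X)
    if n < 2:
        return list(y)
    # Step 1 (assumed noise filter): leave-one-out k-NN flags every instance
    # whose given label disagrees with the majority of its neighbours.
    noisy = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pred, _ = knn_vote(X[i], [X[j] for j in others], [y[j] for j in others], k)
        if pred != y[i]:
            noisy.append(i)
    flagged = set(noisy)
    clean = [i for i in range(n) if i not in flagged]
    y_new = list(y)  # never mutate the caller's labels
    # Step 2: self-training loop. Each round, a k-NN model fit on the clean set
    # relabels the suspect it is most confident about, which then joins the clean set.
    while noisy and clean:
        best_i, best_label, best_conf = -1, -1, -1.0
        for i in noisy:
            label, conf = knn_vote(X[i], [X[j] for j in clean], [y_new[j] for j in clean], k)
            if conf > best_conf:
                best_i, best_label, best_conf = i, label, conf
        y_new[best_i] = best_label
        noisy.remove(best_i)
        clean.append(best_i)
    return y_new


# Toy data: two well-separated groups; the point (1, 1) carries a flipped label.
X_toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y_toy = [0, 0, 0, 1, 1, 1, 1, 1]
print(self_training_correction(X_toy, y_toy))  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Relabelling one suspect per round, most confident first, is what makes this self-training rather than a one-shot prediction: later suspects are judged by a model that already includes the earlier corrections.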
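The idea behind Cluster-based Correction, grouping instances and inferring each one's label from its group, can be illustrated with the minimal Python sketch below. The abstract does not say which clustering algorithm CC uses or how evidence is combined, so everything here is an assumption: plain k-means with deterministic farthest-point seeding, several values of k whose votes are pooled, and a leave-self-out vote in which each instance is scored by the label distribution of the *other* members of its cluster (a singleton cluster contributes nothing). The names `kmeans`, `cluster_based_correction`, and `ks=(2, 3, 4)` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(X: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [X[0]]
    while len(centroids) < k:
        # next seed: the point farthest from every seed chosen so far
        centroids.append(max(X, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in X]
        for c in range(k):
            members = [X[i] for i in range(len(X)) if assign[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def cluster_based_correction(X: List[Point], y: List[int], ks=(2, 3, 4)) -> List[int]:
    """Hypothetical CC sketch: pool leave-self-out cluster votes over several k."""
    votes: List[Dict[int, float]] = [{} for _ in X]
    for k in ks:
        assign = kmeans(X, k)
        for c in set(assign):
            members = [i for i, a in enumerate(assign) if a == c]
            counts = Counter(y[i] for i in members)
            others = len(members) - 1
            if others == 0:
                continue  # a singleton cluster gives no evidence about its own label
            for i in members:
                for label, n in counts.items():
                    n_other = n - (1 if label == y[i] else 0)  # exclude i's own label
                    if n_other:
                        votes[i][label] = votes[i].get(label, 0.0) + n_other / others
    # Highest pooled vote wins; an instance that received no votes keeps its label.
    return [max(v, key=v.get) if v else y[i] for i, v in enumerate(votes)]


# Toy data: two well-separated groups; the point (1, 1) carries a flipped label.
X_toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y_toy = [0, 0, 0, 1, 1, 1, 1, 1]
print(cluster_based_correction(X_toy, y_toy))  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Excluding each instance's own label matters even on this toy data: at k = 3 and k = 4 the mislabelled point (1, 1) ends up alone in its cluster, and counting its own label there would outvote the k = 2 evidence and keep the wrong label.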
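Polishing Labels keeps only the label-changing part of Polishing: an ensemble of classifiers is consulted, and a label is changed when the ensemble's consensus disagrees with it. The Python sketch below illustrates that consensus rule under assumptions not stated in the abstract: the ensemble has one 1-nearest-neighbour member per disjoint subset (`n_folds` round-robin folds), each member votes only on instances outside its own training subset, and a label is replaced only when a strict majority of votes favours a different label. All names (`polishing_labels`, `nearest_label`, `n_folds`) are illustrative, and any base classifier could stand in for 1-NN.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_label(x: Point, train_X: Sequence[Point], train_y: Sequence[int]) -> int:
    """1-nearest-neighbour prediction (stand-in for any base classifier)."""
    best = min(range(len(train_X)), key=lambda j: sq_dist(x, train_X[j]))
    return train_y[best]


def polishing_labels(X: List[Point], y: List[int], n_folds: int = 4) -> List[int]:
    """Hypothetical PL sketch: change a label only on a strict-majority ensemble vote."""
    folds = [[i for i in range(len(X)) if i % n_folds == f] for f in range(n_folds)]
    folds = [f for f in folds if f]  # drop empty folds when len(X) < n_folds
    y_new = list(y)  # members are trained on the original labels, not on y_new
    for i in range(len(X)):
        votes: Counter = Counter()
        for fold in folds:
            if i in fold:
                continue  # a member never votes on its own training instances
            votes[nearest_label(X[i], [X[j] for j in fold], [y[j] for j in fold])] += 1
        if votes:
            label, count = votes.most_common(1)[0]
            if label != y[i] and count > sum(votes.values()) / 2:
                y_new[i] = label
    return y_new


# Toy data: two well-separated groups; the point (1, 1) carries a flipped label.
X_toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y_toy = [0, 0, 0, 1, 1, 1, 1, 1]
print(polishing_labels(X_toy, y_toy))  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Training every member on the original labels keeps the decisions independent of the order in which instances are visited; the strict-majority threshold is one conservative choice, and a looser rule would change more labels.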

Original language: English
Title of host publication: Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
Editors: Gabriella Pasi, James Kwok, Osmar Zaiane, Patrick Gallinari, Eric Gaussier, Longbing Cao
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467382731
State: Published - Dec 2 2015
Event: IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015 - Paris, France
Duration: Oct 19 2015 - Oct 21 2015
