Improving crowdsourced label quality using noise correction

Jing Zhang; Victor S. Sheng; Tao Li; Xindong Wu

doi:10.1109/TNNLS.2017.2677468

Computer Science

Research output: Contribution to journal › Article › peer-review

79 Scopus citations

Abstract

Crowdsourcing systems provide a cost-effective and convenient way to collect labels, but they often fail to guarantee the quality of the labels. This paper proposes a novel framework that introduces noise correction techniques to further improve the quality of integrated labels that are inferred from the multiple noisy labels of objects. In the proposed general framework, information about the qualities of labelers estimated by a front-end ground truth inference algorithm is utilized to supervise subsequent label noise filtering and correction. The framework uses a novel algorithm termed adaptive voting noise correction (AVNC) to precisely identify and correct the potential noisy labels. After filtering out the instances with noisy labels, the remaining cleansed data set is used to create multiple weak classifiers, based on which a powerful ensemble classifier is induced to correct the noisy labels. Experimental results on eight simulated data sets with different kinds of features and two real-world crowdsourcing data sets in different domains consistently show that: 1) the proposed framework can improve label quality regardless of inference algorithms, especially when each instance has only a few repeated labels and 2) since the proposed AVNC algorithm considers both the number of and the probability of potential label noises, it outperforms the state-of-the-art noise correction algorithms.
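The filter-then-correct pipeline described in the abstract can be illustrated with a deliberately simplified sketch. This is not the AVNC algorithm: the paper estimates labeler quality with a ground truth inference algorithm and uses adaptive voting, whereas the sketch below assumes plain majority voting for label integration, a fixed agreement threshold (`min_agreement`, a hypothetical parameter) for flagging suspect labels, and nearest-centroid models trained on different subsets of the cleansed data as the weak classifiers. All function and parameter names are illustrative.

```python
from collections import Counter


def majority_vote(noisy_labels):
    """Integrate each instance's repeated crowd labels by simple majority."""
    return [Counter(votes).most_common(1)[0][0] for votes in noisy_labels]


def fit_centroids(X, y):
    """Weak classifier: one mean feature vector (centroid) per class."""
    sums, counts = {}, Counter()
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def correct_labels(X, noisy_labels, min_agreement=0.75, n_models=3):
    """Flag low-agreement instances, then relabel them with an ensemble."""
    integrated = majority_vote(noisy_labels)
    agreement = [
        votes.count(lab) / len(votes)
        for votes, lab in zip(noisy_labels, integrated)
    ]
    # Assumption: low crowd agreement marks a potentially noisy label.
    clean = [i for i, a in enumerate(agreement) if a >= min_agreement]
    suspect = [i for i, a in enumerate(agreement) if a < min_agreement]

    # Train each weak model on the cleansed set minus one interleaved fold.
    models = []
    for k in range(n_models):
        subset = [i for pos, i in enumerate(clean) if pos % n_models != k]
        if subset:
            models.append(
                fit_centroids([X[i] for i in subset],
                              [integrated[i] for i in subset])
            )
    if not models:
        return integrated  # nothing trustworthy to learn from

    corrected = list(integrated)
    for i in suspect:
        votes = [predict_centroid(m, X[i]) for m in models]
        corrected[i] = Counter(votes).most_common(1)[0][0]
    return corrected


# Toy data: two well-separated clusters; the last instance sits in the
# class-0 cluster but two of its three crowd labels say class 1.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [0.05]]
crowd = [[0, 0, 0], [0, 0, 0], [0, 0, 0],
         [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0]]
integrated = majority_vote(crowd)
corrected = correct_labels(X, crowd)
```

In this toy run the last instance is integrated as class 1 by majority vote, flagged because only two of its three labels agree, and relabeled by the ensemble trained on the six unanimous instances.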

Original language	English
Pages (from-to)	1675-1688
Number of pages	14
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	29
Issue number	5
DOIs	https://doi.org/10.1109/TNNLS.2017.2677468
State	Published - May 2018

Keywords

Crowdsourcing
Ground truth inference
Label integration
Label noise correction
Label quality

Access to Document

10.1109/TNNLS.2017.2677468

Cite this

@article{c6b37d26e85d4c478b256e2127f7bdf3,
title = "Improving crowdsourced label quality using noise correction",
abstract = "Crowdsourcing systems provide a cost effective and convenient way to collect labels, but they often fail to guarantee the quality of the labels. This paper proposes a novel framework that introduces noise correction techniques to further improve the quality of integrated labels that are inferred from the multiple noisy labels of objects. In the proposed general framework, information about the qualities of labelers estimated by a front-end ground truth inference algorithm is utilized to supervise subsequent label noise filtering and correction. The framework uses a novel algorithm termed adaptive voting noise correction (AVNC) to precisely identify and correct the potential noisy labels. After filtering out the instances with noisy labels, the remaining cleansed data set is used to create multiple weak classifiers, based on which a powerful ensemble classifier is induced to correct these noises. Experimental results on eight simulated data sets with different kinds of features and two real-world crowdsourcing data sets in different domains consistently show that: 1) the proposed framework can improve label quality regardless of inference algorithms, especially under the circumstance that each instance has a few repeated labels and 2) since the proposed AVNC algorithm considers both the number of and the probability of potential label noises, it outperforms the state-of-the-art noise correction algorithms.",
keywords = "Crowdsourcing, Ground truth inference, Label integration, Label noise correction, Label quality",
author = "Jing Zhang and Sheng, {Victor S.} and Tao Li and Xindong Wu",
note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",
year = "2018",
month = may,
doi = "10.1109/TNNLS.2017.2677468",
language = "English",
volume = "29",
pages = "1675--1688",
journal = "IEEE Transactions on Neural Networks and Learning Systems",
issn = "2162-237X",
number = "5",
}

TY - JOUR

T1 - Improving crowdsourced label quality using noise correction

AU - Zhang, Jing

AU - Sheng, Victor S.

AU - Li, Tao

AU - Wu, Xindong

PY - 2018/5

Y1 - 2018/5

N2 - Crowdsourcing systems provide a cost effective and convenient way to collect labels, but they often fail to guarantee the quality of the labels. This paper proposes a novel framework that introduces noise correction techniques to further improve the quality of integrated labels that are inferred from the multiple noisy labels of objects. In the proposed general framework, information about the qualities of labelers estimated by a front-end ground truth inference algorithm is utilized to supervise subsequent label noise filtering and correction. The framework uses a novel algorithm termed adaptive voting noise correction (AVNC) to precisely identify and correct the potential noisy labels. After filtering out the instances with noisy labels, the remaining cleansed data set is used to create multiple weak classifiers, based on which a powerful ensemble classifier is induced to correct these noises. Experimental results on eight simulated data sets with different kinds of features and two real-world crowdsourcing data sets in different domains consistently show that: 1) the proposed framework can improve label quality regardless of inference algorithms, especially under the circumstance that each instance has a few repeated labels and 2) since the proposed AVNC algorithm considers both the number of and the probability of potential label noises, it outperforms the state-of-the-art noise correction algorithms.

AB - Crowdsourcing systems provide a cost effective and convenient way to collect labels, but they often fail to guarantee the quality of the labels. This paper proposes a novel framework that introduces noise correction techniques to further improve the quality of integrated labels that are inferred from the multiple noisy labels of objects. In the proposed general framework, information about the qualities of labelers estimated by a front-end ground truth inference algorithm is utilized to supervise subsequent label noise filtering and correction. The framework uses a novel algorithm termed adaptive voting noise correction (AVNC) to precisely identify and correct the potential noisy labels. After filtering out the instances with noisy labels, the remaining cleansed data set is used to create multiple weak classifiers, based on which a powerful ensemble classifier is induced to correct these noises. Experimental results on eight simulated data sets with different kinds of features and two real-world crowdsourcing data sets in different domains consistently show that: 1) the proposed framework can improve label quality regardless of inference algorithms, especially under the circumstance that each instance has a few repeated labels and 2) since the proposed AVNC algorithm considers both the number of and the probability of potential label noises, it outperforms the state-of-the-art noise correction algorithms.

KW - Crowdsourcing

KW - Ground truth inference

KW - Label integration

KW - Label noise correction

KW - Label quality

UR - http://www.scopus.com/inward/record.url?scp=85016090864&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2017.2677468

DO - 10.1109/TNNLS.2017.2677468

M3 - Article

C2 - 28333645

AN - SCOPUS:85016090864

SN - 2162-237X

VL - 29

SP - 1675

EP - 1688

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 5

ER -
