Noise filtering to improve data and model quality for crowdsourcing

Chaoqun Li; Victor S. Sheng; Liangxiao Jiang; Hongwei Li

doi:10.1016/j.knosys.2016.06.003

Computer Science

Research output: Contribution to journal › Article › peer-review

46 Scopus citations

Abstract

Crowdsourcing services provide an easy means of acquiring labeled training data for supervised learning. However, the labels provided by a single crowd worker are often unreliable. Repeated labeling can be used to address this problem. Once repeated labeling has acquired multiple labels for each instance, consensus methods are generally used to obtain an integrated label for each instance. Although consensus methods are effective in practice, some noise still remains in the set of integrated labels. In this study, we employ noise filters to remove the noise in the integrated labels and thereby improve the quality of the training data and of the resulting models. Noise handling is a relatively mature field in the machine learning community, and many filters for removing label noise have been proposed. However, to the best of our knowledge, very few studies have used noise filtering to improve crowdsourcing learning. We therefore review several existing noise filters from previous papers and empirically investigate how well they improve crowdsourcing learning. Experimental results on 14 benchmark UCI data sets and three real-world data sets show that these noise filters can significantly reduce the noise level in integrated labels and thereby considerably enhance the performance of target classifiers.
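The pipeline described in the abstract (repeated labeling, label integration by a consensus method, then noise filtering) can be illustrated with a small sketch. The abstract does not name the specific consensus methods or noise filters that were evaluated, so everything concrete here is an assumption made for illustration: majority voting stands in for the consensus method, a simple k-nearest-neighbour agreement check stands in for the noise filter, and the function names and toy data are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Integrate one instance's repeated labels by majority vote (assumed consensus method)."""
    return Counter(labels).most_common(1)[0][0]


def knn_noise_filter(X, y, k=3):
    """Return indices of instances whose integrated label agrees with the
    majority label of their k nearest neighbours (assumed filter, for illustration)."""
    kept = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Squared Euclidean distance from instance i to every other instance.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        if majority_vote(neighbour_labels) == yi:
            kept.append(i)
    return kept


# Toy data: two well-separated clusters. The last instance lies in the
# class-0 cluster, but its crowd labels integrate to class 1.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
     [1.0, 1.1], [1.1, 1.0], [0.9, 1.0],
     [0.1, 0.2]]
crowd_labels = [[0, 0, 1], [0, 0, 0], [0, 1, 0],
                [1, 1, 0], [1, 1, 1], [1, 0, 1],
                [1, 1, 0]]

integrated = [majority_vote(labels) for labels in crowd_labels]
kept = knn_noise_filter(X, integrated, k=3)

# Filtered training set that a target classifier would then be trained on.
X_clean = [X[i] for i in kept]
y_clean = [integrated[i] for i in kept]
```

On this toy data the filter drops only the last instance, whose integrated label disagrees with all of its nearest neighbours. Whether such a filter helps on real crowdsourced data depends on the filter and the data; the sketch shows only the shape of the pipeline, not the paper's experimental setup.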

Original language	English
Pages (from-to)	96-103
Number of pages	8
Journal	Knowledge-Based Systems
Volume	107
DOIs	https://doi.org/10.1016/j.knosys.2016.06.003
State	Published - Sep 1 2016

Keywords

Crowdsourcing learning
Integrated labels
Label noise
Noise filtering

Cite this

@article{f4cb9494751b478aa774dca0aa1898a2,

title = "Noise filtering to improve data and model quality for crowdsourcing",

abstract = "Crowdsourcing services provide an easy means of acquiring labeled training data for supervised learning. However, the labels provided by a single crowd worker are often unreliable. Repeated labeling can be used to solve this problem. After multiple labels have been acquired by repeated labeling for each instance, in general consensus methods are used to obtain the integrated labels of instances. Although consensus methods are effective in practice, it cannot be denied that a level of noise still exists in the set of integrated labels. In this study, an attempt was made to employ noise filters to delete the noise in integrated labels, and consequently, enhance the training data and model quality. In fact, noise handling is a relatively mature field in the machine learning community, and many noise filters for deleting label noise have been presented in the past. However, to the best of our knowledge, in very few studies was noise filtering used to improve crowdsourcing learning. Therefore, in this study we empirically investigated the performance of noise filters in terms of improving crowdsourcing learning. Thus, in this paper some existing noise filters presented in previous papers are reviewed and their experimental application to crowdsourcing learning tasks is described. Experimental results based on 14 benchmark UCI data sets and three real-world data sets show that these noise filters can significantly reduce the noise level in integrated labels and thereby considerably enhance the performance of target classifiers.",

keywords = "Crowdsourcing learning, Integrated labels, Label noise, Noise filtering",

author = "Chaoqun Li and Sheng, {Victor S.} and Liangxiao Jiang and Hongwei Li",

note = "Publisher Copyright: {\textcopyright} 2016",

year = "2016",

month = sep,

day = "1",

doi = "10.1016/j.knosys.2016.06.003",

language = "English",

volume = "107",

pages = "96--103",

journal = "Knowledge-Based Systems",

issn = "0950-7051",

}

TY - JOUR

T1 - Noise filtering to improve data and model quality for crowdsourcing

AU - Li, Chaoqun

AU - Sheng, Victor S.

AU - Jiang, Liangxiao

AU - Li, Hongwei

PY - 2016/9/1

Y1 - 2016/9/1

N2 - Crowdsourcing services provide an easy means of acquiring labeled training data for supervised learning. However, the labels provided by a single crowd worker are often unreliable. Repeated labeling can be used to solve this problem. After multiple labels have been acquired by repeated labeling for each instance, in general consensus methods are used to obtain the integrated labels of instances. Although consensus methods are effective in practice, it cannot be denied that a level of noise still exists in the set of integrated labels. In this study, an attempt was made to employ noise filters to delete the noise in integrated labels, and consequently, enhance the training data and model quality. In fact, noise handling is a relatively mature field in the machine learning community, and many noise filters for deleting label noise have been presented in the past. However, to the best of our knowledge, in very few studies was noise filtering used to improve crowdsourcing learning. Therefore, in this study we empirically investigated the performance of noise filters in terms of improving crowdsourcing learning. Thus, in this paper some existing noise filters presented in previous papers are reviewed and their experimental application to crowdsourcing learning tasks is described. Experimental results based on 14 benchmark UCI data sets and three real-world data sets show that these noise filters can significantly reduce the noise level in integrated labels and thereby considerably enhance the performance of target classifiers.

KW - Crowdsourcing learning

KW - Integrated labels

KW - Label noise

KW - Noise filtering

UR - http://www.scopus.com/inward/record.url?scp=85000956253&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2016.06.003

DO - 10.1016/j.knosys.2016.06.003

M3 - Article

AN - SCOPUS:85000956253

SN - 0950-7051

VL - 107

SP - 96

EP - 103

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

ER -
