Improving label accuracy by filtering low-quality workers in crowdsourcing

Bryce Nicholson, Victor S. Sheng, Jing Zhang, Zhiheng Wang, Xuefeng Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Data sets labeled via crowdsourcing often contain labels from low-quality workers, who either lack knowledge of the subject and thus contribute many incorrect labels, or intentionally label quickly and imprecisely in order to produce more labels in a short time. Filtering these workers out is therefore often necessary. We present two new filtering algorithms for removing low-quality workers: Cluster Filtering (CF) and Dynamic Classification Filtering (DCF). Both methods can use any number of worker characteristics as attributes for learning. CF applies k-means clustering with two centroids to separate the workers into a high-quality cluster and a low-quality cluster. DCF can use any kind of classifier: it builds a model from workers in other crowdsourced data sets and then classifies the workers in the data set to be filtered. In theory, DCF can be trained to remove any proportion of the lowest-quality workers. We compare the performance of DCF with two other filtering algorithms, one by Raykar and Yu (RY) and one by Ipeirotis et al. (IPW). Our results show that CF, the second-best filter, performs modestly but effectively, and that DCF, the best filter, performs much better than RY and IPW on average and on the majority of crowdsourced data sets.
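
The clustering idea behind CF can be illustrated with a minimal sketch. This is not the authors' implementation: the two worker attributes, the deterministic initialization, and the rule for deciding which cluster counts as high-quality are all assumptions made for illustration, and the names `two_means` and `cluster_filter` are hypothetical.

```python
import math

def two_means(points, iters=100):
    """Plain Lloyd's k-means with k=2 on a list of equal-length tuples."""
    # Assumption: deterministic start from the points with the smallest and
    # largest first attribute (the abstract does not describe initialization).
    centroids = [min(points, key=lambda p: p[0]), max(points, key=lambda p: p[0])]
    assignment = None
    for _ in range(iters):
        # Assign every worker to the nearest centroid (Euclidean distance).
        new_assignment = [
            min((0, 1), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no worker changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

def cluster_filter(workers):
    """Return the names of workers that fall in the presumed high-quality cluster."""
    names = list(workers)
    points = [tuple(workers[n]) for n in names]
    assignment, centroids = two_means(points)
    # Assumption: the cluster whose centroid has the higher first attribute
    # (here, agreement with the majority label) is the high-quality one.
    good = 0 if centroids[0][0] >= centroids[1][0] else 1
    return [n for n, a in zip(names, assignment) if a == good]

# Hypothetical worker attributes, both scaled to [0, 1]:
# (agreement rate with the majority label, self-consistency on repeated items)
workers = {
    "w1": (0.92, 0.88),
    "w2": (0.89, 0.91),
    "w3": (0.95, 0.85),
    "w4": (0.48, 0.40),
    "w5": (0.52, 0.35),
}

kept = cluster_filter(workers)
print(kept)  # ['w1', 'w2', 'w3']
```

Because k-means is sensitive to attribute scale, the sketch assumes all attributes are already normalized to a common range. DCF would replace the clustering step with a classifier trained on workers from other data sets; that variant is not shown here.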

Original language: English
Title of host publication: Advances in Artificial Intelligence and Soft Computing - 14th Mexican International Conference on Artificial Intelligence, MICAI 2015, Proceedings
Editors: Grigori Sidorov, Sofía N. Galicia-Haro
Publisher: Springer-Verlag
Pages: 547-559
Number of pages: 13
ISBN (Print): 9783319270593
State: Published - 2015
Event: 14th Mexican International Conference on Artificial Intelligence, MICAI 2015 - Cuernavaca, Morelos, Mexico
Duration: Oct 25 2015 → Oct 31 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9413
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th Mexican International Conference on Artificial Intelligence, MICAI 2015
Country/Territory: Mexico
City: Cuernavaca, Morelos
Period: 10/25/15 → 10/31/15
