Majority Voting and Pairing with Multiple Noisy Labeling

Victor S. Sheng; Jing Zhang; Bin Gu; Xindong Wu

doi:10.1109/TKDE.2017.2659740

Computer Science

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

With the crowdsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper proposes strategies of utilizing these multiple labels for supervised learning, based on two basic ideas: majority voting and pairing. We show several interesting results based on our experiments. (i) The strategies based on the majority voting idea work well under the situation where the certainty level is high. (ii) On the contrary, the pairing strategies are more preferable under the situation where the certainty level is low. (iii) Among the majority voting strategies, soft majority voting can reduce the bias and roughness, and perform better than majority voting. (iv) Pairing can completely avoid the bias by having both sides (potentially correct and incorrect/noisy information) considered. Beta estimation is applied to reduce the impact of the noise in pairing. Our experimental results show that pairing with Beta estimation always performs well under different certainty levels. (v) All strategies investigated are labeling quality agnostic strategies for real-world applications, and some of them perform better than or at least very close to the gnostic strategies.
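The abstract names the label-integration ideas without defining them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each data item carries a list of noisy labels. Majority voting keeps the most frequent label, soft majority voting additionally attaches the share of votes as a weight, and pairing emits one weighted copy per class so that both the majority and minority opinions are retained. The function names (`majority_vote`, `soft_majority_vote`, `pairing`) and the vote-share weighting are assumptions made for illustration; Beta estimation is not covered here.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][0]


def soft_majority_vote(labels):
    """Return (majority label, weight), where weight is the majority's vote share."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def pairing(labels, classes=(0, 1)):
    """Return one (class, weight) pair per class that received at least one vote.

    The weight is that class's share of the votes (an illustrative assumption).
    """
    counts = Counter(labels)
    total = len(labels)
    return [(c, counts[c] / total) for c in classes if counts[c] > 0]


if __name__ == "__main__":
    noisy = [1, 1, 0, 1, 0]  # five hypothetical labels for one item
    print(majority_vote(noisy))
    print(soft_majority_vote(noisy))
    print(pairing(noisy))
```

Under this reading, majority voting discards the two dissenting labels, while pairing passes both sides to the learner as weighted training examples, which is one way to interpret the abstract's remark about having both sides considered.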

Original language	English
Article number	7835129
Pages (from-to)	1355-1368
Number of pages	14
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	31
Issue number	7
DOIs	https://doi.org/10.1109/TKDE.2017.2659740
State	Published - Jul 1 2019

Keywords

Crowdsourcing
classification
data preprocessing
multiple noisy labels

Access to Document

10.1109/TKDE.2017.2659740

Cite this

@article{598fddb38efa4d1bad249f4c3b0f21e6,

title = "Majority Voting and Pairing with Multiple Noisy Labeling",

abstract = "With the crowdsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper proposes strategies of utilizing these multiple labels for supervised learning, based on two basic ideas: majority voting and pairing. We show several interesting results based on our experiments. (i) The strategies based on the majority voting idea work well under the situation where the certainty level is high. (ii) On the contrary, the pairing strategies are more preferable under the situation where the certainty level is low. (iii) Among the majority voting strategies, soft majority voting can reduce the bias and roughness, and perform better than majority voting. (iv) Pairing can completely avoid the bias by having both sides (potentially correct and incorrect/noisy information) considered. Beta estimation is applied to reduce the impact of the noise in pairing. Our experimental results show that pairing with Beta estimation always performs well under different certainty levels. (v) All strategies investigated are labeling quality agnostic strategies for real-world applications, and some of them perform better than or at least very close to the gnostic strategies.",

keywords = "Crowdsourcing, classification, data preprocessing, multiple noisy labels",

author = "Sheng, {Victor S.} and Zhang, Jing and Gu, Bin and Wu, Xindong",

note = "Publisher Copyright: {\textcopyright} 1989-2012 IEEE.",

year = "2019",

month = jul,

day = "1",

doi = "10.1109/TKDE.2017.2659740",

language = "English",

volume = "31",

pages = "1355--1368",

journal = "IEEE Transactions on Knowledge and Data Engineering",

issn = "1041-4347",

number = "7",

}

TY - JOUR

T1 - Majority Voting and Pairing with Multiple Noisy Labeling

AU - Sheng, Victor S.

AU - Zhang, Jing

AU - Gu, Bin

AU - Wu, Xindong

PY - 2019/7/1

Y1 - 2019/7/1

N2 - With the crowdsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper proposes strategies of utilizing these multiple labels for supervised learning, based on two basic ideas: majority voting and pairing. We show several interesting results based on our experiments. (i) The strategies based on the majority voting idea work well under the situation where the certainty level is high. (ii) On the contrary, the pairing strategies are more preferable under the situation where the certainty level is low. (iii) Among the majority voting strategies, soft majority voting can reduce the bias and roughness, and perform better than majority voting. (iv) Pairing can completely avoid the bias by having both sides (potentially correct and incorrect/noisy information) considered. Beta estimation is applied to reduce the impact of the noise in pairing. Our experimental results show that pairing with Beta estimation always performs well under different certainty levels. (v) All strategies investigated are labeling quality agnostic strategies for real-world applications, and some of them perform better than or at least very close to the gnostic strategies.

KW - Crowdsourcing

KW - classification

KW - data preprocessing

KW - multiple noisy labels

UR - http://www.scopus.com/inward/record.url?scp=85044040579&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2017.2659740

DO - 10.1109/TKDE.2017.2659740

M3 - Article

AN - SCOPUS:85044040579

SN - 1041-4347

VL - 31

SP - 1355

EP - 1368

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

IS - 7

M1 - 7835129

ER -
