TY - GEN
T1 - Simple multiple noisy label utilization strategies
AU - Sheng, Victor S.
PY - 2011
Y1 - 2011
N2 - With the outsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper addresses strategies for utilizing these multiple labels to improve the performance of supervised learning, based on two basic ideas: majority voting and pairwise solutions. We show several interesting results based on our experiments. The soft majority voting strategies can reduce bias and roughness, and improve the performance of the direct hard majority voting strategy. Pairwise strategies can completely avoid bias by considering both sides (potentially correct and incorrect/noisy information) for binary classification. They perform very well whether few or many labels are available. However, they can also retain noise. The improved variation that reduces the impact of noisy information is recommended. All five strategies investigated are labeling-quality agnostic and can be applied directly to real-world applications. The experimental results show that some of them perform better than, or at least very close to, the gnostic strategies.
KW - Classification
KW - Crowdsourcing
KW - Multiple noisy labels
KW - Outsourcing
UR - http://www.scopus.com/inward/record.url?scp=84857168380&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2011.133
DO - 10.1109/ICDM.2011.133
M3 - Conference contribution
AN - SCOPUS:84857168380
SN - 9780769544083
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 635
EP - 644
BT - Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
T2 - 11th IEEE International Conference on Data Mining, ICDM 2011
Y2 - 11 December 2011 through 14 December 2011
ER -