With the crowdsourcing of small tasks becoming easier, it is possible to obtain non-expert, imperfect labels at low cost, which makes it straightforward to collect multiple labels for the same data items. This paper proposes strategies for utilizing these multiple labels in supervised learning, based on two basic ideas: majority voting and pairing. Our experiments show several interesting results. (i) The strategies based on majority voting work well when the labeling certainty level is high. (ii) Conversely, the pairing strategies are preferable when the certainty level is low. (iii) Among the majority-voting strategies, soft majority voting reduces bias and roughness and performs better than plain majority voting. (iv) Pairing can completely avoid the bias by considering both sides (the potentially correct and the incorrect/noisy information); Beta estimation is applied to reduce the impact of the noise in pairing, and our experimental results show that pairing with Beta estimation performs well across all certainty levels. (v) All strategies investigated are agnostic to labeling quality, making them practical for real-world applications, and some of them perform better than, or at least very close to, the quality-gnostic (quality-aware) strategies.
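The aggregation ideas named in the abstract can be illustrated with a minimal sketch. The function names and the binary-label setting below are assumptions for illustration, not the paper's actual implementation; in particular, the paper applies Beta estimation within its pairing strategy, whereas this sketch only shows the basic Beta-smoothing step on a set of noisy labels for one item.

```python
from collections import Counter

def majority_vote(labels):
    """Hard majority voting: the most frequent label wins."""
    return Counter(labels).most_common(1)[0][0]

def soft_majority_vote(labels, positive=1):
    """Soft majority voting: the fraction of labelers voting 'positive',
    kept as a probabilistic (soft) label rather than a hard 0/1 decision."""
    return sum(1 for l in labels if l == positive) / len(labels)

def beta_posterior_mean(labels, positive=1, alpha=1.0, beta=1.0):
    """Beta-smoothed estimate of the positive-label probability: with a
    Beta(alpha, beta) prior, the posterior mean (k + alpha) / (n + alpha + beta)
    damps the influence of noise when only a few labels are available."""
    k = sum(1 for l in labels if l == positive)
    n = len(labels)
    return (k + alpha) / (n + alpha + beta)
```

For an item labeled `[1, 1, 0]` by three annotators, hard voting yields `1`, soft voting yields `2/3`, and the Beta-smoothed estimate with a uniform prior yields `0.6`, pulled slightly toward `0.5` to reflect the small sample.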
|Number of pages||14|
|Journal||IEEE Transactions on Knowledge and Data Engineering|
|State||Published - Jul 1 2019|
- data preprocessing
- multiple noisy labels