A hybrid machine-crowdsourcing approach for web table matching and cleaning

Chunhua Li, Pengpeng Zhao, Victor S. Sheng, Zhixu Li, Guanfeng Liu, Jian Wu, Zhiming Cui

Research output: Chapter in Book/Report/Conference proceeding (conference contribution, peer-reviewed)

Abstract

Table matching and data cleaning are two crucial activities in integrating data from different web tables, and they have traditionally been treated as separate tasks. We show that data cleaning can effectively help discover table matches, and vice versa. In this paper, we study a hybrid machine-crowdsourcing approach that handles the two activities jointly, with the help of a well-developed knowledge base. Understanding the semantics of tables is fundamental to both matching and cleaning. We select the most valuable columns for crowdsourced validation and infer the remaining ones by consolidating crowdsourced and machine-generated results. When resolving inconsistencies between data and semantics, we take relative trust into account to decide whether the crowd should validate the data or the semantics. Experimental results on real-life datasets show the effectiveness of the proposed approach for matching and cleaning web tables.
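The abstract does not spell out how columns are chosen for the crowd or how the two sources of results are combined. As a rough illustration only, the Python sketch below assumes each column carries machine-generated candidate semantic types with confidence scores (for example, from matching cell values against a knowledge base). It sends the columns with the smallest gap between their top two candidates to the crowd under a fixed budget and keeps the machine's top candidate for the rest. The margin-based selection rule and the names `Column`, `select_for_crowd` and `consolidate` are hypothetical stand-ins, not the method described in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    # Hypothetical machine output: candidate KB types -> confidence score.
    candidates: dict[str, float] = field(default_factory=dict)


def margin(col: Column) -> float:
    """Gap between the two best machine candidates; a small gap means ambiguity."""
    scores = sorted(col.candidates.values(), reverse=True)
    top = scores[0] if scores else 0.0
    second = scores[1] if len(scores) > 1 else 0.0
    return top - second


def select_for_crowd(columns: list[Column], budget: int) -> list[str]:
    """Assumed heuristic: the most ambiguous columns are the most valuable to validate."""
    ranked = sorted(columns, key=margin)
    return [c.name for c in ranked[:budget]]


def consolidate(columns: list[Column], crowd_answers: dict[str, str]) -> dict[str, str | None]:
    """Simplest consolidation: crowd answers win; otherwise keep the machine's top type."""
    labels: dict[str, str | None] = {}
    for c in columns:
        if c.name in crowd_answers:
            labels[c.name] = crowd_answers[c.name]
        elif c.candidates:
            labels[c.name] = max(c.candidates, key=c.candidates.get)
        else:
            labels[c.name] = None  # no evidence from either source
    return labels


if __name__ == "__main__":
    cols = [
        Column("A", {"Film": 0.9, "Book": 0.1}),
        Column("B", {"Person": 0.55, "Director": 0.45}),
        Column("C", {"Year": 0.7, "Integer": 0.3}),
    ]
    asked = select_for_crowd(cols, budget=1)
    print(asked)  # ['B']
    print(consolidate(cols, {"B": "Director"}))
```

With a budget of one question, only the ambiguous column `B` goes to the crowd, and its crowd answer replaces the machine's guess. A real system would also propagate crowd answers to related columns rather than falling back to the machine label alone.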

Original language: English
Title of host publication: Web-Age Information Management - 17th International Conference, WAIM 2016, Proceedings
Editors: Bin Cui, Xiang Lian, Dexi Liu, Nan Zhang, Jianliang Xu
Publisher: Springer-Verlag
Pages: 132-144
Number of pages: 13
ISBN (Print): 9783319399577
State: Published, 2016
Event: 17th International Conference on Web-Age Information Management, WAIM 2016, Nanchang, China
Duration: Jun 3, 2016 to Jun 5, 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9659
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Web-Age Information Management, WAIM 2016
Country: China
City: Nanchang
Period: 06/3/16 to 06/5/16

Keywords

  • Crowdsourcing
  • Data cleaning
  • Table matching

