
Discovering Foreign Keys on Web Tables with the Crowd

In: Computing and Informatics, vol. 38, no. 3
X. Wu - N. Wang - H. Liu

Details:

Year, pages: 2020, 621 - 646
Language: eng
Keywords:
Foreign key, web tables, crowdsourcing, task selection, task reduction, semantic recovery
About article:
The foreign-key relationship is one of the most important constraints between two tables. Previous work has focused on detecting inclusion dependencies (INDs) or foreign keys in relational databases. Discovering foreign-key relationships is also helpful for analyzing and integrating data in web tables. However, because web tables are often of poor quality, it is difficult to discover foreign keys with existing techniques that rely on checking basic integrity constraints. In this paper, we propose a hybrid human-machine framework to detect foreign keys on web tables. After a machine algorithm discovers candidates and evaluates their confidence of being true foreign keys, we verify those candidates by leveraging the power of the crowd. To reduce the monetary cost, we propose a dynamic task selection technique based on conflict detection and inclusion dependencies, which eliminates redundant tasks and assigns the most valuable tasks to workers. Additionally, to help workers complete tasks more effectively and efficiently, a sampling strategy is applied to minimize the number of tuples posed to the crowd. We conducted extensive experiments on real-world datasets, and the results show that our framework can clearly improve foreign key detection accuracy on web tables at lower monetary and time cost.
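The machine step mentioned in the abstract (discovering candidates and scoring their confidence) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple approximate inclusion-dependency score, a uniqueness check on the referenced column, and a hypothetical threshold of 0.6. The function names, the normalization rule, and the sample tables are all made up for illustration.

```python
# Illustrative sketch only: approximate inclusion-dependency scoring for
# foreign-key candidates on noisy tables. Names and threshold are hypothetical.

def normalize(column):
    """Lower-case and strip values, dropping empty cells."""
    return [v.strip().lower() for v in column if v and v.strip()]

def inclusion_score(dependent, referenced):
    """Fraction of distinct dependent values that also occur in referenced."""
    dep = set(normalize(dependent))
    ref = set(normalize(referenced))
    if not dep:
        return 0.0
    return len(dep & ref) / len(dep)

def is_unique(column):
    """A referenced column should behave like a key: no repeated values."""
    vals = normalize(column)
    return len(vals) == len(set(vals))

def candidate_foreign_keys(tables, threshold=0.6):
    """Return (table, column, ref_table, ref_column, score) tuples whose
    inclusion score reaches the threshold, best score first."""
    candidates = []
    for t1, cols1 in tables.items():
        for c1, vals1 in cols1.items():
            for t2, cols2 in tables.items():
                if t1 == t2:
                    continue
                for c2, vals2 in cols2.items():
                    if not is_unique(vals2):
                        continue  # referenced side must look like a key
                    score = inclusion_score(vals1, vals2)
                    if score >= threshold:
                        candidates.append((t1, c1, t2, c2, score))
    return sorted(candidates, key=lambda c: -c[4])

# Made-up sample data with the kind of noise found in web tables.
tables = {
    "films": {
        "title": ["Inception", "Dunkirk", "Point Break", "Alien"],
        "director": ["Nolan", "nolan ", "Bigelow", "Unknown"],
    },
    "directors": {
        "name": ["Nolan", "Bigelow", "Scott"],
        "country": ["UK", "US", "UK"],
    },
}
found = candidate_foreign_keys(tables)
```

Candidates whose score falls below 1.0, such as films.director referencing directors.name here, are the kind of uncertain case that a framework like the one described would pass to crowd workers for verification.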
How to cite:
ISO 690:
Wu, X., Wang, N., Liu, H. 2020. Discovering Foreign Keys on Web Tables with the Crowd. In Computing and Informatics, vol. 38, no. 3, pp. 621-646. ISSN 1335-9150. DOI: https://doi.org/10.31577/cai_2019_3_621

APA:
Wu, X., Wang, N., Liu, H. (2020). Discovering Foreign Keys on Web Tables with the Crowd. Computing and Informatics, 38(3), 621-646. ISSN 1335-9150. DOI: https://doi.org/10.31577/cai_2019_3_621
About edition:
Publisher: Ústav informatiky SAV
Published: 9. 3. 2020