FingerPrint Based Duplicate Detection in Streamed Data

In: Computing and Informatics, vol. 37, no. 6
A. Singh - S. Batra

Details:

Year, pages: 2019, 1313 - 1338
Language: eng
Keywords:
Duplicate detection, stable Bloom filter, d-left hashing, FingerPrint bits, streaming data
Document type: article
About article:
In computing, duplicate data detection refers to identifying redundant copies of repeated data. Identifying duplicate data items in streamed data and eliminating them before storage is a complex task. This paper proposes a novel data structure for duplicate detection based on a variant of the stable Bloom filter, named the FingerPrint Stable Bloom Filter (FP-SBF). The proposed approach uses a counting Bloom filter with fingerprint bits, together with an optimization mechanism for duplicate detection. FP-SBF uses d-left hashing, which reduces computation time and decreases both false positives and false negatives. FP-SBF can process unbounded data in a single pass using k hash functions and distinguishes duplicate from distinct elements in O(k+1) time, independent of the size of the incoming data. The performance of FP-SBF is compared with that of various Bloom filters used for duplicate detection in stream data, and it is shown both theoretically and experimentally that the proposed approach detects duplicates in streaming data efficiently with lower memory requirements.
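For orientation only, the general idea of Bloom-filter-based duplicate detection over a stream can be sketched as below. This is a plain stable Bloom filter (counters that decay at random so that old elements are gradually forgotten), not the FP-SBF proposed in the article: it has no fingerprint bits, no d-left hashing and no optimization mechanism. The class name, the method name `seen_then_add`, the parameter values and the SHA-256 based hashing are all assumptions chosen for illustration.

```python
import hashlib
import random


class StableBloomFilter:
    """Illustrative plain stable Bloom filter; NOT the article's FP-SBF."""

    def __init__(self, m=1024, k=3, max_val=3, p=4, seed=0):
        self.m = m              # number of counter cells (assumed value)
        self.k = k              # number of hash functions
        self.max_val = max_val  # value a cell is set to on insertion
        self.p = p              # cells decremented per processed element
        self.cells = [0] * m
        self.rng = random.Random(seed)

    def _positions(self, item):
        # Derive k cell indices by salting SHA-256 with the hash index.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def seen_then_add(self, item):
        """Report whether item looks like a duplicate, then record it."""
        pos = self._positions(item)
        # Duplicate only if every probed cell is non-zero.
        duplicate = all(self.cells[j] > 0 for j in pos)
        # Decay step: decrement p randomly chosen cells so stale
        # elements eventually drop out (this is what allows false negatives).
        for _ in range(self.p):
            j = self.rng.randrange(self.m)
            if self.cells[j] > 0:
                self.cells[j] -= 1
        # Record the element by setting its k cells to the maximum value.
        for j in pos:
            self.cells[j] = self.max_val
        return duplicate


if __name__ == "__main__":
    sbf = StableBloomFilter()
    for element in ["a", "a"]:
        print(element, sbf.seen_then_add(element))
```

Each element costs k probes plus p decrements in this sketch, so the per-element work does not depend on how much of the stream has already been processed. The memory footprint is fixed at m cells, which is why such filters suit unbounded streams.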
How to cite:
ISO 690:
Singh, A., Batra, S. 2019. FingerPrint Based Duplicate Detection in Streamed Data. In Computing and Informatics, vol. 37, no. 6, pp. 1313-1338. ISSN 1335-9150. DOI: https://doi.org/10.4149/cai_2018_6_1313

APA:
Singh, A., Batra, S. (2019). FingerPrint Based Duplicate Detection in Streamed Data. Computing and Informatics, 37(6), 1313-1338. ISSN 1335-9150. DOI: https://doi.org/10.4149/cai_2018_6_1313
About edition:
Publisher: Ústav informatiky SAV
Published: 15. 2. 2019