
Data De-Duplication with Adaptive Chunking and Accelerated Modification Identifying

In: Computing and Informatics, vol. 35, no. 3
X. Zhang - G. Zhu - E. Wang - S. Fowler - Xiaoshe Dong

Details:

Year, pages: 2016, 586 - 614
Keywords:
Data de-duplication, self-adaptive, FastCDC
About article:
A data de-duplication system pursues not only a high de-duplication rate, which refers to the aggregate reduction in storage requirements gained from de-duplication, but also a high de-duplication speed. To solve the problem of arbitrary parameter setting in Content Defined Chunking (CDC), a self-adaptive data chunking algorithm is proposed. The algorithm improves the de-duplication rate by running a pre-processing de-duplication pass over samples of the classified files and then selecting the appropriate algorithm parameters. Meanwhile, FastCDC, a content-based fast data chunking algorithm, is adopted to solve the problem of the low de-duplication speed of CDC. FastCDC introduces two parameters, a de-duplication factor and an acceleration factor; by adjusting them, it can significantly boost de-duplication speed without sacrificing the de-duplication rate. The experimental results demonstrate that the proposed method improves the de-duplication rate by about 5 %, while FastCDC increases de-duplication speed by 50 % to 200 % at the expense of less than 3 % loss in de-duplication rate.
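The general idea of content-defined chunking with sample-based parameter selection can be illustrated with a minimal sketch. It is not the article's algorithm: the gear-style rolling hash is a swapped-in, commonly used technique, and the names `chunk`, `dedup_rate`, `pick_avg_bits`, the candidate values, and the formula "1 - unique bytes / total bytes" are all assumptions made for illustration. The article's de-duplication factor and acceleration factor are not modeled here.

```python
import hashlib
import random
from typing import Iterable, List

# Hypothetical gear table: 256 pseudo-random 64-bit values, fixed seed for repeatability.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, avg_bits: int = 6,
          min_size: int = 64, max_size: int = 1024) -> List[bytes]:
    """Split data at content-defined boundaries (illustrative sketch).

    A boundary is declared where the top `avg_bits` bits of a gear-style
    rolling hash are all zero, so the expected gap between candidate
    boundaries is about 2**avg_bits bytes beyond min_size.
    """
    mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Shifting left ages out old bytes: only the last 64 bytes affect h.
        h = ((h << 1) + GEAR[b]) & MASK64
        length = i - start + 1
        if length < min_size:
            continue
        if (h & mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_rate(chunks: Iterable[bytes]) -> float:
    """Assumed metric: fraction of bytes saved by storing each distinct chunk once."""
    total = 0
    unique = {}
    for c in chunks:
        total += len(c)
        unique[hashlib.sha256(c).digest()] = len(c)
    if total == 0:
        return 0.0
    return 1 - sum(unique.values()) / total


def pick_avg_bits(sample: bytes, candidates=(4, 5, 6, 7, 8)) -> int:
    """Chunk a sample with each candidate setting and keep the best-scoring one."""
    return max(candidates, key=lambda bits: dedup_rate(chunk(sample, avg_bits=bits)))
```

In this sketch a sample drawn from one class of files would be passed to `pick_avg_bits`, and the returned setting would then be used to chunk the remaining files of that class. A real system would also weigh metadata overhead and throughput, which this sketch ignores.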
How to cite:
ISO 690:
Zhang, X., Zhu, G., Wang, E., Fowler, S., Dong, X. 2016. Data De-Duplication with Adaptive Chunking and Accelerated Modification Identifying. In Computing and Informatics, vol. 35, no. 3, pp. 586-614. ISSN 1335-9150.

APA:
Zhang, X., Zhu, G., Wang, E., Fowler, S., Dong, X. (2016). Data De-Duplication with Adaptive Chunking and Accelerated Modification Identifying. Computing and Informatics, 35(3), 586-614. ISSN 1335-9150.