Efficient online stream deduplication for network block storage

Hongli Lu, Guangping Xu*, Bo Tang, Shengli Li, Mian Zhou

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

Deduplication is an effective technique for optimizing storage utilization in data centers and cloud storage systems. It splits data into chunks and then determines whether each chunk is unique. Fixed-size chunking (FSC), which places a chunk boundary at a fixed interval of bytes, is widely used in deduplication. Although it is simple and efficient, FSC can suffer from the boundary-shift problem: an insertion or deletion moves every subsequent chunk boundary, which usually lowers the deduplication rate. Content-defined chunking (CDC) has been proposed to solve this problem. However, applying CDC to deduplication for network block storage raises two challenges. The first is how to establish a mapping between the stream offsets of a deduplicated chunk and its block address; the second is how to design an efficient index structure that organizes the metadata of data chunks on disk. In this paper, we design two structures to solve the mapping problem and implement two backends, based on B+ trees and a hash table respectively, that store metadata on network block storage devices. To improve on-disk search performance, we reduce the size of the hash table and shrink the lookup range. We evaluate our schemes with real-world workloads. The experimental results show that our schemes achieve excellent search performance at an acceptable cost in storage space.
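
The boundary-shift effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the chunk-size parameters, and the use of a BLAKE2 hash recomputed over a trailing window (standing in for a true rolling hash such as a Rabin fingerprint) are all assumptions chosen for brevity. The sketch prepends one byte to a buffer and measures what fraction of the new chunks were already present in the old chunk set under each chunking method.

```python
import hashlib
import random


def fixed_size_chunks(data: bytes, size: int = 64) -> list:
    """FSC: cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 32, max_size: int = 256) -> list:
    """CDC (illustrative): cut where a hash of the trailing `window` bytes
    has its low bits equal to zero, subject to min/max chunk sizes.
    The hash is recomputed at each position; a real system would use a
    rolling hash instead (assumption made here for simplicity)."""
    chunks = []
    start = 0
    for i in range(len(data)):
        length = i - start + 1
        if length < min_size:
            continue
        digest = hashlib.blake2b(data[i - window + 1:i + 1], digest_size=4).digest()
        at_boundary = (int.from_bytes(digest, "big") & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def duplicate_fraction(old: bytes, new: bytes, chunker) -> float:
    """Fraction of chunks in `new` whose fingerprint already occurs in `old`."""
    known = {hashlib.sha1(c).digest() for c in chunker(old)}
    new_chunks = chunker(new)
    hits = sum(1 for c in new_chunks if hashlib.sha1(c).digest() in known)
    return hits / len(new_chunks)


# Hypothetical workload: pseudo-random bytes, then the same bytes shifted by one.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8192))
shifted = b"\x00" + original

print("FSC duplicate fraction:", duplicate_fraction(original, shifted, fixed_size_chunks))
print("CDC duplicate fraction:", duplicate_fraction(original, shifted, content_defined_chunks))
```

Under FSC the single inserted byte displaces every later boundary, so almost no shifted chunk matches an original one. Under CDC the boundaries depend on content, so the chunkers fall back into step soon after the insertion point and most chunks are detected as duplicates.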

Original language: English
Title of host publication: Proceedings - 16th IEEE International Symposium on Parallel and Distributed Processing with Applications, 17th IEEE International Conference on Ubiquitous Computing and Communications, 8th IEEE International Conference on Big Data and Cloud Computing, 11th IEEE International Conference on Social Computing and Networking and 8th IEEE International Conference on Sustainable Computing and Communications, ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018
Editors: Jinjun Chen, Laurence T. Yang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 111-119
Number of pages: 9
ISBN (Electronic): 9781728111414
Publication status: Published - 2 Jul 2018
Externally published: Yes
Event: 16th IEEE International Symposium on Parallel and Distributed Processing with Applications, 17th IEEE International Conference on Ubiquitous Computing and Communications, 8th IEEE International Conference on Big Data and Cloud Computing, 11th IEEE International Conference on Social Computing and Networking and 8th IEEE International Conference on Sustainable Computing and Communications, ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018 - Melbourne, Australia
Duration: 11 Dec 2018 to 13 Dec 2018

Keywords

  • B+ trees
  • CDC
  • Deduplication
  • Hash map
  • Metadata management
