Compressed domain-specific data processing and analysis

Dapeng Dong, John Herbert

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Domain-specific data such as sensor outputs and server trace logs have low levels of symbol richness, and so they can be represented in a very compact format. In this paper, we present a bit-oriented compression scheme designed not only to represent the data compactly but also to allow MapReduce programs to analyse and process the compressed data directly, and to do so in parallel. The core of the compression scheme is a novel hybrid data structure that supports bit-pattern searching in constant time, together with a scheme for making the compressed file block-splittable. Supporting software allows developers to work transparently with the compressed data. Experimental results demonstrate that the proposed compression scheme can significantly reduce data size and improve MapReduce analysis performance.
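To illustrate why low symbol richness allows a compact representation, the following is a minimal sketch of fixed-width bit packing in Python. It is not the scheme from the paper: the hybrid data structure, the constant-time bit-pattern search, and the block-splittable file layout are not reproduced here. The function names `pack_symbols` and `unpack_symbols` and the sample log-like input are hypothetical, chosen only for illustration.

```python
def pack_symbols(text):
    """Pack text with a fixed-width code sized to its alphabet.

    Illustrative assumption only; this is not the paper's encoding.
    """
    alphabet = sorted(set(text))
    # Bits needed to index every distinct symbol (at least 1 bit).
    width = max(1, (len(alphabet) - 1).bit_length())
    code = {sym: i for i, sym in enumerate(alphabet)}
    bits = 0
    for sym in text:
        bits = (bits << width) | code[sym]
    n_bytes = (len(text) * width + 7) // 8
    return alphabet, width, bits.to_bytes(n_bytes, "big")


def unpack_symbols(alphabet, width, data, count):
    """Invert pack_symbols; count is the number of packed symbols."""
    bits = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    out = []
    for i in range(count):
        # Symbols were shifted in first-to-last, so the first is highest.
        shift = (count - 1 - i) * width
        out.append(alphabet[(bits >> shift) & mask])
    return "".join(out)


# Hypothetical log-like input that uses only six distinct characters.
sample = "200 404 200 200 500 404\n" * 100
alphabet, width, packed = pack_symbols(sample)
restored = unpack_symbols(alphabet, width, packed, len(sample))
```

With six distinct characters, each symbol needs 3 bits rather than the 8 bits of a one-byte character encoding. The alphabet table and symbol count would also have to be stored alongside the packed bytes.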

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 325-330
Number of pages: 6
ISBN (Electronic): 9781538627143
Publication status: Published - 1 Jul 2017
Externally published: Yes
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 11/12/17 to 14/12/17

Keywords

  • Algorithm
  • Big Data
  • Compression
  • MapReduce
