TY - GEN
T1 - Compressed domain-specific data processing and analysis
AU - Dong, Dapeng
AU - Herbert, John
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - Domain-specific data such as sensor outputs and server trace logs have low levels of symbol richness, and so they can be represented in a very compact format. In this paper, we present a bit-oriented compression scheme designed not only to represent the data compactly but also to allow MapReduce programs to perform analysis and processing directly on the compressed data, and to do so in parallel. The core of the compression scheme is a novel hybrid data structure supporting bit pattern searching in constant time, and a scheme for making a block-splittable compressed file. Supporting software allows developers to work transparently with the compressed data. Experimental results demonstrate that the proposed compression scheme can significantly reduce data size and improve MapReduce analysis performance.
KW - Algorithm
KW - Big Data
KW - Compression
KW - MapReduce
UR - http://www.scopus.com/inward/record.url?scp=85047834946&partnerID=8YFLogxK
U2 - 10.1109/BigData.2017.8257941
DO - 10.1109/BigData.2017.8257941
M3 - Conference Proceeding
AN - SCOPUS:85047834946
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 325
EP - 330
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
Y2 - 11 December 2017 through 14 December 2017
ER -