ShmStreaming: A shared memory approach for improving Hadoop streaming performance

Longbin Lai; Jingyu Zhou; Long Zheng; Huakang Li; Yanchao Lu; Feilong Tang; Minyi Guo

doi:10.1109/AINA.2013.90

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

8 Citations (Scopus)

Abstract

The Map-Reduce programming model is now drawing both academic and industrial attention for processing large data. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop introduces a streaming mechanism that allows it to communicate with external programs through pipes. Because of the added overhead associated with pipes and context switches, the performance of Hadoop streaming is significantly worse than that of native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to realize Hadoop streaming for better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique to allow multiple key-value pairs to be processed together. For typical benchmarks of word count, grep and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.
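The queue idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SpscQueue`, the header layout, and the length-prefixed record format are all assumptions made for illustration. It shows a single-producer/single-consumer ring in which the producer is the only writer of `tail` and the consumer is the only writer of `head`, and in which each side publishes its index once per batch of key-value pairs rather than once per pair. Python does not give the memory-ordering guarantees a real cross-process lock-free queue needs, so the sketch only demonstrates the index discipline and the batching.

```python
import struct

HEADER = struct.Struct("<QQ")  # head (written only by consumer), tail (written only by producer)
LEN = struct.Struct("<II")     # per-record prefix: key length, value length (assumed format)


class SpscQueue:
    """Single-producer/single-consumer byte ring over any writable buffer."""

    def __init__(self, buf):
        self.buf = memoryview(buf)
        self.cap = len(self.buf) - HEADER.size  # bytes available for records

    def _indices(self):
        # head and tail are monotonically increasing byte counters.
        return HEADER.unpack_from(self.buf, 0)

    def _write(self, pos, data):
        # Copy bytes into the ring, wrapping at the end of the data region.
        for i, b in enumerate(data):
            self.buf[HEADER.size + (pos + i) % self.cap] = b

    def _read(self, pos, n):
        return bytes(self.buf[HEADER.size + (pos + i) % self.cap] for i in range(n))

    def put_batch(self, pairs):
        """Producer: append pairs in order while they fit, then publish tail once."""
        head, tail = self._indices()
        written = 0
        for key, value in pairs:
            record = LEN.pack(len(key), len(value)) + key + value
            if tail - head + len(record) > self.cap:
                break  # queue full; the caller retries the remaining pairs later
            self._write(tail, record)
            tail += len(record)
            written += 1
        # A single index update makes the whole batch visible to the consumer.
        struct.pack_into("<Q", self.buf, 8, tail)
        return written

    def get_batch(self):
        """Consumer: drain every record published so far, then publish head once."""
        head, tail = self._indices()
        out = []
        while head < tail:
            klen, vlen = LEN.unpack(self._read(head, LEN.size))
            head += LEN.size
            out.append((self._read(head, klen), self._read(head + klen, vlen)))
            head += klen + vlen
        struct.pack_into("<Q", self.buf, 0, head)
        return out
```

Here the queue is backed by a plain `bytearray`, for example `SpscQueue(bytearray(16 + 4096))`. In a two-process setting the same class could in principle be pointed at the `buf` of a `multiprocessing.shared_memory.SharedMemory` segment, but that would additionally require atomic index updates and memory barriers, which this sketch does not provide.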

Original language	English
Title of host publication	Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
Pages	137-144
Number of pages	8
DOIs	https://doi.org/10.1109/AINA.2013.90
Publication status	Published - 2013
Externally published	Yes
Event	27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013 - Barcelona, Spain; Duration: 25 Mar 2013 → 28 Mar 2013

Publication series

Name	Proceedings - International Conference on Advanced Information Networking and Applications, AINA
ISSN (Print)	1550-445X

Conference

Conference	27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
Country/Territory	Spain
City	Barcelona
Period	25/03/13 → 28/03/13

Keywords

Hadoop streaming
Map-reduce
Shared memory

Cite this

Lai, L., Zhou, J., Zheng, L., Li, H., Lu, Y., Tang, F., & Guo, M. (2013). ShmStreaming: A shared memory approach for improving Hadoop streaming performance. In Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013 (pp. 137-144). Article 6531748 (Proceedings - International Conference on Advanced Information Networking and Applications, AINA). https://doi.org/10.1109/AINA.2013.90

@inproceedings{0a0c8918beb74c5f95ef8a425cbd358b,

title = "ShmStreaming: A shared memory approach for improving Hadoop streaming performance",

abstract = "The Map-Reduce programming model is now drawing both academic and industrial attention for processing large data. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop introduces a streaming mechanism that allows it to communicate with external programs through pipes. Because of the added overhead associated with pipes and context switches, the performance of Hadoop streaming is significantly worse than that of native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to realize Hadoop streaming for better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique to allow multiple key-value pairs to be processed together. For typical benchmarks of word count, grep and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.",

keywords = "Hadoop streaming, Map-reduce, Shared memory",

author = "Longbin Lai and Jingyu Zhou and Long Zheng and Huakang Li and Yanchao Lu and Feilong Tang and Minyi Guo",

year = "2013",

doi = "10.1109/AINA.2013.90",

language = "English",

isbn = "9780769549538",

series = "Proceedings - International Conference on Advanced Information Networking and Applications, AINA",

pages = "137--144",

booktitle = "Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013",

note = "27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013 ; Conference date: 25-03-2013 Through 28-03-2013",

}

Lai, L, Zhou, J, Zheng, L, Li, H, Lu, Y, Tang, F & Guo, M 2013, ShmStreaming: A shared memory approach for improving Hadoop streaming performance. in Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013., 6531748, Proceedings - International Conference on Advanced Information Networking and Applications, AINA, pp. 137-144, 27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013, Barcelona, Spain, 25/03/13. https://doi.org/10.1109/AINA.2013.90

ShmStreaming: A shared memory approach for improving Hadoop streaming performance. / Lai, Longbin; Zhou, Jingyu; Zheng, Long et al.
Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013. 2013. p. 137-144 6531748 (Proceedings - International Conference on Advanced Information Networking and Applications, AINA).

TY - GEN

T1 - ShmStreaming

T2 - 27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013

AU - Lai, Longbin

AU - Zhou, Jingyu

AU - Zheng, Long

AU - Li, Huakang

AU - Lu, Yanchao

AU - Tang, Feilong

AU - Guo, Minyi

PY - 2013

Y1 - 2013

N2 - The Map-Reduce programming model is now drawing both academic and industrial attention for processing large data. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop introduces a streaming mechanism that allows it to communicate with external programs through pipes. Because of the added overhead associated with pipes and context switches, the performance of Hadoop streaming is significantly worse than that of native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to realize Hadoop streaming for better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique to allow multiple key-value pairs to be processed together. For typical benchmarks of word count, grep and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.

AB - The Map-Reduce programming model is now drawing both academic and industrial attention for processing large data. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop introduces a streaming mechanism that allows it to communicate with external programs through pipes. Because of the added overhead associated with pipes and context switches, the performance of Hadoop streaming is significantly worse than that of native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to realize Hadoop streaming for better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique to allow multiple key-value pairs to be processed together. For typical benchmarks of word count, grep and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.

KW - Hadoop streaming

KW - Map-reduce

KW - Shared memory

UR - http://www.scopus.com/inward/record.url?scp=84881073312&partnerID=8YFLogxK

U2 - 10.1109/AINA.2013.90

DO - 10.1109/AINA.2013.90

M3 - Conference Proceeding

AN - SCOPUS:84881073312

SN - 9780769549538

T3 - Proceedings - International Conference on Advanced Information Networking and Applications, AINA

SP - 137

EP - 144

BT - Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013

Y2 - 25 March 2013 through 28 March 2013

ER -

Lai L, Zhou J, Zheng L, Li H, Lu Y, Tang F et al. ShmStreaming: A shared memory approach for improving Hadoop streaming performance. In Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013. 2013. p. 137-144. 6531748. (Proceedings - International Conference on Advanced Information Networking and Applications, AINA). doi: 10.1109/AINA.2013.90
