TY - GEN
T1 - ShmStreaming
T2 - 27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
AU - Lai, Longbin
AU - Zhou, Jingyu
AU - Zheng, Long
AU - Li, Huakang
AU - Lu, Yanchao
AU - Tang, Feilong
AU - Guo, Minyi
PY - 2013
Y1 - 2013
N2 - The Map-Reduce programming model is now drawing both academic and industrial attention for processing large data. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop introduces a streaming mechanism that allows it to communicate with external programs through pipes. Because of the added overhead associated with pipes and context switches, the performance of Hadoop streaming is significantly worse than that of native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to realize Hadoop streaming for better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique to allow multiple key-value pairs to be processed together. For typical benchmarks of word count, grep and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.
AB - The Map-Reduce programming model is now drawing both academic and industrial attention for processing large data. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop introduces a streaming mechanism that allows it to communicate with external programs through pipes. Because of the added overhead associated with pipes and context switches, the performance of Hadoop streaming is significantly worse than that of native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to realize Hadoop streaming for better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique to allow multiple key-value pairs to be processed together. For typical benchmarks of word count, grep and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.
KW - Hadoop streaming
KW - Map-reduce
KW - Shared memory
UR - http://www.scopus.com/inward/record.url?scp=84881073312&partnerID=8YFLogxK
U2 - 10.1109/AINA.2013.90
DO - 10.1109/AINA.2013.90
M3 - Conference Proceeding
AN - SCOPUS:84881073312
SN - 9780769549538
T3 - Proceedings - International Conference on Advanced Information Networking and Applications, AINA
SP - 137
EP - 144
BT - Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
Y2 - 25 March 2013 through 28 March 2013
ER -