TY - GEN
T1 - A novel parallel prefix sum algorithm and its implementation on multi-core platforms
AU - Zhang, Nan
PY - 2010
Y1 - 2010
N2 - This paper presents a novel parallel prefix sum algorithm on n numbers by p processors. A parallel time analysis shows that this proposed algorithm is more efficient and easy-scalable than Blelloch's classic parallel solution for the same problem. Unlike Blelloch's method, the proposed algorithm does not require the number p of processors to be a power of 2, and it needs no extra buffers apart from the array that holds the numbers. For practical considerations, a variant of the proposed algorithm is further developed which incurs less overhead due to thread management and synchronisation. The Blelloch's algorithm and the variant of the proposed method are implemented via POSIX Threads, and are tested on two multi-core platforms. The results showed that the variant outperformed the Blelloch's method provided that the size of the array not exceeded the total capacity of the L2 cache of the host machine. In some of the tested cases, the increase in performance was as high as 50%.
AB - This paper presents a novel parallel prefix sum algorithm on n numbers by p processors. A parallel time analysis shows that this proposed algorithm is more efficient and easy-scalable than Blelloch's classic parallel solution for the same problem. Unlike Blelloch's method, the proposed algorithm does not require the number p of processors to be a power of 2, and it needs no extra buffers apart from the array that holds the numbers. For practical considerations, a variant of the proposed algorithm is further developed which incurs less overhead due to thread management and synchronisation. The Blelloch's algorithm and the variant of the proposed method are implemented via POSIX Threads, and are tested on two multi-core platforms. The results showed that the variant outperformed the Blelloch's method provided that the size of the array not exceeded the total capacity of the L2 cache of the host machine. In some of the tested cases, the increase in performance was as high as 50%.
KW - Multi-core computing
KW - Parallel algorithm
KW - Parallel computing
KW - Parallel prefix sum
UR - http://www.scopus.com/inward/record.url?scp=77958076298&partnerID=8YFLogxK
U2 - 10.1109/ICCET.2010.5485315
DO - 10.1109/ICCET.2010.5485315
M3 - Conference Proceeding
AN - SCOPUS:77958076298
SN - 9781424463503
T3 - ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings
SP - V266-V270
BT - ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings
T2 - 2010 2nd International Conference on Computer Engineering and Technology, ICCET 2010
Y2 - 16 April 2010 through 18 April 2010
ER -