TY - JOUR
T1 - PrestoZL
T2 - A GPU-accelerated High-throughput Jerk Search Toolkit for Binary Pulsars
AU - Mao, Kuang
AU - Tang, Zhaorong
AU - Pan, Qiuhong
AU - Wang, Pei
AU - Chen, Huaxi
AU - Ransom, Scott M.
AU - Li, Di
AU - Tang, Xuefei
AU - Wang, Qi
AU - Feng, Yi
AU - Chen, Lei
AU - Quan, Donghui
AU - Ren, Zujie
N1 - Publisher Copyright: © 2025. The Author(s). Published by the American Astronomical Society.
PY - 2025/9/1
Y1 - 2025/9/1
N2 - The Fourier domain jerk search algorithm, an integral component of the PRESTO software suite, has emerged as a key tool for detecting binary pulsars. However, it is CPU-based and very computationally expensive, particularly when exploring a broad range of search parameters. To address this challenge, we have developed PrestoZL, a GPU-accelerated, high-throughput jerk search toolkit. PrestoZL introduces an innovative GPU parallel design for the jerk search algorithm to mitigate performance degradation caused by memory-intensive operations. We have also developed a pipelined version of PrestoZL, which adds fine-grained orchestration to the CPU-GPU execution pipeline to alleviate the GPU stall problem during the search. An experiment conducted on a 30 minute observation, using a machine equipped with an A100-40G GPU and 20 i7-12700K CPUs, shows that PrestoZL achieves an end-to-end speedup of 56.38× over a CPU-based jerk search in PRESTO with OpenMP. PrestoZL achieves search results that are fully identical to those of the CPU-based jerk search in PRESTO, including the number of detected pulsars, as well as the output search parameters and signal-to-noise ratio values.
AB - The Fourier domain jerk search algorithm, an integral component of the PRESTO software suite, has emerged as a key tool for detecting binary pulsars. However, it is CPU-based and very computationally expensive, particularly when exploring a broad range of search parameters. To address this challenge, we have developed PrestoZL, a GPU-accelerated, high-throughput jerk search toolkit. PrestoZL introduces an innovative GPU parallel design for the jerk search algorithm to mitigate performance degradation caused by memory-intensive operations. We have also developed a pipelined version of PrestoZL, which adds fine-grained orchestration to the CPU-GPU execution pipeline to alleviate the GPU stall problem during the search. An experiment conducted on a 30 minute observation, using a machine equipped with an A100-40G GPU and 20 i7-12700K CPUs, shows that PrestoZL achieves an end-to-end speedup of 56.38× over a CPU-based jerk search in PRESTO with OpenMP. PrestoZL achieves search results that are fully identical to those of the CPU-based jerk search in PRESTO, including the number of detected pulsars, as well as the output search parameters and signal-to-noise ratio values.
UR - https://www.scopus.com/pages/publications/105015412690
U2 - 10.3847/1538-4365/adf4e5
DO - 10.3847/1538-4365/adf4e5
M3 - Article
AN - SCOPUS:105015412690
SN - 0067-0049
VL - 280
JO - Astrophysical Journal, Supplement Series
JF - Astrophysical Journal, Supplement Series
IS - 1
M1 - 36
ER -