TY - GEN
T1 - WIC
T2 - 2025 USENIX Annual Technical Conference, ATC 2025
AU - Zhang, Jiajian
AU - Wu, Fangyu
AU - Jiang, Hai
AU - Wang, Qiufeng
AU - Chen, Genlang
AU - Pang, Chaoyi
N1 - Publisher Copyright: © 2025 by The USENIX Association. All rights reserved.
PY - 2025
Y1 - 2025
N2 - GPU communication plays a pivotal role in collaborative computation across multiple devices. Despite advancements in inter-device communication fabrics and architectures, synchronization still remains a significant challenge due to the manual coordination required between producers and consumers at the application level. In this work, we first reveal that traditional synchronization is a primary bottleneck in GPU communication, where consumers frequently poll for producer data availability. Specifically, early-started polling leads to the unnecessary occupation of computational resources. To address this issue, we propose Warp-level Interrupt-based Communication (WIC), a novel synchronization framework for GPU communication that introduces a fine-grained interruption mechanism at the warp level to replace repetitive polling. WIC preemptively stalls warps engaged in frequent polling and releases computational resources for other warps, thereby effectively overlapping producer-consumer synchronization with ongoing computations. Comprehensive experiments demonstrate that WIC significantly outperforms conventional polling methods by 1.13× on average across various applications with diverse communication patterns.
AB - GPU communication plays a pivotal role in collaborative computation across multiple devices. Despite advancements in inter-device communication fabrics and architectures, synchronization still remains a significant challenge due to the manual coordination required between producers and consumers at the application level. In this work, we first reveal that traditional synchronization is a primary bottleneck in GPU communication, where consumers frequently poll for producer data availability. Specifically, early-started polling leads to the unnecessary occupation of computational resources. To address this issue, we propose Warp-level Interrupt-based Communication (WIC), a novel synchronization framework for GPU communication that introduces a fine-grained interruption mechanism at the warp level to replace repetitive polling. WIC preemptively stalls warps engaged in frequent polling and releases computational resources for other warps, thereby effectively overlapping producer-consumer synchronization with ongoing computations. Comprehensive experiments demonstrate that WIC significantly outperforms conventional polling methods by 1.13× on average across various applications with diverse communication patterns.
UR - https://www.scopus.com/pages/publications/105011599124
M3 - Conference Proceeding
AN - SCOPUS:105011599124
T3 - Proceedings of the 2025 USENIX Annual Technical Conference, ATC 2025
SP - 889
EP - 904
BT - Proceedings of the 2025 USENIX Annual Technical Conference, ATC 2025
PB - USENIX Association
Y2 - 7 July 2025 through 9 July 2025
ER -