TY - JOUR
T1 - AlignMalloc
T2 - Warp-Aware Memory Rearrangement Aligned with UVM Prefetching for Large-Scale GPU Dynamic Allocations
AU - Zhang, Jiajian
AU - Wu, Fangyu
AU - Jiang, Hai
AU - Wang, Qiufeng
AU - Chen, Genlang
AU - Cheng, Guangliang
AU - Lim, Eng Gee
AU - Li, Keqin
N1 - Publisher Copyright: © 1990-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - As parallel computing tasks rapidly expand in both complexity and scale, the need for efficient GPU dynamic memory allocation becomes increasingly important. While progress has been made in developing dynamic allocators for substantial applications, their real-world applicability is still limited due to inefficient memory access behaviors. This paper introduces AlignMalloc, a novel memory management system that aligns with the Unified Virtual Memory (UVM) prefetching strategy, significantly enhancing both memory allocation and access performance in large-scale dynamic allocation scenarios. We analyze the fundamental inefficiencies in UVM access and, for the first time, reveal the mismatch between memory access and UVM prefetching methods. To resolve this issue, AlignMalloc implements a warp-aware memory rearrangement strategy that exploits the regularity of warps to align with the UVM's static prefetching setup. Additionally, AlignMalloc introduces an OR tree-based structure within a host-co-managed framework to further optimize dynamic allocation. Comprehensive experiments demonstrate that AlignMalloc substantially outperforms current state-of-the-art systems, achieving up to 2.7× improvement in dynamic allocation and 2.3× in memory access. Additionally, eight real-world applications with diverse memory access patterns exhibit consistent performance enhancements, with an average speedup of 1.5×.
AB - As parallel computing tasks rapidly expand in both complexity and scale, the need for efficient GPU dynamic memory allocation becomes increasingly important. While progress has been made in developing dynamic allocators for substantial applications, their real-world applicability is still limited due to inefficient memory access behaviors. This paper introduces AlignMalloc, a novel memory management system that aligns with the Unified Virtual Memory (UVM) prefetching strategy, significantly enhancing both memory allocation and access performance in large-scale dynamic allocation scenarios. We analyze the fundamental inefficiencies in UVM access and, for the first time, reveal the mismatch between memory access and UVM prefetching methods. To resolve this issue, AlignMalloc implements a warp-aware memory rearrangement strategy that exploits the regularity of warps to align with the UVM's static prefetching setup. Additionally, AlignMalloc introduces an OR tree-based structure within a host-co-managed framework to further optimize dynamic allocation. Comprehensive experiments demonstrate that AlignMalloc substantially outperforms current state-of-the-art systems, achieving up to 2.7× improvement in dynamic allocation and 2.3× in memory access. Additionally, eight real-world applications with diverse memory access patterns exhibit consistent performance enhancements, with an average speedup of 1.5×.
KW - Dynamic Allocation
KW - Memory Arrangement
KW - Unified Virtual Memory
UR - http://www.scopus.com/inward/record.url?scp=105004994640&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2025.3568688
DO - 10.1109/TPDS.2025.3568688
M3 - Article
AN - SCOPUS:105004994640
SN - 1045-9219
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
ER -