AlignMalloc: Warp-Aware Memory Rearrangement Aligned with UVM Prefetching for Large-Scale GPU Dynamic Allocations

Jiajian Zhang, Fangyu Wu*, Hai Jiang, Qiufeng Wang, Genlang Chen, Guangliang Cheng, Eng Gee Lim, Keqin Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

As parallel computing tasks rapidly expand in both complexity and scale, efficient GPU dynamic memory allocation becomes increasingly important. Although progress has been made in developing dynamic allocators for large applications, their real-world applicability remains limited by inefficient memory access behavior. This paper introduces AlignMalloc, a novel memory management system that aligns with the Unified Virtual Memory (UVM) prefetching strategy, significantly enhancing both memory allocation and memory access performance in large-scale dynamic allocation scenarios. We analyze the fundamental inefficiencies in UVM access and reveal, for the first time, the mismatch between memory access patterns and UVM prefetching methods. To resolve this mismatch, AlignMalloc implements a warp-aware memory rearrangement strategy that exploits the regularity of warps to align with UVM's static prefetching setup. Additionally, AlignMalloc introduces an OR tree-based structure within a host-co-managed framework to further optimize dynamic allocation. Comprehensive experiments demonstrate that AlignMalloc substantially outperforms current state-of-the-art systems, achieving up to 2.7× improvement in dynamic allocation and up to 2.3× in memory access. In addition, eight real-world applications with diverse memory access patterns exhibit consistent performance gains, with an average speedup of 1.5×.

Original language: English
Journal: IEEE Transactions on Parallel and Distributed Systems
Publication status: Accepted/In press - 2025

Keywords

  • Dynamic Allocation
  • Memory Arrangement
  • Unified Virtual Memory
