TY - GEN
T1 - SyncMalloc
T2 - 53rd International Conference on Parallel Processing, ICPP 2024
AU - Zhang, Jiajian
AU - Wu, Fangyu
AU - Jiang, Hai
AU - Cheng, Guangliang
AU - Chen, Genlang
AU - Wang, Qiufeng
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/8/12
Y1 - 2024/8/12
N2 - Dynamic memory allocation on GPUs, increasingly crucial for applications with dynamic computational patterns, faces significant challenges: the allocation logic involves complex, branch-heavy calculations, and the metadata for massive numbers of thread allocations consumes substantial memory. Despite existing research, a scalable and flexible solution that effectively manages dynamic memory allocation while minimizing memory usage on GPUs is still lacking. This paper introduces SyncMalloc, a synchronized Host-Device Co-Management system designed to handle dynamic memory allocations of diverse magnitudes. By integrating pipelining and producer-consumer mechanisms, SyncMalloc reduces communication overhead and resolves architectural mismatches, and it further extends its capability through integration with CUDA's unified memory to support oversubscription. Moreover, SyncMalloc advances slab-based memory management to improve the efficiency of small allocations, reducing conflict probability and overhead in high-activity scenarios. Finally, we present a comprehensive performance evaluation that expands benchmarks and measurement dimensions to more accurately reflect the performance of real-world applications. The experimental results demonstrate, from multiple perspectives, the effectiveness of SyncMalloc in supporting dynamic GPU allocations scaled from 4 B to 200 GB. Our source code is available at https://github.com/jjZhang94/SyncMalloc.
AB - Dynamic memory allocation on GPUs, increasingly crucial for applications with dynamic computational patterns, faces significant challenges: the allocation logic involves complex, branch-heavy calculations, and the metadata for massive numbers of thread allocations consumes substantial memory. Despite existing research, a scalable and flexible solution that effectively manages dynamic memory allocation while minimizing memory usage on GPUs is still lacking. This paper introduces SyncMalloc, a synchronized Host-Device Co-Management system designed to handle dynamic memory allocations of diverse magnitudes. By integrating pipelining and producer-consumer mechanisms, SyncMalloc reduces communication overhead and resolves architectural mismatches, and it further extends its capability through integration with CUDA's unified memory to support oversubscription. Moreover, SyncMalloc advances slab-based memory management to improve the efficiency of small allocations, reducing conflict probability and overhead in high-activity scenarios. Finally, we present a comprehensive performance evaluation that expands benchmarks and measurement dimensions to more accurately reflect the performance of real-world applications. The experimental results demonstrate, from multiple perspectives, the effectiveness of SyncMalloc in supporting dynamic GPU allocations scaled from 4 B to 200 GB. Our source code is available at https://github.com/jjZhang94/SyncMalloc.
KW - Dynamic Allocation
KW - GPU
KW - Memory Management
UR - http://www.scopus.com/inward/record.url?scp=85202438630&partnerID=8YFLogxK
U2 - 10.1145/3673038.3673069
DO - 10.1145/3673038.3673069
M3 - Conference Proceeding
AN - SCOPUS:85202438630
T3 - ACM International Conference Proceeding Series
SP - 179
EP - 188
BT - 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
PB - Association for Computing Machinery
Y2 - 12 August 2024 through 15 August 2024
ER -