Iris Coleman
Oct 14, 2025 16:42
NVIDIA introduces Coherent Driver-based Memory Management (CDMM) to improve GPU memory control on hardware-coherent platforms, addressing issues faced by developers and cluster administrators.
NVIDIA has introduced a new memory management mode, Coherent Driver-based Memory Management (CDMM), designed to improve the control and performance of GPU memory on hardware-coherent platforms such as GH200, GB200, and GB300. The development aims to address the challenges posed by non-uniform memory access (NUMA), which can lead to inconsistent system performance when applications are not fully NUMA-aware, according to NVIDIA.
NUMA vs. CDMM
NUMA mode, the current default for NVIDIA drivers on hardware-coherent platforms, exposes both CPU and GPU memory to the operating system (OS). This setup allows memory to be allocated through standard Linux and CUDA APIs and enables dynamic memory migration between CPU and GPU. However, it can also result in GPU memory being treated as a generic pool, potentially hurting application performance.
In contrast, CDMM mode prevents GPU memory from being exposed to the OS as a software NUMA node. Instead, the NVIDIA driver manages GPU memory directly, providing more precise control and potentially boosting application performance. This approach is akin to the PCIe-attached GPU model, where GPU memory remains distinct from system memory.
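One way to see the difference on a running Linux system is to look for NUMA nodes that have memory but no CPUs. The sketch below is illustrative only and is not taken from NVIDIA's announcement: it reads the standard sysfs files `has_memory` and `has_cpu` under `/sys/devices/system/node`, and the helper names (`parse_node_list`, `memory_only_nodes`) are made up for this example. On a hardware-coherent platform in NUMA mode, GPU memory would be expected to show up among the memory-only nodes; in CDMM mode it should not. Other devices can also appear as CPU-less nodes, so treat the result as a hint rather than proof.

```python
from pathlib import Path


def parse_node_list(text):
    """Parse a kernel list string such as '0-1,8-15' into sorted node IDs."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        lo, _, hi = part.partition("-")
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return sorted(nodes)


def read_node_list(root, name):
    """Read one of the node-list files (e.g. 'has_memory'); [] if absent."""
    path = Path(root) / name
    return parse_node_list(path.read_text()) if path.exists() else []


def memory_only_nodes(root="/sys/devices/system/node"):
    """Return NUMA nodes that report memory but no CPUs."""
    with_memory = set(read_node_list(root, "has_memory"))
    with_cpu = set(read_node_list(root, "has_cpu"))
    return sorted(with_memory - with_cpu)


if __name__ == "__main__":
    print("Memory-only NUMA nodes:", memory_only_nodes())
```

The `root` parameter exists only so the function can be pointed at a test directory; on a real system the default path applies.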
Implications for Kubernetes
The introduction of CDMM is particularly significant for Kubernetes, a widely used platform for managing large GPU clusters. In NUMA mode, Kubernetes may exhibit unexpected behaviors, such as memory over-reporting and incorrect application of pod memory limits, which can lead to performance problems and application failures. CDMM mode helps mitigate these issues by ensuring better isolation and control over GPU memory.
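To make the over-reporting concern concrete, the following sketch splits a machine's memory into the amount that sits on NUMA nodes with CPUs and the amount that sits on CPU-less nodes, using the per-node `meminfo` and `cpulist` files in sysfs. It is a hypothetical diagnostic, not how Kubernetes itself computes node capacity, and the function names are invented for illustration. The assumption is that when GPU memory is exposed as a NUMA node, its capacity lands in the CPU-less total and can inflate any tool that simply sums memory across all nodes.

```python
import re
from pathlib import Path

# Per-node meminfo lines look like: "Node 0 MemTotal:  32768000 kB"
MEMTOTAL = re.compile(r"Node\s+\d+\s+MemTotal:\s+(\d+)\s+kB")


def node_mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from a per-node meminfo file; 0 if missing."""
    match = MEMTOTAL.search(meminfo_text)
    return int(match.group(1)) if match else 0


def memory_by_kind(root="/sys/devices/system/node"):
    """Return (kB on nodes with CPUs, kB on CPU-less nodes)."""
    with_cpus = 0
    cpu_less = 0
    for node_dir in sorted(Path(root).glob("node[0-9]*")):
        meminfo = node_dir / "meminfo"
        if not meminfo.exists():
            continue
        total = node_mem_total_kb(meminfo.read_text())
        cpulist = node_dir / "cpulist"
        # An empty cpulist means the node has memory but no CPUs.
        has_cpus = cpulist.exists() and cpulist.read_text().strip() != ""
        if has_cpus:
            with_cpus += total
        else:
            cpu_less += total
    return with_cpus, cpu_less


if __name__ == "__main__":
    cpu_kb, other_kb = memory_by_kind()
    print(f"Memory on CPU nodes:      {cpu_kb} kB")
    print(f"Memory on CPU-less nodes: {other_kb} kB")
```

A large CPU-less total on a GH200-class system in NUMA mode would be consistent with GPU memory being counted as system memory, though CPU-less nodes can have other origins as well.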
Impact on Developers and System Administrators
For CUDA developers, CDMM mode changes how system-allocated memory is handled. While the GPU can still access system-allocated memory across the NVLink chip-to-chip connection, memory pages will not migrate as they can in NUMA mode. This change requires developers to adapt their memory management strategies to fully leverage the capabilities of CDMM.
System administrators will find that tools like numactl or mbind have no effect on GPU memory in CDMM mode, because GPU memory is not presented to the OS. These tools can still be used to manage system memory.
Guidelines for Choosing Between CDMM and NUMA
When deciding between CDMM and NUMA modes, consider the specific memory management needs of your applications. NUMA mode suits applications that rely on the OS to manage combined CPU and GPU memory. In contrast, CDMM mode is well suited to applications that require direct control of GPU memory, bypassing the OS for better performance and control.
Ultimately, CDMM mode gives developers and administrators the ability to harness the full potential of NVIDIA’s hardware-coherent memory architectures, optimizing performance for GPU-accelerated workloads. For those using platforms like GH200, GB200, or GB300, enabling CDMM mode could provide significant benefits, especially in Kubernetes environments.
Image source: Shutterstock


