Manage and Monitor GPUs in Cluster Environments

NVIDIA Data Center GPU Manager (DCGM) is a suite of tools for managing and monitoring NVIDIA data center GPUs in cluster environments.

DCGM simplifies GPU administration in the data center, improves resource reliability and uptime, automates administrative tasks, and helps drive overall infrastructure efficiency. It includes active health monitoring, comprehensive diagnostics, system alerts, and governance policies, including power and clock management.

DCGM supports Linux operating systems on x86_64, Arm, and POWER (ppc64le) platforms. Infrastructure teams can use it standalone, and it integrates easily into cluster management tools and into resource scheduling and monitoring products from NVIDIA partners.
