ROCm™ Data Center Tool

The ROCm™ Data Center Tool (RDC) simplifies administration and addresses key infrastructure challenges for AMD GPUs in cluster and data center environments. Its main features are:

  • GPU telemetry

  • GPU statistics for jobs

  • Integration with third-party tools

  • Open source

The tool can run in standalone mode when all of its components are installed. Alternatively, existing management tools can use the same set of features, which is also available as a library.

Refer to the Starting RDC section in the ROCm Data Center Tool User Guide for details on the different modes of operation.

Objective

This user guide is intended to:

  • Provide an overview of the ROCm Data Center Tool features

  • Describe how system administrators and data center (or HPC) users can administer and configure AMD GPUs

  • Describe the components

  • Provide an overview of the open source developer handbook

Target Audience

The audience for the ROCm™ Data Center Tool consists of:

  • Administrators: The tool gives cluster administrators the ability to monitor, validate, and configure policies.

  • HPC Users: The tool provides GPU-centric feedback on their workload submissions.

  • OEMs: The tool lets OEMs add GPU information to their existing cluster management software.

  • Open Source Contributors: RDC is open source and accepts contributions from the community.