Overview of ROCm Ecosystem (v6.4.1-20250616)¶
Work-in-progress
This document is a work-in-progress. It may still contain inaccuracies or mistakes.
This overview is being created in the context of adding support for ROCm to EESSI, the European Environment for Scientific Software Installations (https://eessi.io).
Last update: 16 Jun 2025
Table of Contents¶
- Introduction
- AMD GPU Microarchitectures
- Core Components
- Programming Models
- Compiler Ecosystem
- Developer Tools
- Libraries and Frameworks
- Compatibility Policies
- AMD GPUs in Azure
Introduction¶
The AMD ROCm™ (Radeon Open Compute) platform is an open-source software stack designed for GPU computing. ROCm 6.4.x provides a comprehensive set of tools, libraries, and software development kits that enable developers to harness the power of AMD's hardware accelerators.
ROCm serves as AMD's unified platform for high-performance computing (HPC), artificial intelligence (AI), and machine learning workloads, offering a viable alternative to NVIDIA's CUDA ecosystem. The platform is designed with portability, performance, and open standards in mind.
The ROCm software stack consists of six major parts:
- AMD GPU Microarchitectures: the microarchitectures used by AMD GPU hardware
- Core Components: software essential to using AMD GPUs (drivers, runtimes, etc.)
- Programming Models: how to create programs that run on AMD GPUs
- Compiler Ecosystem: compilers with support for the programming models
- Developer Tools: debugging, profiling, and tracing tools
- Libraries and Frameworks: for common operations and programming structures
AMD GPU Microarchitectures¶
AMD's GPU architectures have evolved significantly over the years, with distinct product lines targeting different market segments.
CDNA (Compute DNA)¶
CDNA is AMD's data center and HPC-focused architecture for GPU compute workloads.
- CDNA 1 (2020)
- Used in AMD Instinct MI100 accelerator
- Matrix Core Technology for AI/ML workloads
- CDNA 2 (2021)
- Powers AMD Instinct MI200 series
- MCM (multi-chip module) design with chiplets
- Infinity Fabric connections for multi-GPU scaling
- CDNA 3 (2023)
- Powers AMD Instinct MI300 series
- Integrates CPU and GPU in the same package as an APU (Accelerated Processing Unit), for example the MI300A
- Enhanced AI and HPC capabilities
RDNA (Radeon DNA)¶
RDNA is AMD's consumer-focused graphics architecture, designed for gaming and content creation.
- RDNA 1 (2019)
- First introduced with the Radeon RX 5000 series
- RDNA 2 (2020)
- Powers Radeon RX 6000 series
- Used in PlayStation 5 and Xbox Series X/S consoles
- RDNA 3 (2022)
- Powers Radeon RX 7000 series
- Chiplet-based design (first for consumer GPUs)
- RDNA 4 (2025)
- Powers the Radeon RX 9000 series
Earlier Architectures¶
- GCN (Graphics Core Next) - 2011-2019
- Five generations (GCN 1-5)
- Transitioned to RDNA for consumer products
- Powered Radeon HD 7000 through RX Vega and some RX 500 series
- Vega (2017)
- Based on GCN 5
- Used in Radeon RX Vega and Radeon VII
GFX Codes¶
In LLVM, each AMDGPU processor has an architecture (GFX) code that identifies its specific microarchitecture. These codes determine hardware compatibility and optimization targets in ROCm. Generally, AMD uses the "gfxAB" format, where A is the major version and B is a two-character suffix encoding the minor version and stepping; the last character may be a hexadecimal letter, as in gfx90a. The format "gfxA" is also used to refer to a family of architectures with the same major version.
An overview of gfx codes:
- GFX6 (GCN): gfx600, gfx601, gfx602
- GFX7 (GCN): gfx700, ..., gfx705
- GFX8 (GCN): gfx801, gfx802, gfx803, gfx805, gfx810
- GFX9 (Vega): gfx900, gfx902, gfx904, gfx906
- GFX9 (CDNA1): gfx908
- GFX9 (CDNA2): gfx90a
- GFX9 (CDNA3): gfx942
- GFX10.1 (RDNA1): gfx1010, ..., gfx1013
- GFX10.3 (RDNA2): gfx1030, ..., gfx1036
- GFX11 (RDNA3): gfx1100, ..., gfx1103
- GFX11 (RDNA3.5): gfx1150, ..., gfx1153
- GFX12 (RDNA4): gfx1200, gfx1201
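As a rough illustration of the naming scheme, the Python sketch below splits a gfx code into its major version and two-character suffix, then maps it to the architecture names used in the list above. The helpers `split_gfx` and `architecture` are hypothetical names made up for this example, and the mapping only reflects the codes listed in this overview; it is not an official or exhaustive table.

```python
import re

# CDNA parts share major version 9 with Vega, so they are listed explicitly.
# Transcribed from the overview above; not an exhaustive list.
GFX9_SPECIAL = {"gfx908": "CDNA1", "gfx90a": "CDNA2", "gfx942": "CDNA3"}


def split_gfx(code):
    """Split 'gfxAB' into (A, B), where B is the two-character suffix."""
    match = re.fullmatch(r"gfx(\d+)([0-9a-f]{2})", code.lower())
    if match is None:
        raise ValueError(f"not a gfx code: {code!r}")
    return int(match.group(1)), match.group(2)


def architecture(code):
    """Return the architecture name this overview associates with a gfx code."""
    code = code.lower()
    major, suffix = split_gfx(code)
    if code in GFX9_SPECIAL:
        return GFX9_SPECIAL[code]
    if major in (6, 7, 8):
        return "GCN"
    if major == 9:
        return "Vega"
    if major == 10:
        # gfx101x is RDNA1 and gfx103x is RDNA2 in the list above.
        return {"1": "RDNA1", "3": "RDNA2"}.get(suffix[0], "unknown")
    if major == 11:
        # gfx115x is RDNA3.5; other gfx11 codes are treated as RDNA3.
        return "RDNA3.5" if suffix[0] == "5" else "RDNA3"
    if major == 12:
        return "RDNA4"
    return "unknown"


if __name__ == "__main__":
    for gfx in ("gfx803", "gfx906", "gfx90a", "gfx1030", "gfx1151", "gfx1201"):
        print(gfx, "->", architecture(gfx))
```

A lookup like this could, for instance, help group build targets by architecture family, although the authoritative list of processors is the one maintained in LLVM.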
Core Components¶
AMD Docs Source | DeepWiki Source | DeepWiki Source | Github
- AMDGPU Driver with KFD (Github)
- The kernel-mode driver for AMD GPUs
- Platform Runtime (Github)
- Runtime that manages GPU resources, scheduling, and memory management
- ROCm-LLVM (Github)
- AMD-maintained fork of the LLVM git repository
- HIP (Github)
- C++ Heterogeneous-Compute Interface for Portability
- Runtime API and kernel language
- AMD SMI (System Management Interface) (Github)
- AMD SMI - equivalent to nvidia-smi
- Successor to ROCm SMI
- ROCm SMI (System Management Interface) (Github) (deprecated)
- ROCm SMI LIB - equivalent to nvidia-smi
- ROCm CMake (Github)
- CMake modules for common build and dev tasks within ROCm
- Build dependency for many ROCm libraries
- ROCm Core (Github)
- ROCm package with version and install path info
- Nearly all ROCm packages depend on it
- ROCm Info (Github)
- ROCm application for reporting system info
- ROCm Examples (Github)
- A collection of examples for the ROCm software stack
Core Components Dependencies¶
graph LR;
driver[AMDGPU Driver]
runtime[ROCm Platform Runtime]
llvm[ROCm LLVM Compiler]
hip[HIP]
amdsmi[AMD SMI]
rocmsmi[ROCm SMI]
rocmcmake[ROCm CMake]
rocminfo[ROCm Info]
rocmexamples[ROCm Examples]
runtime --> driver
runtime --> llvm
hip --> runtime
hip --> llvm
hip --> rocmcmake
hip --> rocminfo
rocmcmake --> llvm
rocminfo --> llvm
rocmexamples --> hip
rocmexamples --> amdsmi
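One practical use of the graph above is deriving an installation or build order. The Python sketch below does this with a topological sort from the standard library. It assumes that an edge "A --> B" means "A depends on B", which is this sketch's reading of the graph. The name `build_order` is a hypothetical helper, and the node names are the identifiers used in the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Edges copied from the graph above, read as "A --> B" = "A depends on B"
# (that reading is an assumption of this sketch).
DEPENDS_ON = {
    "runtime": {"driver", "llvm"},
    "hip": {"runtime", "llvm", "rocmcmake", "rocminfo"},
    "rocmcmake": {"llvm"},
    "rocminfo": {"llvm"},
    "rocmexamples": {"hip", "amdsmi"},
}


def build_order(depends_on):
    """Return the components so that each one comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(build_order(DEPENDS_ON))
```

The exact order among independent components (for example the driver and the compiler) is not unique; any order that places dependencies first is valid.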
Programming Models¶
HIP (Heterogeneous-Computing Interface for Portability)¶
HIP is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD and NVIDIA GPUs. It's a key component of ROCm's strategy for facilitating code migration from CUDA.
HIP is a core component of ROCm, see the core components section for more details.
- HIP Github
- CLR Github
- Features:
OpenMP Support¶
ROCm supports OpenMP offloading, which allows developers to use directive-based programming to offload computations to GPUs.
OpenMP support is implemented by the ROCm LLVM compiler.
- Features:
- Familiar pragma-based approach
- Incremental parallelization of existing CPU code
- Support for target offload constructs
OpenCL Support¶
While not the primary focus of ROCm, OpenCL support is maintained for compatibility with existing code bases and as an open standard option.
Programming Models Dependencies¶
graph LR;
subgraph Core Components
runtime[ROCm Platform Runtime]
llvm[ROCm LLVM Compiler]
rocmcmake[ROCm CMake]
rocminfo[ROCm Info]
end
hip[HIP]
openmp[OpenMP Support]
opencl[OpenCL Support]
hip --> runtime
hip --> llvm
hip --> rocmcmake
hip --> rocminfo
openmp --> runtime
openmp --> llvm
opencl --> runtime
opencl --> llvm
Compiler Ecosystem¶
ROCm provides a comprehensive set of compilers to support various programming languages and models. These compilers are essential for translating high-level code into optimized machine code for AMD GPUs.
C/C++ Compilers¶
- ROCm-LLVM (AMDGPU LLVM / amdclang++) (Github):
- The foundation of ROCm's compiler toolchain
- Based on LLVM/Clang infrastructure with AMD GPU-specific additions
- Supports HIP, OpenMP offloading, and other programming models
- AOMP (AMD OpenMP Compiler) (Github) (preview):
- Specialized for OpenMP target offloading to AMD GPUs
- Based on the LLVM project with specific optimizations for OpenMP
- Supports OpenMP 5.0+ features relevant to GPU offloading
- Currently a development-preview, not yet a full product
- AOCC (AMD Optimizing C/C++ Compiler):
- Primarily focused on AMD CPU optimization
- Can be used in conjunction with ROCm for heterogeneous computing
- Based on LLVM/Clang with AMD-specific optimizations
- Closed source
- HIPCC:
Fortran Compilers¶
- AOCC (AMD Optimizing Fortran Compiler):
- Based on Flang and LLVM
- Supports GPU offloading via OpenMP directives
- Optimized for AMD architectures
- Flang for ROCm (Github) (deprecated):
- Part of the LLVM project's Fortran implementation
- The new Flang implementation (as described in LLVM's blog post) brings improved compatibility and performance
Developer Tools¶
ROCm offers several tools to aid in development, debugging, and performance optimization:
- ROC gdb: Debugger for HIP and OpenCL applications (Github)
- ROC Tracer: API tracing library (Github)
- ROC Profiler: Performance profiling tool (Github)
- ROC Debugger API: Provides support necessary for debugging tools (Github)
- Profiler SDK: New profiler SDK, combines ROC Tracer and ROC Profiler (Github)
- Compute Profiler: Performance analysis tool for AMD GPUs (Github)
- Systems Profiler: Performance analysis tool for applications on the CPU and GPU (Github)
Developer Tools Dependencies¶
graph LR;
subgraph Core Components
rocmstack[ROCm Stack - driver, runtime, llvm, hip]
rocmsmi[ROCm SMI]
rocmcmake[ROCm CMake]
end
rocgdb[ROC gdb]
roctracer[ROC Tracer]
rocprofiler[ROC Profiler]
rocdbgapi[ROC Debugger API]
profilersdk[Profiler SDK]
computeprofiler[Compute Profiler]
systemsprofiler[Systems Profiler]
rocgdb --> rocmstack
rocgdb --> rocdbgapi
roctracer --> rocmstack
rocdbgapi --> rocmstack
rocdbgapi --> rocmcmake
rocprofiler --> rocmstack
rocprofiler --> rocmsmi
rocprofiler --> rocdbgapi
profilersdk --> rocmstack
computeprofiler --> rocmstack
computeprofiler --> rocmsmi
computeprofiler --> rocdbgapi
systemsprofiler --> rocmstack
systemsprofiler --> rocmsmi
systemsprofiler --> profilersdk
Libraries and Frameworks¶
ROCm provides a rich set of libraries to accelerate various computational workloads.
ROCm also provides a set of marshalling libraries which implement a portable interface for operations across different GPU vendors (AMD and NVIDIA). These libraries forward each call to the appropriate backend, either a "roc" variant or a "cu" variant, depending on the target platform. The "roc" variants, such as rocFFT, are AMD's native implementations optimized specifically for AMD GPUs, while the "hip" variants, such as hipFFT, are the portable wrappers that can target either AMD or NVIDIA hardware through a unified API.
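The naming convention can be summarised as a small lookup table. The Python sketch below only models the names: it is not how the libraries select a backend internally. The helper `backend_for` and the platform labels "amd" and "nvidia" are hypothetical, chosen for this example.

```python
# Marshalling ("hip") library -> backend library per platform.
# Names only; this models the convention, not the actual dispatch mechanism.
BACKENDS = {
    "hipBLAS": {"amd": "rocBLAS", "nvidia": "cuBLAS"},
    "hipFFT": {"amd": "rocFFT", "nvidia": "cuFFT"},
    "hipRAND": {"amd": "rocRAND", "nvidia": "cuRAND"},
    "hipSOLVER": {"amd": "rocSOLVER", "nvidia": "cuSOLVER"},
    "hipSPARSE": {"amd": "rocSPARSE", "nvidia": "cuSPARSE"},
}


def backend_for(library, platform):
    """Return the backend library name a marshalling library wraps."""
    try:
        return BACKENDS[library][platform]
    except KeyError:
        raise ValueError(f"unknown library/platform: {library!r}, {platform!r}")


if __name__ == "__main__":
    for lib in sorted(BACKENDS):
        print(lib, "->", backend_for(lib, "amd"), "/", backend_for(lib, "nvidia"))
```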
Core Math Libraries¶
- hipBLASLt: General matrix-matrix operations, extends beyond BLAS (Github)
- hipSPARSELt: Marshalling library and ROCm version of cuSPARSELt (Github)
- rocBLAS: Basic Linear Algebra Subprograms implementation (Github)
- rocFFT: Fast Fourier Transform implementation (Github)
- rocRAND: Random number generator library (Github)
- rocSOLVER: Linear algebra solver library (Github)
- rocSPARSE: Sparse matrix routines (Github)
ML/DL Frameworks¶
- MIOpen: Deep learning primitives library (Github)
- ROCm PyTorch: PyTorch support for AMD GPUs (Github)
- ROCm TensorFlow: TensorFlow support for AMD GPUs (Github)
Communication Libraries¶
- RCCL: Communication library for multi-GPU/multi-node training (Github)
Marshalling Libraries¶
Libraries and Frameworks Dependencies¶
graph LR;
subgraph Core Components
rocmstack[ROCm Stack - driver, runtime, llvm, hip]
end
subgraph Developer Tools
roctracer[ROC Tracer]
end
subgraph Libraries and Frameworks
rocfft[rocFFT]
rocrand[rocRAND]
rocblas[rocBLAS]
rocsparse[rocSPARSE]
rocsolver[rocSOLVER]
hipblaslt[hipBLASLt]
hipsparselt[hipSPARSELt]
miopen[MIOpen]
rccl[RCCL]
hipfft[hipFFT]
hiprand[hipRAND]
hipblas[hipBLAS]
hipsparse[hipSPARSE]
hipsolver[hipSOLVER]
end
rocfft --> rocmstack
rocrand --> rocmstack
rocblas --> rocmstack
rocsparse --> rocmstack
rocsolver --> rocmstack
rocsolver --> rocblas
hipblaslt --> rocmstack
hipblaslt --> roctracer
hipsparselt --> rocmstack
hipsparselt --> roctracer
hipsparselt --> hipsparse
miopen --> rocmstack
miopen --> rocblas
miopen --> hipblaslt
miopen --> hipblas
rccl --> rocmstack
hipfft --> rocfft
hiprand --> rocrand
hipblas --> rocblas
hipblas --> rocsparse
hipblas --> rocsolver
hipsparse --> rocsparse
hipsolver --> rocsolver
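To see everything a given library pulls in, the graph above can be walked transitively. The Python sketch below does this with a simple depth-first traversal. It again assumes "A --> B" means "A depends on B"; `transitive_dependencies` is a hypothetical helper name, and the edge list is copied from the graph as shown here, so it inherits any inaccuracies in it.

```python
# Edges copied from the graph above, read as (dependent, dependency).
EDGES = [
    ("rocfft", "rocmstack"), ("rocrand", "rocmstack"), ("rocblas", "rocmstack"),
    ("rocsparse", "rocmstack"), ("rocsolver", "rocmstack"), ("rocsolver", "rocblas"),
    ("hipblaslt", "rocmstack"), ("hipblaslt", "roctracer"),
    ("hipsparselt", "rocmstack"), ("hipsparselt", "roctracer"),
    ("hipsparselt", "hipsparse"),
    ("miopen", "rocmstack"), ("miopen", "rocblas"), ("miopen", "hipblaslt"),
    ("miopen", "hipblas"), ("rccl", "rocmstack"),
    ("hipfft", "rocfft"), ("hiprand", "rocrand"),
    ("hipblas", "rocblas"), ("hipblas", "rocsparse"), ("hipblas", "rocsolver"),
    ("hipsparse", "rocsparse"), ("hipsolver", "rocsolver"),
]


def transitive_dependencies(node, edges):
    """Return every component reachable from `node` by following edges."""
    direct = {}
    for src, dst in edges:
        direct.setdefault(src, set()).add(dst)
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dep in direct.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(sorted(transitive_dependencies("miopen", EDGES)))
```

For MIOpen this yields the ROCm stack, rocBLAS, hipBLASLt, and hipBLAS directly, plus ROC Tracer, rocSPARSE, and rocSOLVER indirectly.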
Compatibility Policies¶
ROCm Version and GPU Driver Compatibility¶
ROCm follows a versioning scheme that ensures compatibility between the software stack and GPU drivers.
- Major Version Compatibility:
- Major ROCm versions (e.g., 6.x) typically maintain driver compatibility within the same major version.
- Major version upgrades may require driver updates.
- Minor Version Compatibility:
- Minor versions (e.g., 6.4.x) are generally compatible with drivers designed for the same major version.
- Backward compatibility is maintained where possible, but newer hardware features may require newer drivers.
ROCm Version and glibc Compatibility¶
Versions of glibc supported by ROCm 6.4:
- 2.28
- 2.31
- 2.34
- 2.35
- 2.36
- 2.38
- 2.39
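A host can be checked against this list with a few lines of Python. The sketch below uses `platform.libc_ver()` from the standard library to detect the local glibc version. The set of versions is copied from the list above, and `glibc_supported` is a hypothetical helper name. Detection may return an empty result on systems that do not use glibc.

```python
import platform

# Copied from the list above (ROCm 6.4).
SUPPORTED_GLIBC = {"2.28", "2.31", "2.34", "2.35", "2.36", "2.38", "2.39"}


def glibc_supported(version):
    """True if the 'major.minor' part of a glibc version is in the list above."""
    major_minor = ".".join(version.split(".")[:2])
    return major_minor in SUPPORTED_GLIBC


if __name__ == "__main__":
    libc, version = platform.libc_ver()
    if libc == "glibc":
        status = "listed" if glibc_supported(version) else "not listed"
        print(f"glibc {version}: {status} for ROCm 6.4")
    else:
        print("could not detect glibc on this system")
```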
AMD GPUs in Azure¶
Azure offers several VM series featuring AMD GPUs. The following is an overview of available SKUs.
- NVv4 series (Azure)
- cpu: AMD EPYC 7V12 (Rome) [x86-64]
- gpu: AMD Radeon Instinct MI25 GPU (16GB)
- NGads_V620 series (Azure)
- cpu: AMD EPYC 7763 (Milan) [x86-64]
- gpu: AMD Radeon PRO V620 GPU (32GB)
- NVads_V710_v5 series (Azure)
- cpu: AMD EPYC 9V64 F (Genoa) [x86-64]
- gpu: AMD Radeon™ Pro V710
- ND-MI300X-V5 series (Azure)
- cpu: Intel Xeon (Sapphire Rapids) [x86-64]
- gpu: AMD Instinct MI300X GPU (192GB)
ABC of ROCm¶
AMDGPU Driver | AMD Instinct | AMD SMI | AOCC | AOMP | CDNA | GCN | GFX | HIP | HIPIFY | OpenCL | OpenMP | Platform Runtime | Radeon RX | RDNA | ROCm | ROCm-LLVM | ROCm SMI | Vega
A¶
AMDGPU Driver¶
The kernel-mode driver for AMD GPUs that provides the foundation for ROCm's functionality, including the Kernel Fusion Driver (KFD) that enables compute capabilities.
AMD Instinct¶
AMD Instinct is AMD's dedicated compute accelerator lineup for data centers and AI/HPC applications, optimized for the ROCm software platform.
AMD SMI¶
AMD SMI (System Management Interface) is a command-line tool within the ROCm ecosystem that allows users to query and control various aspects of AMD GPUs. It has largely superseded ROCm SMI, an older, more limited interface primarily used for hardware monitoring, and offers a broader range of management capabilities.
AOCC¶
AMD Optimizing C/C++/Fortran Compiler is AMD's optimizing compiler suite. It is primarily focused on AMD CPU optimization, but can be used alongside ROCm for heterogeneous computing.
AOMP¶
AMD OpenMP Compiler, a specialized compiler (development-preview) for OpenMP target offloading to AMD GPUs, supporting OpenMP 5.0+ features for GPU computing.
C¶
CDNA¶
CDNA (Compute DNA) is AMD's GPU architecture optimized specifically for data center and high-performance computing workloads within the ROCm ecosystem.
G¶
GCN¶
GCN (Graphics Core Next) is AMD's older GPU architecture that served as the foundation for their compute-focused platforms in early ROCm releases.
GFX¶
GFX codes in AMD ROCm are architecture identifiers that specify GPU hardware generations, determining compatibility and optimization targets for HPC and machine learning workloads on AMD GPUs.
H¶
HIP¶
HIP (Heterogeneous-Compute Interface for Portability) is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD GPUs and NVIDIA GPUs, serving as a key component of the ROCm platform for high-performance computing and machine learning workloads.
HIPIFY¶
HIPIFY is a tool within AMD's ROCm platform that converts CUDA code into portable HIP (Heterogeneous-computing Interface for Portability) code to enable GPU applications to run on AMD hardware.
O¶
OpenCL¶
OpenCL is a framework that allows developers to write programs that execute across heterogeneous platforms (including AMD GPUs) by using the OpenCL runtime and compiler infrastructure provided within the ROCm ecosystem.
OpenMP¶
OpenMP is a parallel programming model supported through the ROCm toolchain that allows developers to write multi-threaded CPU and GPU code using familiar OpenMP directives, targeting AMD GPUs via the Clang/LLVM compiler infrastructure.
P¶
Platform Runtime¶
The Platform Runtime refers to the ROCr (ROCm Runtime) layer that provides low-level APIs for managing GPU resources, memory, and queues, forming the foundation upon which higher-level programming models like HIP operate.
R¶
Radeon RX¶
AMD Radeon RX is a consumer-focused GPU series primarily designed for gaming and content creation.
RDNA¶
RDNA (Radeon DNA) is AMD's consumer-focused graphics architecture optimized for gaming and media applications within the ROCm ecosystem.
ROCm¶
Radeon Open Compute is an open-source software stack developed by AMD for GPU computing and machine learning applications.
ROCm-LLVM¶
AMD ROCm's LLVM implementation is a modified version of the LLVM compiler infrastructure that enables GPU code generation, optimization, and execution for AMD GPUs within the ROCm (Radeon Open Compute) platform, providing essential support for high-performance computing and machine learning workloads.
ROCm SMI¶
ROCm SMI (System Management Interface) is a command-line utility for monitoring and managing AMD GPUs within the ROCm ecosystem, providing functionality to query hardware information, control power states, monitor temperature, configure memory, and manage device performance. It is an older, more limited interface primarily used for hardware monitoring and has been largely superseded by AMD SMI, which offers a broader range of management capabilities.
V¶
Vega¶
Vega refers to AMD's GPU architecture that was one of the first to fully support the ROCm ecosystem for high-performance computing and machine learning workloads.
Changelog¶
v6.4.1-20250610¶
- started changelog
- moved github and azure links
- removed ROCTracer from core components
- added HIP, ROCm-core, and ROCm-cmake to core components
- improved core components dependencies graph
- improved programming models dependencies graph
- removed compilers dependencies graph
- added a big dependencies graph
v6.4.1-20250611¶
- fixed dependency graphs (except for dev tools)
- added ROCm dependencies of PyTorch
v6.4.1-20250616¶
- added ROC Debugger API to dev tools
- fixed dev tools dependencies graph
- sorted libraries
- removed the big dependencies graph in favour of dedicated ones