Overview of ROCm Ecosystem (v6.4.1-20250616)¶

Work-in-progress

This document is a work-in-progress. It may still contain inaccuracies or mistakes.

This overview is being created in the context of adding support for ROCm to EESSI, the European Environment for Scientific Software Installations (https://eessi.io).

Last update: 16 Jun 2025


Introduction¶

The AMD ROCm™ (Radeon Open Compute) platform is an open-source software stack designed for GPU computing. ROCm 6.4.x provides a comprehensive set of tools, libraries, and software development kits that enable developers to harness the power of AMD's hardware accelerators.

ROCm serves as AMD's unified platform for high-performance computing (HPC), artificial intelligence (AI), and machine learning workloads, offering a viable alternative to NVIDIA's CUDA ecosystem. The platform is designed with portability, performance, and open standards in mind.

The ROCm software stack consists of six major parts:

AMD GPU Microarchitectures: the microarchitectures used by AMD GPU hardware
Core Components: software essential to using AMD GPUs (drivers, runtimes, etc.)
Programming Models: how to create programs that run on AMD GPUs
Compiler Ecosystem: compilers with support for the programming models
Developer Tools: debugging, profiling, and tracing tools
Libraries and Frameworks: for common operations and programming structures

AMD GPU Microarchitectures¶

AMD's GPU architectures have evolved significantly over the years, with distinct product lines targeting different market segments.

CDNA (Compute DNA)¶

CDNA is AMD's data center and HPC-focused architecture for GPU compute workloads.

CDNA 1 (2020)
- Used in AMD Instinct MI100 accelerator
- Matrix Core Technology for AI/ML workloads
CDNA 2 (2021)
- Powers AMD Instinct MI200 series
- MCM (multi-chip module) design with chiplets
- Infinity Fabric connections for multi-GPU scaling
CDNA 3 (2023)
- Powers AMD Instinct MI300 series
- Integrates CPU and GPU in the same package (for example MI300A), forming an APU (Accelerated Processing Unit)
- Enhanced AI and HPC capabilities

RDNA (Radeon DNA)¶

RDNA is AMD's consumer-focused graphics architecture, designed for gaming and content creation.

RDNA 1 (2019)
- First introduced with the Radeon RX 5000 series
RDNA 2 (2020)
- Powers Radeon RX 6000 series
- Used in PlayStation 5 and Xbox Series X/S consoles
RDNA 3 (2022)
- Powers Radeon RX 7000 series
- Chiplet-based design (first for consumer GPUs)
RDNA 4 (2025)
- Powers the Radeon RX 9000 series

Earlier Architectures¶

GCN (Graphics Core Next) - 2011-2019
- Five generations (GCN 1-5)
- Transitioned to RDNA for consumer products
- Powered Radeon HD 7000 through RX Vega and some RX 500 series
Vega (2017)
- Based on GCN 5
- Used in Radeon RX Vega and Radeon VII

GFX Codes¶

In LLVM each AMDGPU processor has an architecture (GFX) code that indicates which specific microarchitecture is used. These codes are critical for hardware compatibility and optimization with ROCm. Generally, AMD uses the "gfxAB" format, where A is the major version and B consists of two characters giving the minor version and stepping; these characters can be hexadecimal digits, as in gfx90a. The format "gfxA" is also used to refer to a family of architectures with the same major version.

An overview of gfx codes:

GFX6 (GCN): gfx600, gfx601, gfx602
GFX7 (GCN): gfx700, ..., gfx705
GFX8 (GCN): gfx801, gfx802, gfx803, gfx805, gfx810
GFX9 (Vega): gfx900, gfx902, gfx904, gfx906
GFX9 (CDNA1): gfx908
GFX9 (CDNA2): gfx90a
GFX9 (CDNA3): gfx942
GFX10.1 (RDNA1): gfx1010, ..., gfx1013
GFX10.3 (RDNA2): gfx1030, ..., gfx1036
GFX11 (RDNA3): gfx1100, ..., gfx1103
GFX11 (RDNA3.5): gfx1150, ..., gfx1153
GFX12 (RDNA4): gfx1200, gfx1201

Source
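To make the naming scheme concrete, here is a minimal Python sketch that splits a gfx code into its parts and maps it to the family labels from the list above. It assumes the last two characters are hexadecimal minor and stepping digits, which matches the codes listed here. The helper names `parse_gfx` and `family` are hypothetical and are not part of ROCm or LLVM.

```python
import re

# gfx9 codes that the list above assigns to a CDNA generation.
GFX9_CDNA = {"gfx908": "CDNA1", "gfx90a": "CDNA2", "gfx942": "CDNA3"}


def parse_gfx(code):
    """Split a gfx code into (major, minor, stepping) integers.

    Assumption: the last two characters are hexadecimal minor and
    stepping digits, and everything before them is the decimal major.
    """
    match = re.fullmatch(r"gfx(\d+)([0-9a-f])([0-9a-f])", code)
    if match is None:
        raise ValueError(f"not a gfx code: {code!r}")
    major = int(match.group(1))
    minor = int(match.group(2), 16)
    stepping = int(match.group(3), 16)
    return major, minor, stepping


def family(code):
    """Return the family label used in the list above, or 'unknown'."""
    if code in GFX9_CDNA:
        return GFX9_CDNA[code]
    major, minor, _ = parse_gfx(code)
    if major in (6, 7, 8):
        return "GCN"
    if major == 9:
        return "Vega" if minor == 0 else "unknown"
    if major == 10:
        return {1: "RDNA1", 3: "RDNA2"}.get(minor, "unknown")
    if major == 11:
        return "RDNA3.5" if minor == 5 else "RDNA3"
    if major == 12:
        return "RDNA4"
    return "unknown"
```

For example, `parse_gfx("gfx90a")` yields major 9, minor 0, and stepping 10, and `family("gfx1030")` yields the RDNA2 label. Codes that do not appear in the list above may be labelled "unknown".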

Core Components¶

AMD Docs Source | DeepWiki Source | DeepWiki Source | Github

AMDGPU Driver with KFD (Github)
- The kernel-mode driver for AMD GPUs
Platform Runtime (Github)
- Runtime that manages GPU resources, scheduling, and memory management
ROCm-LLVM (Github)
- AMD-maintained fork of the LLVM git repository
HIP (Github)
- C++ Heterogeneous-computing Interface for Portability
- Runtime API and kernel language
AMD SMI (System Management Interface) (Github)
- AMD's counterpart to nvidia-smi
- Successor to ROCm SMI
ROCm SMI (System Management Interface) (Github) (deprecated)
- ROCm SMI LIB - equivalent to nvidia-smi
ROCm CMake (Github)
- CMake modules for common build and dev tasks within ROCm
- Build dependency for many ROCm libraries
ROCm Core (Github)
- ROCm package with version and install path info
- Nearly all ROCm packages depend on it
ROCm Info (Github)
- ROCm application for reporting system info
ROCm Examples (Github)
- A collection of examples for the ROCm software stack

Core Components Dependencies¶

graph LR;
    driver[AMDGPU Driver]
    runtime[ROCm Platform Runtime]
    llvm[ROCm LLVM Compiler]
    hip[HIP]
    amdsmi[AMD SMI]
    rocmsmi[ROCm SMI]
    rocmcmake[ROCm CMake]
    rocminfo[ROCm Info]
    rocmexamples[ROCm Examples]

    runtime --> driver
    runtime --> llvm

    hip --> runtime
    hip --> llvm
    hip --> rocmcmake
    hip --> rocminfo

    rocmcmake --> llvm
    rocminfo --> llvm

    rocmexamples --> hip
    rocmexamples --> amdsmi
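
When packaging these components (for example for EESSI), the graph above implies an order in which they have to be built. The following Python sketch derives one such order with the standard-library `graphlib` module (Python 3.9 or newer). The edges are copied from the graph above. The dictionary name `CORE_DEPS` and the function `build_order` are illustrative only.

```python
from graphlib import TopologicalSorter

# Each key depends on the components in its set (edges from the graph above).
CORE_DEPS = {
    "runtime": {"driver", "llvm"},
    "hip": {"runtime", "llvm", "rocmcmake", "rocminfo"},
    "rocmcmake": {"llvm"},
    "rocminfo": {"llvm"},
    "rocmexamples": {"hip", "amdsmi"},
}


def build_order(deps):
    """Return a list in which every component comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(CORE_DEPS)))
```

Several valid orders exist, so the exact output is not significant. What matters is that each component appears after everything it depends on.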

Programming Models¶

HIP (Heterogeneous-Computing Interface for Portability)¶

HIP is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD and NVIDIA GPUs. It's a key component of ROCm's strategy for facilitating code migration from CUDA.

HIP is a core component of ROCm; see the Core Components section for more details.

HIP Github
CLR Github
Features:
- CUDA-like programming model with familiar syntax
- Source-level compatibility with CUDA
- Tools to automate conversion of CUDA code (HIPIFY) (Github)
- Runtime API and kernel language for GPU computing
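
To give a feel for what source-level conversion means, here is a toy Python sketch that renames a handful of CUDA identifiers to their HIP counterparts. It is not how HIPIFY is implemented and covers only the few names in the table. The real tools (hipify-clang and hipify-perl) handle far more of the API. The function name `toy_hipify` is hypothetical.

```python
import re

# A few real CUDA-to-HIP renames; this table is deliberately tiny.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Word boundaries keep cudaMemcpy from matching inside longer identifiers.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def toy_hipify(source):
    """Replace the CUDA names listed in CUDA_TO_HIP with their HIP names."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


if __name__ == "__main__":
    cuda_snippet = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_buf, nbytes);\n"
        "cudaMemcpy(d_buf, h_buf, nbytes, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_buf);\n"
    )
    print(toy_hipify(cuda_snippet))
```

Because the HIP API mirrors CUDA so closely, much of a port is this kind of mechanical renaming. Anything that falls outside it still needs manual attention.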

OpenMP Support¶

ROCm supports OpenMP offloading, which allows developers to use directive-based programming to offload computations to GPUs.

OpenMP support is implemented by the ROCm LLVM compiler.

Features:
- Familiar pragma-based approach
- Incremental parallelization of existing CPU code
- Support for target offload constructs

OpenCL Support¶

While not the primary focus of ROCm, OpenCL support is maintained for compatibility with existing code bases and as an open standard option.

Github

Programming Models Dependencies¶

graph LR;
    subgraph Core Components
        runtime[ROCm Platform Runtime]
        llvm[ROCm LLVM Compiler]
        rocmcmake[ROCm CMake]
        rocminfo[ROCm Info]
    end

    hip[HIP]
    openmp[OpenMP Support]
    opencl[OpenCL Support]

    hip --> runtime
    hip --> llvm
    hip --> rocmcmake
    hip --> rocminfo

    openmp --> runtime
    openmp --> llvm
    opencl --> runtime
    opencl --> llvm

Compiler Ecosystem¶

ROCm provides a comprehensive set of compilers to support various programming languages and models. These compilers are essential for translating high-level code into optimized machine code for AMD GPUs.

C/C++ Compilers¶

ROCm-LLVM (AMDGPU LLVM / amdclang++) (Github):
- The foundation of ROCm's compiler toolchain
- Based on LLVM/Clang infrastructure with AMD GPU-specific additions
- Supports HIP, OpenMP offloading, and other programming models
AOMP (AMD OpenMP Compiler) (Github) (preview):
- Specialized for OpenMP target offloading to AMD GPUs
- Based on the LLVM project with specific optimizations for OpenMP
- Supports OpenMP 5.0+ features relevant to GPU offloading
- Currently a development-preview, not yet a full product
AOCC (AMD Optimizing C/C++ Compiler):
- Primarily focused on AMD CPU optimization
- Can be used in conjunction with ROCm for heterogeneous computing
- Based on LLVM/Clang with AMD-specific optimizations
- Closed source
HIPCC:
- Compiler wrapper for HIP applications
- Simplifies compilation process by handling complex flag combinations
- Part of the HIP package and the ROCm-LLVM project

Fortran Compilers¶

AOCC (AMD Optimizing Fortran Compiler):
- Based on Flang and LLVM
- Supports GPU offloading via OpenMP directives
- Optimized for AMD architectures
Flang for ROCm (Github) (deprecated):
- Part of the LLVM project's Fortran implementation
- The new Flang implementation (as described in LLVM's blog post) brings improved compatibility and performance

Developer Tools¶

ROCm offers several tools to aid in development, debugging, and performance optimization:

ROC gdb: Debugger for HIP and OpenCL applications (Github)
ROC Tracer: API tracing library (Github)
ROC Profiler: Performance profiling tool (Github)
ROC Debugger API: Provides support necessary for debugging tools (Github)
Profiler SDK: New profiler SDK, combines ROC Tracer and ROC Profiler (Github)
Compute Profiler: Performance analysis tool for AMD GPUs (Github)
Systems Profiler: Performance analysis tool for applications on the CPU and GPU (Github)

Developer Tools Dependencies¶

graph LR;
    subgraph Core Components
        rocmstack[ROCm Stack - driver, runtime, llvm, hip]
        rocmsmi[ROCm SMI]
        rocmcmake[ROCm CMake]
    end

    rocgdb[ROC gdb]
    roctracer[ROC Tracer]
    rocprofiler[ROC Profiler]
    rocdbgapi[ROC Debugger API]
    profilersdk[Profiler SDK]
    computeprofiler[Compute Profiler]
    systemsprofiler[Systems Profiler]

    rocgdb --> rocmstack
    rocgdb --> rocdbgapi

    roctracer --> rocmstack

    rocdbgapi --> rocmstack
    rocdbgapi --> rocmcmake

    rocprofiler --> rocmstack
    rocprofiler --> rocmsmi
    rocprofiler --> rocdbgapi

    profilersdk --> rocmstack

    computeprofiler --> rocmstack
    computeprofiler --> rocmsmi
    computeprofiler --> rocdbgapi

    systemsprofiler --> rocmstack
    systemsprofiler --> rocmsmi
    systemsprofiler --> profilersdk

Libraries and Frameworks¶

ROCm provides a rich set of libraries to accelerate various computational workloads.

ROCm also provides a set of marshalling libraries which implement a portable interface for operations across different GPU vendors (AMD and NVIDIA). These libraries automatically translate calls to the appropriate backend - either "roc" variants or "cu" variants - depending on the target hardware. The "roc" variants like rocFFT are AMD's native implementations optimized specifically for AMD GPUs, while the "hip" variants like hipFFT are the portable wrappers that can target either AMD or NVIDIA hardware through a unified API.
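
The naming convention can be summarised as a small lookup table. The Python sketch below only illustrates which backend library sits behind each marshalling library on each vendor platform. It does not model how the libraries actually select or call their backend, and the names `BACKENDS` and `backend_for` are hypothetical.

```python
# Backend library behind each marshalling ("hip") library, per platform.
BACKENDS = {
    "hipBLAS": {"amd": "rocBLAS", "nvidia": "cuBLAS"},
    "hipFFT": {"amd": "rocFFT", "nvidia": "cuFFT"},
    "hipRAND": {"amd": "rocRAND", "nvidia": "cuRAND"},
    "hipSOLVER": {"amd": "rocSOLVER", "nvidia": "cuSOLVER"},
    "hipSPARSE": {"amd": "rocSPARSE", "nvidia": "cuSPARSE"},
}


def backend_for(library, platform):
    """Return the backend library name for a marshalling library.

    `platform` is either "amd" or "nvidia".
    """
    try:
        return BACKENDS[library][platform]
    except KeyError:
        raise ValueError(f"unknown combination: {library!r} on {platform!r}") from None


if __name__ == "__main__":
    for lib in BACKENDS:
        print(lib, "->", backend_for(lib, "amd"), "/", backend_for(lib, "nvidia"))
```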

Core Math Libraries¶

hipBLASLt: General matrix-matrix operations, extends beyond BLAS (Github)
hipSPARSELt: Marshalling library and ROCm version of cuSPARSELt (Github)
rocBLAS: Basic Linear Algebra Subprograms implementation (Github)
rocFFT: Fast Fourier Transform implementation (Github)
rocRAND: Random number generator library (Github)
rocSOLVER: Linear algebra solver library (Github)
rocSPARSE: Sparse matrix routines (Github)

ML/DL Frameworks¶

MIOpen: Deep learning primitives library (Github)
ROCm PyTorch: PyTorch support for AMD GPUs (Github)
ROCm TensorFlow: TensorFlow support for AMD GPUs (Github)

Communication Libraries¶

RCCL: Communication library for multi-GPU/multi-node training (Github)

Marshalling Libraries¶

hipBLAS (Github)
hipFFT (Github)
hipRAND (Github)
hipSOLVER (Github)
hipSPARSE (Github)

Libraries and Frameworks Dependencies¶

graph LR;
    subgraph Core Components
        rocmstack[ROCm Stack - driver, runtime, llvm, hip]
    end
    subgraph Developer Tools
        roctracer[ROC Tracer]
    end
    subgraph Libraries and Frameworks
        rocfft[rocFFT]
        rocrand[rocRAND]
        rocblas[rocBLAS]
        rocsparse[rocSPARSE]
        rocsolver[rocSOLVER]
        hipblaslt[hipBLASLt]
        hipsparselt[hipSPARSELt]
        miopen[MIOpen]
        rccl[RCCL]
        hipfft[hipFFT]
        hiprand[hipRAND]
        hipblas[hipBLAS]
        hipsparse[hipSPARSE]
        hipsolver[hipSOLVER]
    end

    rocfft --> rocmstack
    rocrand --> rocmstack
    rocblas --> rocmstack
    rocsparse --> rocmstack
    rocsolver --> rocmstack
    rocsolver --> rocblas
    hipblaslt --> rocmstack
    hipblaslt --> roctracer
    hipsparselt --> rocmstack
    hipsparselt --> roctracer
    hipsparselt --> hipsparse

    miopen --> rocmstack
    miopen --> rocblas
    miopen --> hipblaslt
    miopen --> hipblas

    rccl --> rocmstack

    hipfft --> rocfft
    hiprand --> rocrand
    hipblas --> rocblas
    hipblas --> rocsparse
    hipblas --> rocsolver
    hipsparse --> rocsparse
    hipsolver --> rocsolver
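
To see everything a given library pulls in, the graph above can be walked transitively. The Python sketch below does this with a simple depth-first traversal. The edges are copied from the graph above, while the names `LIB_DEPS` and `transitive_deps` are illustrative only.

```python
# Direct dependencies, copied from the graph above.
LIB_DEPS = {
    "rocfft": {"rocmstack"},
    "rocrand": {"rocmstack"},
    "rocblas": {"rocmstack"},
    "rocsparse": {"rocmstack"},
    "rocsolver": {"rocmstack", "rocblas"},
    "hipblaslt": {"rocmstack", "roctracer"},
    "hipsparselt": {"rocmstack", "roctracer", "hipsparse"},
    "miopen": {"rocmstack", "rocblas", "hipblaslt", "hipblas"},
    "rccl": {"rocmstack"},
    "hipfft": {"rocfft"},
    "hiprand": {"rocrand"},
    "hipblas": {"rocblas", "rocsparse", "rocsolver"},
    "hipsparse": {"rocsparse"},
    "hipsolver": {"rocsolver"},
}


def transitive_deps(name, deps=LIB_DEPS):
    """Return every direct and indirect dependency of `name`."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for dep in deps.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(sorted(transitive_deps("miopen")))
```

According to the edges above, MIOpen, for instance, ends up depending on rocSOLVER and rocSPARSE through hipBLAS, even though it does not list them directly.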

Compatibility Policies¶

Source

ROCm Version and GPU Driver Compatibility¶

ROCm follows a versioning scheme that ensures compatibility between the software stack and GPU drivers.

Major Version Compatibility:
- Major ROCm versions (e.g., 6.x) typically maintain driver compatibility within the same major version.
- Major version upgrades may require driver updates.
Minor Version Compatibility:
- Minor versions (e.g., 6.4.x) are generally compatible with drivers designed for the same major version.
- Backward compatibility is maintained where possible, but newer hardware features may require newer drivers.

ROCm Version and glibc Compatibility¶

Source

Versions of glibc supported by ROCm 6.4:

2.28
2.31
2.34
2.35
2.36
2.38
2.39
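
A host can be checked against this list with a few lines of Python. The sketch below uses the standard-library `platform.libc_ver()` call, which reports the C library of the running interpreter. The version set is copied from the list above, and the function names are hypothetical. On systems that do not use glibc, `check_host` returns `None` instead of guessing.

```python
import platform

# glibc versions listed above for ROCm 6.4.
SUPPORTED_GLIBC = {"2.28", "2.31", "2.34", "2.35", "2.36", "2.38", "2.39"}


def is_supported_glibc(version):
    """True if the major.minor part of `version` is in the list above."""
    major_minor = ".".join(version.split(".")[:2])
    return major_minor in SUPPORTED_GLIBC


def check_host():
    """Check the glibc of the running system; None if it is not glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc":
        return None
    return is_supported_glibc(version)


if __name__ == "__main__":
    print("host glibc supported:", check_host())
```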

AMD GPUs in Azure¶

Azure offers several VM series featuring AMD GPUs. The following is an overview of available SKUs.

Source

NVv4 series (Azure)
- cpu: AMD EPYC 7V12 (Rome) [x86-64]
- gpu: AMD Instinct MI25 GPU (16GB)
NGads_V620 series (Azure)
- cpu: AMD EPYC 7763 (Milan) [x86-64]
- gpu: AMD Radeon PRO V620 GPU (32GB)
NVads_V710_v5 series (Azure)
- cpu: AMD EPYC 9V64 F (Genoa) [x86-64]
- gpu: AMD Radeon™ Pro V710
ND-MI300X-V5 series (Azure)
- cpu: Intel Xeon (Sapphire Rapids) [x86-64]
- gpu: AMD Instinct MI300X GPU (192GB)

ABC of ROCm¶

A¶

AMDGPU Driver ¶

The kernel-mode driver for AMD GPUs that provides the foundation for ROCm's functionality, including the Kernel Fusion Driver (KFD) that enables compute capabilities.

AMD Instinct ¶

AMD Instinct is AMD's dedicated compute accelerator lineup for data centers and AI/HPC applications, optimized for the ROCm software platform.

AMD Docs

AMD SMI ¶

AMD SMI (System Management Interface) is a command-line tool within the ROCm ecosystem that allows users to query and control various aspects of AMD GPUs. ROCm SMI is an older, more limited interface primarily used for hardware monitoring that has been largely superseded by AMD SMI, which offers a broader range of management capabilities.

AMD Docs

AOCC ¶

AOCC (AMD Optimizing C/C++ and Fortran Compilers) is AMD's optimizing compiler suite. It is primarily focused on AMD CPU optimization, but can be used alongside ROCm for heterogeneous computing.

AMD Docs

AOMP ¶

AMD OpenMP Compiler, a specialized compiler (development-preview) for OpenMP target offloading to AMD GPUs, supporting OpenMP 5.0+ features for GPU computing.

C¶

CDNA ¶

CDNA (Compute DNA) is AMD's GPU architecture optimized specifically for data center and high-performance computing workloads within the ROCm ecosystem.

AMD Docs and AMD Docs

G¶

GCN ¶

GCN (Graphics Core Next) is AMD's older GPU architecture that served as the foundation for their compute-focused platforms in early ROCm releases.

AMD Docs and AMD Docs

GFX ¶

GFX codes in AMD ROCm are architecture identifiers that specify GPU hardware generations, determining compatibility and optimization targets for HPC and machine learning workloads on AMD GPUs.

AMD Docs and AMD Docs

H¶

HIP ¶

HIP (Heterogeneous-computing Interface for Portability) is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD and NVIDIA GPUs. It is a key component of the ROCm platform for high-performance computing and machine learning workloads.

AMD Docs

HIPIFY ¶

HIPIFY is a tool within AMD's ROCm platform that converts CUDA code into portable HIP (Heterogeneous-computing Interface for Portability) code to enable GPU applications to run on AMD hardware.

AMD Docs

O¶

OpenCL ¶

OpenCL is a framework that allows developers to write programs that execute across heterogeneous platforms (including AMD GPUs) by using the OpenCL runtime and compiler infrastructure provided within the ROCm ecosystem.

OpenMP ¶

OpenMP is a parallel programming model supported through the ROCm toolchain that allows developers to write multi-threaded CPU and GPU code using familiar OpenMP directives, targeting AMD GPUs via the Clang/LLVM compiler infrastructure.

P¶

Platform Runtime ¶

The Platform Runtime refers to the ROCr (ROCm Runtime) layer that provides low-level APIs for managing GPU resources, memory, and queues, forming the foundation upon which higher-level programming models like HIP operate.

AMD Docs

R¶

Radeon RX ¶

AMD Radeon RX is a consumer-focused GPU series primarily designed for gaming and content creation.

RDNA ¶

RDNA (Radeon DNA) is AMD's consumer-focused graphics architecture optimized for gaming and media applications within the ROCm ecosystem.

AMD Docs and AMD Docs

ROCm ¶

Radeon Open Compute is an open-source software stack developed by AMD for GPU computing and machine learning applications.

AMD Docs

ROCm-LLVM ¶

AMD ROCm's LLVM implementation is a modified version of the LLVM compiler infrastructure that enables GPU code generation, optimization, and execution for AMD GPUs within the ROCm (Radeon Open Compute) platform, providing essential support for high-performance computing and machine learning workloads.

AMD Docs

ROCm SMI ¶

ROCm SMI (System Management Interface) is a command-line utility for monitoring and managing AMD GPUs within the ROCm ecosystem. It can query hardware information, control power states, monitor temperature, configure memory, and manage device performance. It is the older, more limited interface and has been largely superseded by AMD SMI, which offers a broader range of management capabilities.

AMD Docs

V¶

Vega ¶

Vega refers to AMD's GPU architecture that was one of the first to fully support the ROCm ecosystem for high-performance computing and machine learning workloads.

AMD Docs and AMD Docs

Changelog¶

v6.4.1-20250610¶

started changelog
moved github and azure links
removed ROCTracer from core components
added HIP, ROCm-core, and ROCm-cmake to core components
improved core components dependencies graph
improved programming models dependencies graph
removed compilers dependencies graph
added a big dependencies graph

v6.4.1-20250611¶

fixed dependency graphs (except for dev tools)
added ROCm dependencies of PyTorch

v6.4.1-20250616¶

added ROC Debugger API to dev tools
fixed dev tools dependencies graph
sorted libraries
removed the big dependencies graph in favour of dedicated ones