AMD Instinct MI350 Series Debuts in MLPerf 5.1 Training Benchmarks
The latest MLPerf 5.1 Training results mark a significant milestone for the AI hardware landscape: the first-ever MLPerf training submission featuring AMD Instinct MI350 Series GPUs. These new benchmarks highlight substantial generational performance improvements and underscore the growing participation of a broad ecosystem in tackling some of today’s most demanding AI training workloads.
Breakthrough Performance: Up to 2.8X Faster AI Training
The AMD Instinct MI350 Series, which includes the MI355X and MI350X GPUs, posted a large generational gain in AI training performance. According to MLPerf 5.1 Training results, the MI350 Series delivers up to 2.8 times faster time-to-train than the previous-generation MI300X, and 2.1 times faster than the MI325X platform.
On the Llama 2-70B LoRA (FP8) benchmark, the MI355X GPU reduced training time from nearly 28 minutes on the MI300X to just over 10 minutes. Even compared to the MI325X, the MI355X nearly halves the training duration. These gains are driven by architectural enhancements, industry-leading HBM3E memory bandwidth, and AMD ROCm 7.1 software optimizations that improve both kernel performance and communication efficiency. The result is faster model fine-tuning and improved energy efficiency for large-scale generative AI workloads.
With each new generation, AMD Instinct GPUs continue to push the boundaries of AI training performance, accelerating the journey from model design to deployment.
Competitive Edge Across Industry Benchmarks
The MI355X platform demonstrates highly competitive training performance across leading generative AI workloads. In the MLPerf 5.1 Training round, AMD’s results were compared to the average of all NVIDIA partner submissions using FP8 precision on B200 and B300 GPUs.
On the Llama 2-70B LoRA (FP8) benchmark, the MI355X completed training in 10.18 minutes, roughly 3% to 6% longer than the averaged NVIDIA B200 and B300 results of 9.85 and 9.59 minutes, respectively. For Llama 3.1-8B (FP8) pre-training, the MI355X finished in 99.7 minutes, compared to 93.69 and 95.10 minutes for the NVIDIA-based systems, a gap of roughly 5% to 6%.
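The size of these gaps can be checked with a few lines of arithmetic. The sketch below is only an illustration: the times are the ones quoted above, while the dictionary names and the `relative_gap` helper are made up for this example and are not part of any MLPerf tooling.

```python
# Time-to-train in minutes, as quoted in the text above.
MI355X_MIN = {"llama2_70b_lora": 10.18, "llama31_8b_pretrain": 99.7}

# Averaged NVIDIA partner results quoted above, in the order the text lists them.
NVIDIA_AVG_MIN = {
    "llama2_70b_lora": [9.85, 9.59],
    "llama31_8b_pretrain": [93.69, 95.10],
}


def relative_gap(time_min, reference_min):
    """Return how much longer time_min is than reference_min, as a fraction."""
    return time_min / reference_min - 1.0


for benchmark, amd_time in MI355X_MIN.items():
    gaps = [relative_gap(amd_time, ref) for ref in NVIDIA_AVG_MIN[benchmark]]
    print(benchmark, ", ".join(f"{gap:.1%}" for gap in gaps))
```

A positive value means the MI355X run took longer than the reference. Because time-to-train is a lower-is-better metric, the gap is expressed relative to the reference time rather than as a difference in minutes.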
Notably, NVIDIA itself did not submit FP8 results in the current MLPerf Training v5.1 round, focusing instead on FP4, which is not yet production-ready for training workloads due to tradeoffs in numerical accuracy. AMD continues to prioritize FP8 training, the most widely adopted datatype for large-scale, high-accuracy model training, while also advancing FP4 algorithmic development for future use.
The most recent FP8 training result from NVIDIA, published in MLPerf Training v5.0, saw eight GB200 GPUs achieve an 11.15-minute training time on Llama 2-70B LoRA. In the current round, the MI355X completed the same workload in 10.18 minutes, about 9% less time (a speedup of nearly 1.10x). These results reinforce the MI355X platform’s competitive, efficient, and scalable training capabilities, and highlight the growing strength of the AMD Instinct and ROCm ecosystem for next-generation generative AI.
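The v5.0 versus v5.1 comparison above can be stated either as time saved or as a speedup, and the two give slightly different percentages. A minimal sketch, using only the two times quoted above (the function and variable names are illustrative assumptions):

```python
def time_reduction(old_min, new_min):
    """Fraction of wall-clock time saved when going from old_min to new_min."""
    return (old_min - new_min) / old_min


def speedup(old_min, new_min):
    """Ratio of old time to new time; values above 1.0 mean the new run is faster."""
    return old_min / new_min


GB200_V50_MIN = 11.15   # NVIDIA FP8 result from MLPerf Training v5.0, quoted above
MI355X_V51_MIN = 10.18  # AMD FP8 result from MLPerf Training v5.1, quoted above

print(f"time reduction: {time_reduction(GB200_V50_MIN, MI355X_V51_MIN):.1%}")
print(f"speedup: {speedup(GB200_V50_MIN, MI355X_V51_MIN):.3f}x")
```

The time-reduction figure divides by the older, longer time and the speedup divides by the newer, shorter one, so the speedup always yields the larger percentage for the same pair of results.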
Record Ecosystem Participation and Consistency
The MLPerf 5.1 Training round also set a new record for ecosystem participation on the AMD Instinct platform, with nine partners—ASUS, Cisco, Dell, Giga Computing, Krai, MangoBoost, MiTAC, QCT, and Supermicro—submitting training results on AMD hardware. This marks the broadest industry engagement to date for AMD in MLPerf Training.
Impressively, every partner’s first submission on the new MI355X platform landed within just 1% of AMD’s own results on the same benchmarks. This level of alignment highlights the maturity and consistency of the ROCm software stack and the readiness of AMD Instinct hardware for immediate deployment across diverse configurations.
Partners demonstrated high-performance results on challenging workloads such as Llama 2-70B LoRA fine-tuning and Llama 3.1-8B pre-training, confirming that AMD Instinct MI355X GPU systems deliver reproducible, high-performance outcomes across a range of real-world AI training scenarios.
AMD ROCm 7.1: Powering High-Performance, Scalable AI Training
At the core of all MLPerf 5.1 training submissions on AMD Instinct GPUs is AMD ROCm 7.1, the software platform enabling high performance, scalability, and efficiency. This release introduces comprehensive advancements, from kernel and compiler optimizations to improved communication efficiency and seamless framework integration.
ROCm 7.1 shortens time-to-train with FP8 precision while preserving the numerical stability that demanding generative AI models require. Key optimizations include tuned GEMM operations, fused attention mechanisms, and updated compiler stacks such as XLA and TorchInductor, resulting in higher throughput and consistent performance across diverse training workloads.
Enhanced memory and communication efficiency further improve bandwidth utilization and enable better scaling from a single GPU to multi-node systems. With day-0 support for leading frameworks and models, including Llama 3.1-8B, Mistral, and SD-XL, ROCm 7.1 equips developers to train and fine-tune the latest AI workloads immediately.
The consistency and performance seen across partner submissions with the MI355X platform are a testament to the robustness of AMD ROCm software, demonstrating that software innovation is as critical as hardware in delivering efficient, scalable, and production-ready AI training.
Advancing AI Training Leadership
The MLPerf 5.1 Training results represent a pivotal moment for the AMD Instinct MI350 Series, showcasing breakthrough generational performance, strong competitive positioning, and record ecosystem participation—all powered by the open and rapidly evolving ROCm 7.1 software platform.
With up to 2.8X faster training performance over the previous generation, near-parity with NVIDIA’s latest FP8-based submissions, and partner results within 1% of AMD’s own, the MI355X platform demonstrates both leadership and consistency in real-world AI training workloads.
This progress is the result of a deliberate, steady innovation cadence. The AMD Instinct roadmap continues to advance annually—from the MI300X in 2023 to the MI325X in 2024, and now the MI350 Series in 2025—delivering new levels of compute density, memory bandwidth, and software optimization with each generation. Looking ahead, the MI450 Series and next-generation CDNA architecture are set to extend this momentum into 2026 and beyond.
Together, AMD Instinct GPUs and ROCm software form a unified platform for AI training and inference, built to scale with the evolving demands of generative AI. As MLPerf benchmarking evolves, AMD remains committed to open benchmarking, collaboration, and continuous innovation—driving the performance and efficiency that define modern AI infrastructure.