How Nvidia Won Every MLPerf Training v5.1 Benchmark

Jensen Huang, CEO and founder of Nvidia, the most valuable company in the world, says 90% of all benchmarkable GPUs belong to Nvidia
With Blackwell Ultra systems and the NVFP4 method, Nvidia secures top performance across all MLPerf v5.1 benchmarks in LLMs, vision and recommender tests

The performance-hungry kingdom of AI has a monarch with unmatched power: Nvidia, the world’s largest public company and the first ever to hit a US$5tn valuation.

Nvidia has swept the MLPerf Training v5.1 benchmarks, posting the best performance across all seven tests and setting new records with its Blackwell Ultra GPU architecture.

Nvidia was also the only platform to submit results in every category, a testament to the versatility of its Cuda software stack.

Nvidia dominates MLPerf Training v5.1 benchmarks | Credit: Nvidia

Nvidia achieved the fastest time to train across large language models (LLMs), image generation, recommender systems, computer vision and graph neural networks.

At Nvidia GTC in Washington, D.C., Nvidia CEO Jensen Huang said: “For 30 years we have been developing this form of computing we call accelerated computing.”

“We invented the GPU, we invented the programming model called Cuda.

“Its moment has now arrived.”

What is MLPerf Benchmarking?

MLPerf Training v5.1 is part of a running series of AI benchmarks produced by MLCommons, an open engineering consortium, and provides a standardised measure of AI training performance.

It consists of a suite of tests that measure how quickly a full system, both hardware and software, can train models across a range of ML applications.

The benchmark suite is regularly updated, with tests added and removed to keep up with the pace of progress in AI.

Paul Baumstarck, co-chair of the MLPerf Training working group calls the field of AI a moving target

In the latest round, MLPerf Training v5.1, two new benchmarks were introduced: Llama 3.1 8B for LLMs and FLUX.1 for text-to-image models.

The other five benchmarks are Llama 3.1 405B for LLM pre-training, Llama 2 70B LoRA for LLM fine-tuning, RetinaNet for object detection, RGAT for graph node classification and DLRM-DCNv2 for recommender systems.

“The field of AI is a moving target, constantly evolving with new scenarios and capabilities,” says Paul Baumstarck, co-chair of the MLPerf Training working group. 

“We will continue to evolve the MLPerf Training benchmark suite to ensure that we are measuring what is important to the community, both today and tomorrow.”

How Nvidia Blackwell Ultra delivers record-breaking performance leaps

Blackwell Ultra’s new Tensor Cores offer 15 petaflops of AI compute, and Nvidia adopted new training methods to take advantage of that computing power.

Nvidia delivered 4x the performance on Llama 3.1 405B and nearly 5x on Llama 2 70B compared to previous MLPerf benchmarking rounds | Credit: Nvidia

As a result, the Nvidia Blackwell GB300 NVL72 rack-scale system delivered 4x the performance on Llama 3.1 405B pre-training and nearly 5x on Llama 2 70B LoRA fine-tuning compared to previous MLPerf rounds.

Nvidia also set a new Llama 3.1 405B record, with a time to train of just 10 minutes.

Jensen says: “If you look at the list of GPUs you could actually benchmark, it is 90% Nvidia.”

He adds: “So we are comparing ourselves to ourselves.”

What is Nvidia’s new NVFP4 training method?

Nvidia believes that performing its calculations in NVFP4 precision gave it an advantage over its competitors.

NVFP4 is a 4-bit floating-point format, so calculations run on data that is represented by fewer bits than in higher-precision formats.

Fewer bits generally mean lower accuracy but faster computation.
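To make the accuracy trade-off concrete, here is a minimal Python sketch of rounding numbers to a 4-bit floating-point grid. It assumes the E2M1 element layout (one sign bit, two exponent bits, one mantissa bit), which gives only eight representable magnitudes. The function name `round_to_fp4` is hypothetical, and the nearest-value rounding is a simplification rather than Nvidia's actual hardware behaviour.

```python
# Magnitudes representable by a 4-bit E2M1 float (assumed element layout).
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def round_to_fp4(x: float) -> float:
    """Round x to the nearest representable value, saturating at +/-6.

    Ties go to the smaller magnitude here; real hardware rounding
    rules may differ.
    """
    sign = -1.0 if x < 0 else 1.0
    nearest = min(FP4_MAGNITUDES, key=lambda m: abs(abs(x) - m))
    return sign * nearest


values = [0.1, 0.7, 1.2, 2.4, 3.3, 5.1]
rounded = [round_to_fp4(v) for v in values]
max_error = max(abs(v - r) for v, r in zip(values, rounded))

print(rounded)    # [0.0, 0.5, 1.0, 2.0, 3.0, 6.0]
print(max_error)  # largest gap between an input and its 4-bit value
```

With so few representable values, the gaps between neighbours grow quickly, which is why a 4-bit format needs extra scaling information to stay accurate.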


To counteract the lower accuracy, Nvidia uses two architectural innovations: high-precision scale encoding and a two-level micro-block scaling strategy.

Using fewer bits reduces the memory burden and simplifies computing operations, reducing the dependence on memory bandwidth and thereby improving overall performance.
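The two-level scaling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nvidia's implementation: it assumes micro-blocks of 16 values, each with its own scale, plus one scale for the whole tensor. In the format as publicly described, block scales are stored in 8-bit floating point and the tensor scale in 32-bit floating point; here both are ordinary Python floats, and all function names are hypothetical.

```python
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
BLOCK_SIZE = 16  # assumed micro-block size


def round_to_fp4(x):
    """Round x to the nearest 4-bit (E2M1) value; simplified rounding."""
    sign = -1.0 if x < 0 else 1.0
    return sign * min(FP4_MAGNITUDES, key=lambda m: abs(abs(x) - m))


def quantize(tensor, block_size=BLOCK_SIZE):
    """Return (tensor_scale, block_scales, codes) for a list of floats."""
    amax = max(abs(v) for v in tensor)
    # Level 2: one scale for the whole tensor.
    tensor_scale = amax / FP4_MAX if amax > 0 else 1.0
    block_scales, codes = [], []
    for start in range(0, len(tensor), block_size):
        block = [v / tensor_scale for v in tensor[start:start + block_size]]
        block_amax = max(abs(v) for v in block)
        # Level 1: one scale per micro-block, so small values keep detail.
        scale = block_amax / FP4_MAX if block_amax > 0 else 1.0
        block_scales.append(scale)
        codes.extend(round_to_fp4(v / scale) for v in block)
    return tensor_scale, block_scales, codes


def dequantize(tensor_scale, block_scales, codes, block_size=BLOCK_SIZE):
    """Rebuild approximate values from 4-bit codes and both scale levels."""
    return [c * block_scales[i // block_size] * tensor_scale
            for i, c in enumerate(codes)]


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# One block of small values followed by one block of large values.
tensor = [0.01 * i for i in range(16)] + [10.0 * i for i in range(16)]

blocked = dequantize(*quantize(tensor))
single = dequantize(*quantize(tensor, block_size=len(tensor)),
                    block_size=len(tensor))

# Error on the small-valued block, with and without per-block scales.
err_blocked = max_error(tensor[:16], blocked[:16])
err_single = max_error(tensor[:16], single[:16])
print(err_blocked < err_single)
```

With a single scale, the small values all collapse to zero because the large values dominate the range. Per-block scales let each group of 16 use the full 4-bit grid. Under the assumed layout, the overhead is one 8-bit scale per 16 values, or roughly 4.5 bits per value in total.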

Nvidia’s sweep across all seven MLPerf v5.1 categories reinforces its leadership in AI training performance.

With new precision formats and the Blackwell architecture delivering substantial gains, the company has set a high bar for future benchmark rounds.
