GB300 NVL72: Nvidia’s Creation for Data Centre Energy Spikes

The Nvidia GB200 NVL72's features provide steady power for AI-driven demands (Credit: Nvidia)
Nvidia launches energy management features in its GB300 NVL72 and GB200 NVL72 AI data centre platforms for large-scale AI training workloads

As large-scale AI training workloads increasingly heat up data centres, Nvidia is rolling out new energy management features in its platform to address the energy demands.

It’s no secret that with thousands of GPUs operating in synchronised bursts, data centres running these tasks are creating power fluctuations that strain grid infrastructure.

This is where the GB300 NVL72 comes in, as it integrates hardware and software to smooth power spikes, aiming to reduce peak demand on the grid by up to 30%.

CoreWeave’s GB300 NVL72 deployment (Credit: Switch)

These new features also appear in the GB200 NVL72 platform and allow data centre operators to reduce the over-provisioning of power infrastructure, potentially lowering operating costs and increasing rack density within existing budgets.

The Nvidia GB300 NVL72 is a big step forward in performance for AI reasoning and agentic workloads, delivering up to:

  • a 10x boost in user responsiveness
  • a 5x improvement in throughput per watt compared to the previous-generation Nvidia Hopper architecture
  • a 50x increase in output for reasoning model inference

In July, CoreWeave became the first cloud provider to deploy the platform.

“CoreWeave is constantly working to push the boundaries of AI development further, deploying the bleeding-edge cloud capabilities required to train the next generation of AI models,” said Peter Salanki, Co-Founder and Chief Technology Officer at CoreWeave.

“We’re proud to be the first to stand up this transformative platform and help innovators prepare for the next exciting wave of AI.”

How AI training disrupts grid consistency

Traditional data centres operate workloads that vary across systems, helping to balance power demand.

The new features of the GB300 NVL72 platform also appear in the GB200 (Credit: Nvidia)

By contrast, AI training involves many GPUs performing identical calculations on different data simultaneously, causing abrupt swings between high and low power states across entire racks.

This means that the grid must respond to rapid load changes, a response that can take up to 90 minutes using conventional resources. In the meantime, the swings can create electrical resonance, transformer stress and voltage instability.

Addressing this challenge, Nvidia engineers use heatmaps and timestamp charts to visualise how GPUs ramp up power at the job’s start, cycle through rapid load changes and drop sharply at the end.

This led to the development of a coordinated feature set to smooth AI workloads across three stages: ramp-up, steady state and ramp-down.

The role of hardware and software for power smoothing

At workload start, Nvidia’s new power cap feature controls GPU draw by gradually aligning power limits with grid ramp tolerances, preventing destabilising surges.
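As a rough illustration of the ramp-up idea, the Python sketch below raises a power cap from a starting value to a target without exceeding a maximum ramp rate. The function name `ramp_up_caps`, its parameters and the wattages are hypothetical, chosen only for illustration; they are not Nvidia's implementation or settings.

```python
def ramp_up_caps(start_w, target_w, max_ramp_w_per_s, step_s=1.0):
    """Return (time_s, cap_w) pairs that raise a power cap towards target_w.

    Hypothetical helper: the cap never rises faster than max_ramp_w_per_s,
    standing in for the ramp rate a grid operator is assumed to tolerate.
    """
    if max_ramp_w_per_s <= 0 or step_s <= 0:
        raise ValueError("ramp rate and step must be positive")

    cap = float(start_w)
    schedule = [(0.0, cap)]
    step = 0
    while cap < target_w:
        step += 1
        # Raise the cap by at most one ramp increment, never past the target.
        cap = min(float(target_w), cap + max_ramp_w_per_s * step_s)
        schedule.append((step * step_s, cap))
    return schedule


if __name__ == "__main__":
    # Assumed values: 200 W idle, 1,000 W target, 200 W/s tolerated ramp.
    for t, cap in ramp_up_caps(200, 1000, 200):
        print(f"t={t:.0f}s cap={cap:.0f} W")
```

With these assumed values, the cap rises by 200 W each second and reaches the 1,000 W target after four seconds instead of jumping there at once.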

At the end of a run, the GB300 platform employs a GPU burn mechanism, temporarily maintaining high power draw to taper off slowly.

This burn mode uses the GPU to dissipate power in a controlled manner.

If a new workload begins, burn mode disengages immediately; otherwise, the system reduces power according to preconfigured limits.
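The ramp-down behaviour can be sketched in the same spirit. In the Python example below, a burn floor holds the draw near its previous level once a job ends and then steps it down at a limited rate, stopping early if a new job arrives. The name `burn_taper`, its parameters and all wattages are assumptions made for illustration, not Nvidia's actual mechanism or limits.

```python
def burn_taper(peak_w, idle_w, max_drop_w_per_s, new_job_at_s=None, step_s=1.0):
    """Return (time_s, floor_w) pairs for a controlled ramp-down after a job ends.

    Hypothetical model: the burn floor starts at peak_w and falls by at most
    max_drop_w_per_s until it reaches idle_w. If a new job starts at
    new_job_at_s, the taper stops there because real work takes over.
    """
    if max_drop_w_per_s <= 0 or step_s <= 0:
        raise ValueError("drop rate and step must be positive")

    schedule = []
    floor = float(peak_w)
    step = 0
    while True:
        t = step * step_s
        if new_job_at_s is not None and t >= new_job_at_s:
            break  # a new workload has begun, so burn mode disengages
        schedule.append((t, floor))
        if floor <= idle_w:
            break  # taper complete: the draw has reached idle
        floor = max(float(idle_w), floor - max_drop_w_per_s * step_s)
        step += 1
    return schedule


if __name__ == "__main__":
    print(burn_taper(1000, 200, 400))                    # full taper to idle
    print(burn_taper(1000, 200, 400, new_job_at_s=1.0))  # cut short by a new job
```

In this sketch the uninterrupted taper passes through 1,000 W, 600 W and 200 W, while a job arriving after one second ends the taper after the first step.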

Nvidia's GB200 NVL72 platform (Credit: Nvidia)

During the steady state, Nvidia’s updated power shelves use energy storage, including electrolytic capacitors, that charges during low-demand intervals and discharges during peaks, smoothing the power curve.

Unlike the older GB200 PSU, the GB300 shelf reduces grid-facing peak power by up to 30% while maintaining the same GPU output pattern.

Nvidia collaborated with LITEON Technology to optimise the GB300 power shelf’s physical design.

Energy storage elements now occupy half the volume of the power shelf, providing 65 joules of storage per GPU. A charge management controller orchestrates power storage and release in real time, ensuring grid stability.
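To see how a small energy buffer can flatten the grid-facing curve, the Python sketch below runs a simple per-GPU model: storage discharges when the load exceeds a grid cap and recharges when there is headroom. Only the 65 J figure comes from the article. The function `smooth_with_storage`, the square-wave load, the 700 W cap and the 10 ms time step are assumptions, and the model ignores conversion losses and controller behaviour.

```python
def smooth_with_storage(load_w, grid_cap_w, capacity_j, dt_s):
    """Return the grid-facing draw (W) for each sample of a load trace.

    Hypothetical model: storage starts full, covers any load above
    grid_cap_w while it has energy, and refills using spare headroom.
    """
    stored_j = capacity_j
    grid_w = []
    for p in load_w:
        if p > grid_cap_w:
            # Discharge: storage supplies as much of the excess as it can.
            needed_j = (p - grid_cap_w) * dt_s
            used_j = min(needed_j, stored_j)
            stored_j -= used_j
            grid_w.append(p - used_j / dt_s)
        else:
            # Recharge: use headroom below the cap, up to a full store.
            room_j = capacity_j - stored_j
            charge_j = min(room_j, (grid_cap_w - p) * dt_s)
            stored_j += charge_j
            grid_w.append(p + charge_j / dt_s)
    return grid_w


if __name__ == "__main__":
    # Assumed load: 1,000 W for 0.1 s, then 400 W for 0.1 s, repeated.
    load = ([1000.0] * 10 + [400.0] * 10) * 3
    grid = smooth_with_storage(load, grid_cap_w=700.0, capacity_j=65.0, dt_s=0.01)
    print(f"load peak {max(load):.0f} W, grid peak {max(grid):.0f} W")
```

Under these assumptions each burst needs 30 J from storage, well within the 65 J available, so the grid sees a steady 700 W while the GPU load still swings between 400 W and 1,000 W. Longer or larger bursts would exhaust the store and let the grid-facing peak rise again.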

The importance of reducing provisioning for AI-scale facilities

Data centres have traditionally provisioned power for worst-case loads, meaning peak GPU demand must be supported even if it is brief. The GB300’s energy smoothing allows infrastructure to be dimensioned closer to average use.

This allows operators to either increase rack numbers within the same facility power budget or reduce the overall power required for a deployment.
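The provisioning trade-off comes down to simple arithmetic, sketched below in Python. The 10 MW facility budget and 130 kW rack peak are hypothetical figures chosen for illustration, and the 30% reduction is the article's best-case "up to" value. The helper `racks_within_budget` is likewise an assumption, and it ignores cooling, networking and other facility loads.

```python
def racks_within_budget(budget_w, rack_peak_w, peak_reduction_pct=0):
    """Return how many racks fit in a power budget sized for peak draw.

    Hypothetical helper using integer watts. peak_reduction_pct models a
    lower grid-facing peak per rack thanks to power smoothing.
    """
    if not 0 <= peak_reduction_pct < 100:
        raise ValueError("peak_reduction_pct must be in [0, 100)")
    # Power that must be provisioned per rack after any peak reduction.
    provisioned_w = rack_peak_w * (100 - peak_reduction_pct) // 100
    return budget_w // provisioned_w


if __name__ == "__main__":
    budget_w = 10_000_000   # assumed 10 MW facility budget
    rack_peak_w = 130_000   # assumed 130 kW unsmoothed rack peak
    print(racks_within_budget(budget_w, rack_peak_w))                         # 76
    print(racks_within_budget(budget_w, rack_peak_w, peak_reduction_pct=30))  # 109
```

With these assumed numbers, the same budget supports 76 racks when sized for the unsmoothed peak and 109 racks when the grid-facing peak is 30% lower.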

Power smoothing occurs entirely within the rack, optimising the grid load profile and avoiding energy return to the utility. In both GB200 and GB300 NVL72 systems, the smoothing strategies are managed at the shelf and rack levels.

Additionally, Nvidia’s SMI tool and the Redfish protocol allow operators to fine-tune the configuration, including GPU idle time and target ramp rates.

These advancements in the GB300 NVL72 address growing AI infrastructure energy demands, providing a rack-level, fast transient power smoothing solution that enables data centres to support increasing model sizes without overwhelming power systems.
