Broadcom: How Tomahawk Ultra Will Power AI Workloads

Broadcom is going after Nvidia’s networking business with a new type of Ethernet switch built to keep up with the demanding requirements of AI training systems. The Tomahawk Ultra, now shipping to customers, forwards packets with a switch latency of just 250 nanoseconds, fast enough to compete with the specialised interconnects that Nvidia controls.
When Nvidia bought networking company Mellanox for US$7bn in 2020, it gained control of InfiniBand – the high-speed connection technology that links together the thousands of AI chips needed for training large language models. As companies continue to pour money into AI infrastructure, the networking gear connecting those chips has become almost as valuable as the processors themselves.
Traditional Ethernet has long been considered too slow and unreliable for these AI workloads. Training an AI model requires chips to constantly share data with each other, and any delays or lost information can force the entire process to restart, wasting hours of expensive computing time.
Broadcom spent years redesigning how Ethernet works to solve these problems. The Tomahawk Ultra can handle 51.2 terabits of data per second while maintaining the ultra-fast response times that AI training demands.
“This highlights Broadcom's commitment to invest in advancing Ethernet for high-performance networking and AI scale-up,” comments Ram Velaga, SVP and General Manager of Broadcom’s Core Switching Group.
How Broadcom’s Tomahawk Ultra aims to transform HPC and AI
The switch incorporates two mechanisms to prevent packet loss: Link Layer Retry (LLR) and Credit-Based Flow Control (CBFC). LLR detects transmission errors using Forward Error Correction and automatically retransmits the affected packets at the link level, so errors are repaired before they reach higher protocol layers. CBFC lets a sender transmit only when the receiver has signalled free buffer space, preventing the overflow conditions that typically cause packet drops.
This combination creates what Broadcom calls a “lossless fabric”, a property that matters for AI workloads, where dropped packets can force entire training runs to restart.
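The credit idea behind CBFC can be illustrated with a short simulation. This is a minimal sketch of generic credit-based flow control, not Broadcom's implementation: the class names, the one-credit-per-packet accounting and the buffer sizes are all assumptions made for illustration, and LLR is not modelled.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a fixed number of buffer slots."""

    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.buffer_slots = buffer_slots

    def initial_credits(self):
        # Assumption: one credit is advertised per free buffer slot.
        return self.buffer_slots

    def accept(self, packet):
        # With correct credit accounting this check never fails.
        assert len(self.buffer) < self.buffer_slots, "buffer overflow"
        self.buffer.append(packet)

    def drain(self, n):
        # Forward up to n packets onward and return that many credits.
        freed = min(n, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        return freed


class Sender:
    """Hypothetical sender that transmits only while it holds credits."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.initial_credits()
        self.pending = deque()
        self.sent = 0

    def submit(self, packet):
        self.pending.append(packet)

    def pump(self):
        # Packets without a credit wait in the queue; they are never dropped.
        while self.pending and self.credits > 0:
            self.receiver.accept(self.pending.popleft())
            self.credits -= 1
            self.sent += 1

    def add_credits(self, n):
        self.credits += n


def simulate(num_packets=10, buffer_slots=4, drain_per_step=2):
    """Run until every packet has been sent and drained."""
    rx = Receiver(buffer_slots)
    tx = Sender(rx)
    for i in range(num_packets):
        tx.submit(i)
    steps = 0
    while tx.pending or rx.buffer:
        tx.pump()
        tx.add_credits(rx.drain(drain_per_step))
        steps += 1
    return tx.sent, steps


if __name__ == "__main__":
    sent, steps = simulate()
    print(f"sent={sent} steps={steps}")
```

When the receiver drains more slowly than the sender produces, the sender stalls instead of overrunning the buffer. Congestion therefore shows up as waiting time rather than as lost packets.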
The switch also executes collective operations, such as AllReduce and AllGather, directly in hardware rather than leaving these tasks to the processing units. This feature works with any endpoint hardware, allowing deployment across different vendor ecosystems without requiring specific accelerator support.
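For readers unfamiliar with the two collectives named here, the sketch below shows what each one computes. It describes only the result every participant ends up with, not how the Tomahawk Ultra produces it. The function names and the choice of summation as the reduction operator are assumptions for illustration.

```python
def all_gather(shards):
    """Every rank ends up with the concatenation of all ranks' shards."""
    gathered = [x for shard in shards for x in shard]
    return [list(gathered) for _ in shards]


def all_reduce(vectors):
    """Every rank ends up with the element-wise sum of all ranks' vectors."""
    total = [sum(column) for column in zip(*vectors)]
    return [list(total) for _ in vectors]


if __name__ == "__main__":
    # Three ranks, each holding a two-element gradient vector.
    print(all_reduce([[1, 2], [3, 4], [5, 6]]))
    # Three ranks, each holding one shard of a larger tensor.
    print(all_gather([[1], [2], [3]]))
```

In a training cluster, AllReduce is commonly used to combine gradients across accelerators, which is why moving the reduction into the network can take work off the endpoints.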
Broadcom has also developed the Scale-Up Ethernet (SUE) specification to define how the switch integrates with AI systems. When deployed with SUE-compliant hardware, the Tomahawk Ultra enables sub-400ns communication latency between processing units, including the switch transit time.
The company has made the SUE specification publicly available and created SUE-Lite, a reduced-complexity version for power-constrained applications. SUE-Lite maintains the low-latency characteristics while reducing silicon area and power consumption on AI accelerators and CPUs.
Equipment vendors confirm integration across multiple platforms
Several equipment manufacturers have confirmed plans to integrate the switch into their systems. Accton, Delta Electronics, HPE and others are preparing products based on the new chip, with some citing the pin compatibility as a key factor in accelerating their development timelines.
AMD also plans to combine the switch with its Instinct GPUs and EPYC processors. “Low latency is essential to unleashing the full potential of AI – from reducing training times to powering real-time inference,” comments Forrest Norrod, EVP and GM of AMD’s Data Center Solutions Group. “By combining Broadcom’s new Tomahawk Ultra switch with AMD Instinct GPUs and EPYC processors, we’re enabling high-performance, standards-based Ethernet solutions for AI infrastructure.”
Intel has validated configurations that connect up to 64 of its Gaudi 3 AI accelerators per rack using the Tomahawk Ultra, achieving total memory bandwidth of 76.8TB/s. “This rack-level bandwidth unlocks new possibilities for training and real-time inference of the most complex LLMs, redefining industry SLAs,” says Saurabh Kulkarni, VP of AI Technical Product Management at Intel.
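As a quick sanity check on those figures, the rack total can be divided across the accelerators. The sketch below assumes the 76.8TB/s is shared evenly among all 64 devices, which the article does not state. The function name is hypothetical.

```python
def per_accelerator_bandwidth(total_tb_per_s, accelerators):
    """Even split of a rack-level bandwidth figure (assumed, not confirmed)."""
    return total_tb_per_s / accelerators


if __name__ == "__main__":
    share = per_accelerator_bandwidth(76.8, 64)
    print(f"{share:.1f} TB/s per accelerator")
```

Under that assumption, each Gaudi 3 accounts for roughly 1.2TB/s of the quoted total.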
The switch maintains complete pin compatibility with the existing Tomahawk 5, allowing equipment manufacturers to upgrade existing designs without board-level changes.
“AI and HPC workloads are converging into tightly coupled accelerator clusters that demand supercomputer-class latency – critical for inference, reliability and in-network intelligence from the fabric itself,” says Kunjan Sobhani, Lead Semiconductor Analyst at Bloomberg Intelligence. “Demonstrating that open-standards Ethernet can now deliver sub-microsecond switching, lossless transport and on-chip collectives marks a pivotal step toward meeting those demands of an AI scale-up stack – projected to be double digit billions in a few years.”


