Built from a cluster of 16 of the company’s CS-2 systems, Andromeda delivers, Cerebras says, more than 1 exaflop of AI compute and 120 petaflops of dense compute at 16-bit half precision. The company says it is the only AI supercomputer ever to demonstrate near-perfect linear scaling on large language model workloads using simple data parallelism alone.
Deployed across 16 racks at the Colovore data centre in Santa Clara, California, Andromeda has more than 13.5 million AI-optimised compute cores, fed by more than 18,000 AMD EPYC™ processors. That is more cores than 1,953 Nvidia A100 GPUs, and 1.6 times as many as the largest supercomputer in the world, Frontier, which has 8.7 million cores. Unlike any known GPU-based cluster, Cerebras says, Andromeda delivers near-perfect scaling via simple data parallelism across GPT-class large language models, including GPT-3, GPT-J and GPT-NeoX.
AI system reduces training time of large language models
Near-perfect scaling means that as additional CS-2s are used, training time is reduced in near-perfect proportion. This includes large language models with very large sequence lengths, a task Cerebras says is impossible to achieve on GPUs. Indeed, this GPU-impossible work was demonstrated by one of Andromeda’s first users, who achieved near-perfect scaling on GPT-J at 2.5 billion and 25 billion parameters with long sequence lengths (a maximum sequence length, or MSL, of 10,240).
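Near-perfect scaling can be quantified as scaling efficiency: the measured speedup divided by the ideal (linear) speedup implied by the system count. A minimal sketch of that calculation follows; the training times used are hypothetical illustrations, not Andromeda benchmark figures:

```python
# Scaling efficiency: how close measured speedup comes to ideal linear
# speedup as systems are added. The timings below are made-up
# illustrative numbers, not real Andromeda measurements.

def scaling_efficiency(base_time: float, time_on_n: float, n_systems: int) -> float:
    """Return measured speedup over ideal speedup (1.0 = perfect linear scaling)."""
    measured_speedup = base_time / time_on_n
    ideal_speedup = float(n_systems)
    return measured_speedup / ideal_speedup

# Hypothetical times for the same training run on 1, 4 and 16 CS-2 systems.
times = {1: 1600.0, 4: 410.0, 16: 104.0}

for n, t in times.items():
    eff = scaling_efficiency(times[1], t, n)
    print(f"{n:2d} systems: speedup {times[1] / t:5.2f}x, efficiency {eff:.0%}")
```

With these hypothetical numbers, 16 systems yield roughly a 15.4x speedup, i.e. about 96% scaling efficiency; "near-perfect" scaling means efficiency staying close to 100% as systems are added.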
Access to Andromeda is available now, and customers and academic researchers are already running real workloads and deriving value from the AI supercomputer’s capabilities.
“Jasper uses large language models to write copy for marketing, ads, books, and more,” said Dave Rogenmoser, CEO of JasperAI. “We have over 85,000 customers who use our models to generate moving content and ideas. Given our large and growing customer base, we’re exploring testing and scaling models fit to each customer and their use cases.
“Creating complex new AI systems and bringing them to customers at increasing levels of granularity demands a lot from our infrastructure. We are thrilled to partner with Cerebras and leverage Andromeda’s performance and near-perfect scaling, without traditional distributed computing and parallel programming pains, to design and optimise our next set of models.”
“AMD is investing in technology that will pave the way for pervasive AI, unlocking new efficiency and agility for businesses,” said Kumaran Siva, corporate vice president, Software & Systems Business Development, AMD. “The combination of the Cerebras Andromeda AI supercomputer and a data pre-processing pipeline running on AMD EPYC-powered servers will put more capacity in the hands of researchers and support faster and deeper AI capabilities.”
In a blog post on the company’s website, a Cerebras spokesperson said: “Andromeda can harvest structured and unstructured sparsity as well as static and dynamic sparsity. These are things other hardware accelerators, including GPUs, simply can’t do. The result is that Cerebras can train models in excess of 90% sparse to state-of-the-art accuracy.
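The arithmetic behind the sparsity claim is straightforward: if 90% of a layer’s weights are zero and the hardware can skip them, only 10% of the multiply-accumulate operations remain. A back-of-the-envelope sketch, using hypothetical layer dimensions (not taken from the announcement):

```python
# Back-of-the-envelope FLOP savings from weight sparsity, assuming the
# hardware can skip zero weights entirely. The 4096x4096 layer size is
# a hypothetical example, not a figure from the Cerebras announcement.

def dense_macs(rows: int, cols: int) -> int:
    """Multiply-accumulates for one dense matrix-vector product."""
    return rows * cols

def sparse_macs(rows: int, cols: int, sparsity: float) -> int:
    """MACs remaining when a fraction `sparsity` of weights are zero and skipped."""
    return round(rows * cols * (1.0 - sparsity))

rows, cols = 4096, 4096  # hypothetical transformer layer
dense = dense_macs(rows, cols)
sparse = sparse_macs(rows, cols, 0.90)
print(f"dense MACs: {dense:,}")
print(f"90% sparse: {sparse:,} ({dense / sparse:.0f}x fewer)")
```

At 90% sparsity the skip-zeros model does one tenth of the work, which is why hardware that can exploit unstructured sparsity is attractive for training large models.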
Andromeda can be used by multiple users simultaneously. Within seconds, users can specify how many of Andromeda’s CS-2s they want to use. This means Andromeda can operate as a 16 CS-2 supercomputer cluster for a single user working on a single job, as 16 individual CS-2 systems for 16 distinct users with 16 distinct jobs, or in any combination in between.”
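The flexible partitioning described above amounts to carving a fixed pool of 16 systems into per-job allocations. A toy first-come, first-served allocator makes the idea concrete; the policy and API here are invented for illustration and do not describe Cerebras’s actual cluster software:

```python
# Toy first-come, first-served allocator over a fixed pool of 16 CS-2
# systems. The scheduling policy and function names are illustrative
# assumptions, not Cerebras's real software.

TOTAL_CS2 = 16

def allocate(jobs: dict, total: int = TOTAL_CS2) -> dict:
    """Grant each job up to its requested CS-2 count until the pool is exhausted."""
    granted = {}
    free = total
    for name, want in jobs.items():
        take = min(want, free)
        if take > 0:
            granted[name] = take
            free -= take
    return granted

# One user taking the whole cluster for a single job...
print(allocate({"big-gpt": 16}))
# ...or several users sharing it in any combination.
print(allocate({"a": 8, "b": 4, "c": 4}))
```

The point of the paragraph is exactly this flexibility: the same 16 systems can serve one large job or many small ones without reconfiguration.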