What Meta’s New AI Models Mean for the Global AI Race

Every major player in the AI race is vying for supremacy in what analysts estimate could become a US$1.3tn market by 2032.
Now, innovating to keep a leading position, Meta has released the fourth iteration of its open source large language model (LLM) Llama, introducing two variants called Scout and Maverick.
Llama is a family of LLMs first introduced in 2023, with models of various sizes ranging from 1 billion to 405 billion parameters (parameters being the values within a neural network that determine how input data is transformed into output predictions).
The models are designed for tasks including content creation, visual understanding, enterprise applications and multimodal processing.
Llama is a multimodal AI system, which means it can process and integrate various types of data, including text, video, images and audio, and convert content between these formats. This is an advance over the text-only systems that dominated earlier generations of AI models.
The company confirmed that both Llama 4 Scout and Llama 4 Maverick would follow the open source software approach, making the technology available for broader development communities to examine and build upon.
“These are our most advanced models yet and the best in their class for multimodality,” it says in its announcement.
Llama 4 Scout, Maverick and Behemoth
Alongside Scout and Maverick, Meta also revealed that it is previewing a larger model called Llama 4 Behemoth.
Llama 4 Scout
Llama 4 Scout contains 17 billion active parameters, 16 experts and 109 billion total parameters. Experts are specialised neural network components that focus on specific tasks or data types.
It can run on a single Nvidia H100 GPU with Int4 quantization, a technique that reduces computational requirements by using lower numerical precision.
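A rough back-of-the-envelope calculation shows why lower precision matters here. The sketch below estimates the memory needed for Scout’s weights alone at several precisions. The 80 GB capacity assumed for the H100 and the helper name weight_memory_gb are illustrative assumptions, and the estimate ignores activations, the context cache and any quantization overhead.

```python
def weight_memory_gb(total_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


SCOUT_TOTAL_PARAMS = 109e9  # total parameters, as reported in the article
H100_MEMORY_GB = 80         # assumption: the 80 GB H100 variant

for bits in (16, 8, 4):
    gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, bits)
    verdict = "fits" if gb <= H100_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB ({verdict} in {H100_MEMORY_GB} GB)")
```

Under these assumptions, only the 4-bit weights come in below the capacity of a single card, which is consistent with Meta’s claim.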
According to Meta, Scout “dramatically increases the supported context length from 128K in Llama 3 to an industry leading 10 million tokens”. Context length refers to the amount of text a model can process in a single operation, measured in tokens, which are the word fragments AI systems work with.
Llama 4 Maverick
Meanwhile, Llama 4 Maverick contains 17 billion active parameters, 128 experts and 400 billion total parameters. Both Scout and Maverick use a mixture-of-experts (MoE) architecture, in which only a fraction of the total parameters is activated to process each token.
“In MoE models, a single token activates only a fraction of the total parameters,” the company says.
“MoE architectures are more compute efficient for training and inference and, given a fixed training FLOPs budget, delivers higher quality compared to a dense model”.
This approach improves efficiency by lowering model serving costs and latency.
Key features of the Llama 4 models include:
- Open-source access
- Native multimodality
- Training robustness
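The mixture-of-experts idea can be illustrated with a minimal sketch of generic top-k gating. This is not Meta’s implementation: the function names, the toy scalar “experts” and the router scores are all hypothetical, chosen only to show that the experts a token is not routed to are never run.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k=1):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_scores, top_k=1):
    """Run only the selected experts and combine their outputs by gate weight."""
    output = 0.0
    for index, weight in route_token(router_scores, top_k):
        output += weight * experts[index](token)
    return output


# Toy setup (hypothetical): 16 experts, each scaling its input differently.
experts = [lambda x, scale=i + 1: scale * x for i in range(16)]
router_scores = [0.0] * 16
router_scores[5] = 3.0  # the router strongly prefers expert 5 for this token

result = moe_layer(2.0, experts, router_scores, top_k=1)
```

Using the figures reported for Scout, 17 billion of 109 billion parameters are active per token, or roughly 16%. For Maverick the share is about 4% (17 billion of 400 billion).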
Llama 4 Behemoth
Meta has also revealed a preview of Llama 4 Behemoth, described as “one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models”. Behemoth contains 288 billion active parameters, 16 experts and nearly two trillion total parameters.
How native multimodality features in Meta’s Llama 4 series
The Llama 4 models are designed with native multimodality, using early fusion to integrate text and vision tokens into a unified model backbone. Early fusion enables joint pre-training with text, image and video data.
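As a loose illustration of what early fusion means in general, the sketch below places image-patch embeddings and text-token embeddings into a single sequence before any shared processing takes place. It makes no claim about how Llama 4 does this internally: both embedding functions are hypothetical stand-ins for learned components, and the embedding width of four is arbitrary.

```python
def embed_text(token_ids, dim=4):
    """Hypothetical stand-in for a learned text embedding table."""
    return [[float(token_id)] * dim for token_id in token_ids]


def embed_image_patches(patches, dim=4):
    """Hypothetical stand-in for a vision encoder projecting patches to width dim."""
    return [[sum(patch) / len(patch)] * dim for patch in patches]


def early_fusion(token_ids, patches, dim=4):
    """Build one sequence of same-width vectors for a single shared backbone."""
    return embed_image_patches(patches, dim) + embed_text(token_ids, dim)


# Two toy image patches followed by three text tokens.
sequence = early_fusion([1, 2, 3], [[0.0, 1.0], [2.0, 4.0]])
```

Because every element of the fused sequence has the same width, a single backbone can attend across both modalities from its first layer, rather than combining separate text and vision models at a later stage.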
Meta also improved the vision encoder in Llama 4. It is based on the company’s MetaCLIP technology but was trained separately with a frozen Llama model to better adapt the encoder to the LLM.
The company reports that the overall data mixture for training consisted of more than 30 trillion tokens, which is more than double the Llama 3 pre-training mixture and includes diverse text, image and video datasets.
Meta has made Scout and Maverick available for download on llama.com and Hugging Face, with availability across cloud and data platforms to follow.
“We continue to believe that openness drives innovation and is good for developers, good for Meta and good for the world,” Meta says in its announcement.
Meta’s competition from OpenAI and Google in multimodal AI
Other leading technology companies have introduced their own multimodal systems, creating pressure for Meta to demonstrate its continued relevance in the AI race.
Microsoft-backed OpenAI, Google with its Gemini models and Anthropic with Claude have all released multimodal systems that process multiple types of content simultaneously.
However, Meta's approach differs from these competitors in its commitment to open source, making the technology available for broader development communities without the substantial costs of commercial alternatives.
Furthermore, the company has incorporated safeguards at each layer of model development from pre-training to post-training, including system-level mitigations to protect against potential misuse.
“We aim to develop the most helpful and useful models while protecting against and mitigating the most severe risks,” Meta says.
“We built Llama 4 with the best practices outlined in our Developer Use Guide: AI Protections”.
“As more people continue to use AI to enhance their daily lives, it's important that the leading models and systems are openly available so everyone can build the future of personalised experiences,” the company concludes.
AI Magazine is a BizClik brand

