Alibaba Cloud Expands AI Portfolio with Qwen2.5 Release

Chinese technology group Alibaba Cloud upgrades multimodal capabilities and introduces million-token context window in latest LLM offering

The race to develop LLMs with expanded capabilities has intensified as technology companies seek to combine text processing with visual understanding and longer context windows.

Amid this growing competition, Alibaba Cloud – the cloud computing subsidiary of Chinese e-commerce group Alibaba – has released two updates to its Qwen language model series, introducing multimodal capabilities and expanded context processing. These developments position the company in competition with the latest models from US AI companies like OpenAI and Anthropic and Chinese competitors like DeepSeek, as enterprises seek AI solutions that can process both text and visual inputs.


“Alibaba Cloud is committed to delivering real value to global developers through cutting-edge AI models, enhanced cloud infrastructure, and accessible support programs,” says Dongliang Guo, Vice President of International Business, Head of International Products and Solutions at Alibaba Cloud Intelligence. “Together, we aim to spark more AI-driven innovations, benefiting startups, enterprises, and industries altogether across the globe.”

Qwen2.5-VL brings multimodal capabilities to Alibaba Cloud portfolio

The company's Qwen2.5-VL visual-language model expands on its predecessor with parameter sizes ranging from 3 billion to 72 billion. The technology combines text and visual processing to analyse images, charts and video content.


The model can process videos longer than one hour and pinpoint the segments relevant to a query. This capability enables users to search within video content and extract information from specific moments.

A core feature of Qwen2.5-VL is its structured data output functionality. According to Alibaba Cloud, the system converts unstructured content from documents such as invoices and forms into organised data formats including JSON, a text-based data structure used in software development.
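To illustrate why structured output matters to developers, the sketch below shows how an application might check JSON returned for an invoice before using it. The field names and the sample string are hypothetical, chosen only for illustration; the article does not describe the schema Qwen2.5-VL actually produces.

```python
import json

# Hypothetical field names and types; not a documented Qwen2.5-VL schema.
REQUIRED_FIELDS = {"invoice_number": str, "issue_date": str, "total": float}


def parse_invoice(raw: str) -> dict:
    """Parse a JSON string and check it has the expected fields and types."""
    data = json.loads(raw)
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return data


# Stand-in for text a model might return; this is not real model output.
sample = '{"invoice_number": "INV-001", "issue_date": "2025-01-01", "total": 149.5}'
invoice = parse_invoice(sample)
print(invoice["invoice_number"], invoice["total"])
```

Because the result is an ordinary dictionary, downstream code can pass it to a database or accounting system without further text processing.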

The technology includes parsing and localisation functions that enable it to operate as a visual assistant for computer and mobile device tasks. These capabilities extend to practical applications such as weather checks and flight bookings through application interfaces.

The flagship Qwen2.5-VL-72B-Instruct model, which Alibaba says achieves competitive performance in a series of benchmarks, is accessible through the Qwen Chat platform. The model demonstrates capabilities in document reading, diagram interpretation, and visual question answering across sectors including education and mathematics.

Alibaba Cloud introduces million-token context with Qwen2.5-1M

Alibaba Cloud has also launched Qwen2.5-1M, a version of its language model capable of processing up to one million tokens. Tokens are the basic units of text that language models process, with each token typically representing a word or part of a word.
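As a rough illustration of what a token budget means in practice, the sketch below estimates a token count by treating every word and punctuation mark as one token. This is a deliberate simplification: real models, including Qwen, use subword tokenisers whose counts will differ, and the function names here are hypothetical.

```python
import re

CONTEXT_LIMIT = 1_000_000  # one million tokens, the figure cited for Qwen2.5-1M


def rough_token_count(text: str) -> int:
    """Crude estimate: each word and each punctuation mark counts as one token."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def fits_in_context(text: str, limit: int = CONTEXT_LIMIT) -> bool:
    """Return True if the estimated token count is within the limit."""
    return rough_token_count(text) <= limit


sentence = "Tokens are the basic units of text."
print(rough_token_count(sentence), fits_in_context(sentence))
```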

The release includes two instruction-tuned versions with seven billion and 14 billion parameters respectively. These models are available through Hugging Face, an AI development platform used by researchers and companies.

Alibaba Cloud has published an inference framework on GitHub, the software development platform, to support the deployment of Qwen2.5-1M. The framework utilises length extrapolation and sparse attention, technical approaches that reduce the computational resources required for processing long text inputs.
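To give a sense of why sparse attention saves computation, the sketch below counts how many token pairs must be compared under full causal attention versus a sliding-window pattern. The sliding window is one common form of sparse attention, used here purely as an assumption for illustration; the article does not specify which sparse pattern Alibaba Cloud's framework uses, and the sizes are arbitrary.

```python
def full_causal_pairs(n: int) -> int:
    """Each token attends to itself and every earlier token."""
    return n * (n + 1) // 2


def windowed_causal_pairs(n: int, window: int) -> int:
    """Each token attends to itself and at most window - 1 earlier tokens."""
    return sum(min(i + 1, window) for i in range(n))


# Illustrative sizes only; not taken from the Qwen2.5-1M framework.
n, window = 10_000, 512
full = full_causal_pairs(n)
sparse = windowed_causal_pairs(n, window)
print(f"full: {full}, windowed: {sparse}, ratio: {full / sparse:.1f}x")
```

Full attention grows with the square of the input length, while the windowed count grows roughly linearly. The ratio printed here measures pair comparisons only and should not be read as the speed-up the company reports.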

The framework processes million-token inputs three to seven times faster than conventional methods, according to the company's technical documentation.

Key facts
  • Qwen2.5-1M processes up to 1 million tokens in a single context window
  • New framework processes inputs 3-7x faster than conventional methods
  • Qwen2.5-VL offers versions from 3 billion to 72 billion parameters

The development of the Qwen2.5-1M series involved techniques including long data synthesis and progressive pre-training. These methods aim to enhance the model's ability to handle extended context while managing computational requirements.

The release responds to increasing demand for language models capable of processing longer text inputs. Extended context processing enables applications including document analysis and generation of long-form content.

Both Qwen2.5-VL and Qwen2.5-1M are available through open-source channels including Hugging Face and ModelScope, Alibaba’s development community platform.




Technology Magazine is a BizClik brand
