Alibaba Cloud Expands AI Portfolio with Qwen2.5 Release

By Marcus Law

January 29, 2025


Alibaba Cloud has released two updates to its Qwen language model series, introducing multimodal capabilities and expanded context processing

Chinese technology group Alibaba Cloud upgrades its multimodal capabilities and introduces a million-token context window in its latest LLM offering

The race to develop LLMs with expanded capabilities has intensified as technology companies seek to combine text processing with visual understanding and longer context windows.

Amid this growing competition, Alibaba Cloud – the cloud computing subsidiary of Chinese e-commerce group Alibaba – has released two updates to its Qwen language model series, introducing multimodal capabilities and expanded context processing. These developments position the company in competition with the latest models from US AI companies like OpenAI and Anthropic and Chinese competitors like DeepSeek, as enterprises seek AI solutions that can process both text and visual inputs.

“Alibaba Cloud is committed to delivering real value to global developers through cutting-edge AI models, enhanced cloud infrastructure, and accessible support programs,” says Dongliang Guo, Vice President of International Business, Head of International Products and Solutions at Alibaba Cloud Intelligence. “Together, we aim to spark more AI-driven innovations, benefiting startups, enterprises, and industries altogether across the globe.”

Qwen2.5-VL brings multimodal capabilities to Alibaba Cloud portfolio

The company's Qwen2.5-VL visual-language model expands on its predecessor with parameter sizes ranging from 3 billion to 72 billion. The technology combines text and visual processing to analyse images, charts and video content.

The model can process videos longer than one hour and pinpoint the time segments relevant to a query. This capability enables users to search within video content and extract information from specific moments.

A core feature of Qwen2.5-VL is its structured data output functionality. According to Alibaba Cloud, the system converts unstructured content from documents such as invoices and forms into organised data formats including JSON, a text-based data structure used in software development.
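As a minimal sketch of what consuming such structured output might look like, the snippet below parses a JSON string of the kind a visual-language model could return for an invoice and checks that the expected fields are present. The field names, the sample values and the `parse_invoice` helper are hypothetical illustrations, not Alibaba Cloud's documented output format.

```python
import json

# Hypothetical model response for an invoice image; the schema is an
# assumption made for illustration only.
SAMPLE_RESPONSE = '{"invoice_number": "INV-001", "total": 125.50, "currency": "USD"}'

# Fields this sketch assumes a downstream system would require.
REQUIRED_FIELDS = ("invoice_number", "total", "currency")


def parse_invoice(raw: str) -> dict:
    """Parse a JSON string and verify the assumed required fields exist."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


invoice = parse_invoice(SAMPLE_RESPONSE)
print(invoice["invoice_number"], invoice["total"], invoice["currency"])
```

Validating the parsed object before use is a common precaution, since any model-generated JSON may omit or rename fields.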

The technology includes parsing and localisation functions that enable it to operate as a visual assistant for computer and mobile device tasks. These capabilities extend to practical applications such as weather checks and flight bookings through application interfaces.

Alibaba says its flagship model Qwen2.5-VL-72B-Instruct achieves competitive performance in a series of benchmarks

The flagship Qwen2.5-VL-72B-Instruct model is accessible through the Qwen Chat platform. The model demonstrates capabilities in document reading, diagram interpretation, and visual question answering across sectors including education and mathematics.

Alibaba Cloud introduces million-token context with Qwen2.5-1M

Alibaba Cloud has also launched Qwen2.5-1M, a version of its language model capable of processing up to one million tokens. Tokens are the basic units of text that language models process, with each token typically representing a word or part of a word.
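To give a feel for the scale of a one-million-token window, the sketch below estimates whether a piece of text would fit. It assumes a rule of thumb of roughly four characters per token for English text, which is a heuristic and not the behaviour of the Qwen tokenizer; the helper names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # limit stated for Qwen2.5-1M

# Assumed rule of thumb: roughly 4 characters per token for English text.
# This is a heuristic, not the behaviour of the Qwen tokenizer.
ASSUMED_CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate from the character count, rounded up."""
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)


def fits_in_context(text: str, limit: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Check whether the estimated token count stays within the limit."""
    return estimate_tokens(text) <= limit


print(fits_in_context("A short sentence easily fits."))
```

An accurate count would require running the model's actual tokenizer, since token boundaries vary by language and vocabulary.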

The release includes two instruction-tuned versions with seven billion and 14 billion parameters respectively. These models are available through Hugging Face, an AI development platform used by researchers and companies.

Alibaba Cloud has published an inference framework on GitHub, the software development platform, to support the deployment of Qwen2.5-1M. The framework uses length extrapolation and sparse attention, technical approaches that reduce the computational resources required for processing long text inputs.
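The article does not say which form of sparse attention the framework uses. As a rough illustration of why sparsity helps, the sketch below compares the number of query-key pairs in full causal attention with a sliding-window pattern, one common sparse scheme. The window size and function names are assumptions chosen for the example.

```python
def full_causal_pairs(n: int) -> int:
    """Pairs when each token attends to itself and all earlier tokens."""
    return n * (n + 1) // 2


def windowed_causal_pairs(n: int, window: int) -> int:
    """Pairs when each token attends to at most `window` recent tokens, itself included."""
    if window >= n:
        return full_causal_pairs(n)
    # The first `window` tokens see 1, 2, ..., window keys;
    # every later token sees exactly `window` keys.
    return window * (window + 1) // 2 + (n - window) * window


n_tokens = 1_000_000   # context length stated for Qwen2.5-1M
window_size = 4_096    # hypothetical window size, not taken from the framework

ratio = full_causal_pairs(n_tokens) / windowed_causal_pairs(n_tokens, window_size)
print(f"Full attention computes about {ratio:.0f}x more pairs than the windowed variant")
```

The pair count is only a proxy for attention cost; real-world speedups depend on the rest of the model and on hardware, so this ratio should not be read as the framework's reported performance.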

The framework processes million-token inputs three to seven times faster than conventional methods, according to the company's technical documentation.

Key facts

Qwen2.5-1M processes up to 1 million tokens in a single context window
New framework processes inputs 3-7x faster than conventional methods
Qwen2.5-VL offers versions from 3 billion to 72 billion parameters

The development of the Qwen2.5-1M series involved techniques including long data synthesis and progressive pre-training. These methods aim to enhance the model's ability to handle extended context while managing computational requirements.

The release responds to increasing demand for language models capable of processing longer text inputs. Extended context processing enables applications including document analysis and generation of long-form content.

Both Qwen2.5-VL and Qwen2.5-1M are available through open-source channels including Hugging Face and ModelScope, Alibaba’s development community platform.

Technology Magazine is a BizClik brand