Sustainable AI: The Rise of Small Language Models

As AI becomes integral to modern enterprises, the demand for increased computing capability spurs rapid advancements, infrastructure enhancement and skill evolution.
Large language models (LLMs) developed by giants like OpenAI, Anthropic and Google have grabbed the spotlight due to their prowess in processing and producing natural language, fulfilling roles from corporate chat assistants to sophisticated analytics solutions.
However, these models require vast resources, notably in terms of energy and water.
A single ChatGPT query may use up to ten times the electricity of a typical Google search, and the data centres powering these models can consume millions of gallons of water for cooling.
The scale is striking: training GPT-3 demanded around 1,287 MWh of electricity, comparable to powering about 120 US homes for a year.
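As a quick sanity check on that comparison, the arithmetic can be sketched in a few lines. Both input figures are the ones quoted above; the per-home value is simply what they imply, not an independently sourced statistic.

```python
# Figures quoted in the article.
GPT3_TRAINING_MWH = 1287
US_HOMES = 120


def implied_mwh_per_home(total_mwh: float, homes: int) -> float:
    """Annual electricity use per home implied by the comparison."""
    return total_mwh / homes


per_home = implied_mwh_per_home(GPT3_TRAINING_MWH, US_HOMES)
print(f"Implied annual use per home: {per_home:.1f} MWh")
```

The result, a little under 11 MWh per home per year, is close to commonly cited averages for US household electricity use, so the comparison is internally consistent.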
Microsoft’s water usage soared by 34% in 2022, mostly due to its AI activities. In this context, small language models (SLMs) have emerged.
Offering robust capabilities with a much lighter footprint, SLMs have gained traction as a highly efficient alternative.
So, in what ways are major global corporations leveraging SLMs within their sustainability and operational frameworks?
Understanding SLMs and LLMs
SLMs, in contrast to their large counterparts, operate on a scale from several million to roughly 10bn parameters.
This results in notable reductions in memory, processing and storage needs.
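To give a feel for the memory side of that claim, here is a rough sketch that estimates the space needed just to hold a model's weights. It assumes 16 bits per parameter (a common storage format, chosen here for illustration) and ignores activations, caches and runtime overhead. The 1-trillion-parameter entry is a hypothetical stand-in for a very large model, not a figure for any named product.

```python
def weight_memory_gb(parameters: float, bits_per_parameter: int = 16) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return parameters * bits_per_parameter / 8 / 1e9


# Illustrative model sizes; only the 10bn upper bound for SLMs comes from the text.
examples = {
    "100m-parameter SLM": 100e6,
    "10bn-parameter SLM": 10e9,
    "1tn-parameter LLM (hypothetical)": 1e12,
}

for name, params in examples.items():
    print(f"{name}: ~{weight_memory_gb(params):,.1f} GB at 16 bits per parameter")
```

Under these assumptions the largest SLM fits in tens of gigabytes, while a trillion-parameter model needs thousands, which is why the latter has to be spread across many specialised accelerators.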
While they share the same transformer architectures as larger models, techniques like knowledge distillation, pruning and quantisation help them achieve impressive task-specific performance with a fraction of the resources.
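Of the three techniques, quantisation is the simplest to illustrate. The sketch below shows one basic variant, symmetric 8-bit quantisation of a list of weights, written in plain Python. It is a teaching example under simplifying assumptions (a single scale for the whole list, made-up weight values) and not the method used by any particular model or library.

```python
def quantise_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantised = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantised]


weights = [0.42, -1.27, 0.05, 0.88, -0.31]  # made-up values for illustration
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("stored integers:", q)
print("largest rounding error:", max_error)
```

Each weight is now stored in 8 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale factor per weight.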
Trained on domain-specific datasets, SLMs excel in focused applications such as corporate email summarisation and call centre query resolution.
LLMs like GPT-4 and Gemini are comprehensive neural networks, trained on extensive datasets comprising a vast array of the world’s digital text.
These models, featuring up to trillions of learnable parameters, exhibit high levels of fluency in language, reasoning, summarisation, coding and more.
Their advantage lies in versatility and breadth, making them suitable for work ranging from legal document analysis to creative writing such as poetry.
Nonetheless, managing LLMs requires substantial computational power, coordination across specialised hardware and continuous internet connectivity, contributing to greater costs and environmental impacts.
Why SLMs are a sustainable choice
SLMs’ reduced energy usage is a central element in their adoption for sustainable strategies.
Smaller models need less energy to train and deploy, allowing companies to expand their use of intelligent services while staying within emissions goals.
Unlike LLMs, SLMs can run on edge devices or minimal on-premises setups, reducing reliance on energy-heavy central data facilities.
Aligned with the Green AI initiative, SLMs emphasise efficiency, eco-friendliness and inclusivity.
Their financial viability is attractive to businesses scaling AI infrastructure, offering lower infrastructure costs and quicker fine-tuning with minimal GPU needs.
On top of being eco-friendly and cost-efficient, SLMs are easier to scrutinise and control, with their compact structures allowing quicker analysis, debugging and risk management.
This transparency is especially beneficial in regulated sectors such as healthcare and finance, where quick model explainability is paramount.
SLMs also enhance operational flexibility: they can be deployed on devices, on on-premises servers or in the cloud, depending on latency, privacy or regulatory requirements.
This adaptability allows companies to choose the optimal location for AI workloads, overcoming cloud-only limitations on bandwidth and privacy.
The impact of Microsoft’s Phi-4 innovations
“The energy intensity of advanced cloud and AI services has driven us to accelerate our efforts to drive efficiencies and energy reductions,” says Melanie Nakagawa, Microsoft’s Chief Sustainability Officer.
“As AI scenarios increase in complexity, we’re empowering developers to build and optimise AI models that can achieve similar outcomes while requiring fewer resources.”
Microsoft is advancing its SLM lineup with the Phi-4 family. Available through Azure AI Foundry, Hugging Face and the Nvidia API Catalog, the family includes Phi-4-multimodal and Phi-4-mini.
- 5.6B - Parameters in the Phi-4-multimodal model, fewer than most competing multimodal systems
- 6.14% - Word error rate on the Hugging Face OpenASR leaderboard, representing a new benchmark record
- 128,000 - Maximum token sequence length supported by the Phi-4-mini model, enabling processing of extensive text
Phi-4-multimodal handles speech, vision and text, setting benchmarks in speech recognition and translation with a 6.14% word error rate on the Hugging Face OpenASR leaderboard.
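For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance calculation. The example sentences are invented, and the leaderboard's own text normalisation and scoring pipeline are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented example: one dropped word and one wrong word in a 7-word reference.
wer = word_error_rate(
    "small models can run on edge devices",
    "small models run on edge device",
)
print(f"WER: {wer:.2%}")
```

A score of 6.14% therefore means roughly six word-level errors for every hundred words of reference speech.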
“Phi-4-multimodal marks a new milestone in Microsoft’s AI development as our first multimodal language model,” says Weizhu Chen, Technical Fellow, CVP, Gen AI at Microsoft.
“By leveraging advanced cross-modal learning techniques, this model enables more natural and context-aware interactions, allowing devices to understand and reason across multiple input modalities simultaneously.”
The compact Phi-4-mini model with 3.8bn parameters excels in rapid tasks like reasoning, math and code generation and supports extensive token sequences, making it suitable for processing lengthy documents.
Combining high accuracy and scalability, these models suit analytical tasks in environments with limited resources, enhancing sustainability with local on-device processing.
IBM’s sustainable AI solutions
Another AI leader, IBM, builds on extensive AI research with its Granite 3.2 model family. Designed specifically for business use, these models provide robust language capabilities without the massive overhead that larger models demand.
The Granite 3.2 series incorporates “chain of thought” reasoning, offering step-by-step problem-solving that can be invoked on demand, so the extra compute is spent only when a task calls for it.
“The next era of AI is about efficiency, integration and real-world impact – where enterprises can achieve powerful outcomes without excessive spend on compute,” says Sriram Raghavan, Vice President of IBM AI Research.
IBM’s Granite Vision 3.2 2B model stands out for enterprise document processing, effectively extracting and classifying data from over 85m PDFs using IBM’s Docling toolkit.
The Granite Guardian 3.2 safety model and TinyTimeMixers forecaster showcase IBM’s commitment to sustainable and high-performing AI solutions.
Looking forward, both LLMs and SLMs continue to evolve, pointing to a future marked by strategic hybridisation, model efficiency and adaptive infrastructure.
Hybrid architectures will enable organisations to combine the broad competencies of remote LLMs with the targeted efficiency of locally deployed SLMs, maintaining a balance between sustainability, privacy and speed.
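One way to picture such a hybrid setup is a small routing layer that decides, per request, whether a local SLM or a remote LLM should handle it. The sketch below is purely illustrative: the `Request` fields, the rule order and the `local_slm` and `remote_llm` labels are all hypothetical and do not describe any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    contains_sensitive_data: bool = False  # hypothetical flag set upstream
    needs_broad_knowledge: bool = False    # hypothetical flag set upstream


def choose_model(request: Request) -> str:
    """Return 'local_slm' or 'remote_llm' for a request (illustrative rules)."""
    # Privacy first: sensitive data stays on local infrastructure.
    if request.contains_sensitive_data:
        return "local_slm"
    # Open-ended tasks go to the larger remote model.
    if request.needs_broad_knowledge:
        return "remote_llm"
    # Default to the cheaper, lower-latency local model.
    return "local_slm"


print(choose_model(Request("Summarise this customer email", contains_sensitive_data=True)))
print(choose_model(Request("Draft a market overview", needs_broad_knowledge=True)))
```

Putting the privacy check first means sensitive requests never leave the premises even when a larger model might answer better, which is the kind of trade-off between sustainability, privacy and speed that a real deployment would need to tune.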
In the coming years, AI’s success will increasingly depend on how efficiently and responsibly models can operate, aligning with societal goals for environmental responsibility and technological fairness.
As Melanie says: “Sustainability is good business. Sustainable business practices drive innovation.”



