IBM and ElevenLabs: Enterprise AI Finds its Voice

For decades, the promise of voice-driven technology remained tantalisingly out of reach. Early iterations of speech-to-text (STT) and voice recognition systems were often met with user frustration, characterised by rigid scripts, misunderstood phrases and robotic responses that alienated users.
However, the technology has entered a transformative era where speech has gone from a method of simple dictation to the definitive interface for advanced AI.
Rather than simply transcribing words, modern STT tech acts as the vital gateway for agentic AI, translating the nuance and intent of human speech into actionable digital data in real time.
As businesses strive to create deeper, more intuitive connections with both customers and their employees, the ability to accurately interpret the spoken word has emerged as a cornerstone of digital transformation. By bridging the gap between human dialogue and complex backend computational workflows, speech-to-text is shifting the paradigm from text-bound interactions to seamless, voice-first experiences that feel genuinely conversational and inherently human.
Delivering natural voice interactions
Focus in the enterprise has already shifted from static chatbots to autonomous, agentic AI. These sophisticated agents are capable of executing multi-step tasks, collaborating across systems, and automating intricate business processes.
Yet, for agents to truly integrate into everyday work, they must be accessible through the most natural human medium: voice. While long wait times, rigid call flows and robotic-sounding voices have historically diminished the user experience, voice is fast becoming a critical medium for customer and employee-facing agentic AI workflows.
Look no further than the landmark collaboration involving ElevenLabs and IBM, which brings together the former’s Text to Speech (TTS) and Speech to Text (STT) with the latter’s watsonx Orchestrate. This agentic AI orchestration platform provides clients with the advanced tools required to deliver richer, more natural voice interactions.
The partnership is specifically designed to improve agentic AI-driven experiences while simultaneously addressing the security and scalability needs of modern enterprises. By integrating ElevenLabs' premium technology, clients can build voice-enabled agents with a focus on security and compliance that communicate clearly and naturally, incorporating the nuance, emotion and rhythm of human speech across 70 languages.
“AI agents are becoming central to everyday work, and voice is where AI either earns trust or loses it,” explains Mati Staniszewski, Co-Founder at ElevenLabs. “Together with IBM, we're helping organisations replace robotic interactions with AI agents that people actually want to talk to, built with the security and compliance controls that enterprises require.”
Real-world implications
ElevenLabs and IBM's strategic integration expands agentic capabilities from traditional text-based systems to voice-first interactions, offering organisations premium voice options that help them deliver more effective, human-centred AI experiences.
The real-world implications span numerous sectors. Government agencies and public services, for example, must support several languages to help their constituents with vital information about healthcare, human services, education and civic activities.
With the integration of ElevenLabs, AI phone agents can converse seamlessly in dozens of languages with multiple regional accents and voices.
Additionally, banks, insurance companies, healthcare providers and utilities can provide support to more communities across key use cases including customer support, sales, employee experience and internal operations.
- US$330m: ElevenLabs' annual recurring revenue (2025)
- 70: number of languages across which ElevenLabs' TTS technology can help clients
- 10,000+: number of voices in ElevenLabs' library
- US$11bn: ElevenLabs' valuation as of February 2026 after its US$500m Series D funding round
Orchestration, scale and enterprise governance
IBM watsonx Orchestrate enables clients to build, deploy, manage and govern AI agents to help them automate workflows across their entire business. The platform connects directly to existing systems, models or automation tools, allowing for sophisticated agent collaboration and providing a scalable foundation for trustworthy, explainable enterprise AI.
Through the aforementioned collaboration, clients building agents with IBM watsonx Orchestrate can access ElevenLabs' premium speech quality alongside an extensive library of more than 10,000 voices.
Crucially, the partnership ensures clients do not have to sacrifice protection for performance. Organisations can access enterprise-grade protections, including PCI compliance for secured payment processing, Zero Retention Mode designed to support HIPAA-compliant data handling, and strict data residency controls. The combination helps to address the consistency, security and reliability needed for enterprise-scale deployments, supporting high-volume and highly concurrent interactions across global user bases.
“We're bringing a voice to AI Agents in the enterprise,” emphasises Nick Holda, Vice President, AI Technology Partnerships at IBM. “As clients increasingly deploy agentic AI that interacts with their customers and employees, they want these experiences to feel intuitive, responsive and accessible.
“IBM's open ecosystem approach offers clients the flexibility to choose the models and tools that fit their business, and our integration of ElevenLabs into watsonx Orchestrate is a powerful example of that, enabling enterprises to deploy AI agents that sound natural, scale globally and address security, reliability and governance.”
The future of human-centred interfaces
The practical applications of robust voice-first AI extend far beyond basic customer support hotlines. Across the corporate world, internal operations and employee experiences are being fundamentally re-imagined.
Internal voice agents can assist staff by navigating legacy systems, updating databases or retrieving complex compliance documentation through simple verbal commands. In customer-facing scenarios, banks and utility companies can provide support to more communities across key use cases, including supporting sales pipelines, troubleshooting service disruptions and resolving enquiries before they escalate.
Rather than forcing users into rigid call flows, the future lies in adaptable, responsive environments that adjust to the user's natural speaking habits, ensuring interactions feel intuitive, responsive and accessible to global user bases.
Ultimately, the evolution of speech-to-text represents a permanent departure from text-only limitations toward a more empathetic, efficient digital ecosystem.
ElevenLabs and IBM intend to continue their collaboration, helping enterprises move confidently beyond text-only agents and towards voice-first, human-centred AI experiences designed for the enterprise with the ability to scale.

