As the use of artificial intelligence (AI) and machine learning (ML) expands across the world, there is an increasing number of connected devices for people and businesses to choose from.
Speech recognition is becoming key to these devices. The global speech recognition market, which divides into two parts, is projected to reach almost US$30bn by 2026.
The two parts of this field are the phonetic recognition of individual words and the interpretation of the language as actually spoken; the latter is known as natural language processing.
Speech recognition is mostly used in digital assistants, which run on a number of devices such as smartwatches, smartphones and music speakers.
Speech-to-text is another application. Essential to many businesses, it converts recordings into written transcripts automatically.
AI Magazine looks at the top companies operating in both speech recognition and speech-to-text technology.
Founded in India, Uniphore is a conversational AI technology company that enables businesses to deliver transformational customer service across different touchpoints. The company’s platform can recognise and understand language, dialect, emotion and intent in real time. It can also interpret over 100 languages. The company’s customers include Accenture, DHL, Genpact and UPS.
Since its founding in 1994, Sensory has been using ML technologies, including voice and natural language processing, to enhance user experience. With offices in Portland, Boulder, Tokyo, Seoul and Hong Kong, Sensory offers a neural network approach to embedded speech recognition, mostly in consumer electronics. The company’s technology also gives consumers simple, intuitive natural language voice control and feature access without requiring private data to be sent to the cloud.
Headquartered in Hefei, Anhui, China, iFLYTEK is an enterprise dedicated to the research and development of advanced technologies, including intelligent speech and language technologies and speech information services. Founded in 1999, the company actively promotes the development of AI products and their sector-based applications, with the vision of enabling machines to listen, speak, understand and think.
Nuance Communications provides speech recognition and AI products that focus on server and embedded speech recognition, telephone call steering systems, automated telephone directory services, and medical transcription software and systems. Providing conversational AI for industries including healthcare, financial services, telecom and retail, Nuance’s solutions transform the way people work, connect and interact with each other. The company was acquired by Microsoft last year to bolster Microsoft’s solution portfolio for the healthcare industry.
A developer of Chinese voice recognition, Mobvoi was founded in 2012 and focuses on AI interaction and hardware-software integration. The company provides B2B and B2C AI products and services in more than 40 countries and regions. Mobvoi’s proprietary AI technology establishes an end-to-end human-computer interaction system architecture including speech signal processing, wake words, speech recognition, natural language understanding, dialogue management, vertical search and speech synthesis.
Verbit is a fast-growing AI startup focused on building cutting-edge transcription technology. The company is dedicated to providing universities and businesses with the tools they need to make all video media accessible. Its range of AI-powered solutions includes live captioning and real-time transcription, transcription for legal processing, audio and video captioning and transcription, and audio description. Verbit offers customers high-quality, word-for-word, interactive and collaborative transcripts and captions.
California-based Otter.ai offers an AI transcription service to capture, search and share meetings, lectures and live events. Its AI-powered assistant lets users automatically transcribe notes from recorded events. With its new release, Otter 2.0, the company has added more functionality to improve collaboration and productivity. Once a recording is transcribed, users can search, play and edit the notes generated.
03: SoapBox Labs
With its speech recognition technology, SoapBox Labs powers joyful learning and play experiences for children. The company recently released a new feature that gives educators unprecedented insight into their students’ oral reading fluency. The technology has been built specifically for children, to empower them to use their voices to shape the world around them. Its high-performing software uses AI and ML to provide high-quality speech recognition for children, and its engine has been independently validated as the most accurate available anywhere.
Technology giant Amazon develops speech recognition products and offers a speech-to-text service, Amazon Transcribe, through Amazon Web Services (AWS). Amazon Transcribe automatically converts speech to text, allowing users to extract key business insights from customer calls, improve business outcomes with state-of-the-art speech recognition and enhance accuracy with custom models that understand their domain-specific vocabulary. Amazon’s range of Alexa products has grown rapidly since the first release of its smart speaker; the speech recognition technology is now built into a number of products including its Fire Sticks, TVs, smart cameras and doorbells.
Speechmatics is pushing the boundaries of automatic speech recognition in over 31 languages. The company’s flexible API integrates easily into its customers’ services, solutions and applications to give them the most accurate transcription powered by AI speech recognition. Last year, Speechmatics launched new software that uses the latest techniques in deep learning and contains the company’s breakthrough self-supervised models. The software also outperformed offerings from Amazon, Apple, Google and Microsoft, and delivers similar improvements in accuracy across accents, dialects, age and other sociodemographic characteristics.