Dubber: overcoming bias in NLP and speech recognition
Natural Language Processing (NLP) is a specific type of AI technology that allows computers to comprehend text and speech, much as a human would.
The practical implications and benefits of this technology are significant to business success. Now, computers are able to process, analyse and understand a wealth of information, not just from written text sources like libraries or the internet, but also from spoken conversations such as business meetings. With this, for example, organisations can have meetings automatically transcribed, saving a huge amount of time and money.
Looking specifically at speech-to-text and analysis, there are a number of applications where this type of technology can support businesses:
NLP can analyse the sentiment in customer conversations in real-time, alerting managers to potential conflicts requiring assistance, or suggesting to the operator a different script to follow. It can also gather the most important or impactful topics and themes raised in a meeting and turn them into meeting minutes, including action items.
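The real-time alerting described above can be sketched in miniature. The tiny word lists and alert threshold below are illustrative assumptions, not a production sentiment model, which would use a trained classifier rather than keyword matching:

```python
# Minimal sketch of real-time sentiment alerting on a call transcript.
# The lexicon and threshold are illustrative assumptions only.

NEGATIVE = {"angry", "refund", "cancel", "terrible", "complaint", "frustrated"}
POSITIVE = {"thanks", "great", "perfect", "happy", "resolved"}

def sentiment_score(utterance: str) -> int:
    """Score one utterance: +1 per positive word, -1 per negative word."""
    words = utterance.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def monitor(utterances, alert_threshold=-2):
    """Track running sentiment and flag utterances where it drops too low."""
    running = 0
    alerts = []
    for i, utterance in enumerate(utterances):
        running += sentiment_score(utterance)
        if running <= alert_threshold:
            alerts.append(i)  # a manager could be notified at this point
    return alerts

call = [
    "hi thanks for calling",
    "i am frustrated this is terrible",
    "i want a refund and to cancel",
]
print(monitor(call))  # indices of utterances that triggered an alert
```

In a live system the same loop would consume speech-to-text output as it streams in, rather than a fixed list of utterances.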
With the ability to gauge employee engagement in particular projects, the technology can either support staff who are less engaged or offer enablement to those most engaged.
Finally, by determining customer interest levels in different brands, products or services during conversations, businesses can gain market insight through the analysis of speech.
Despite these significant benefits, one key issue with this technology is bias. To learn more, we spoke to Iain McCowan, Director of AI at Dubber, who shared his insight.
Why are there issues of bias in NLP and speech recognition? How does it occur?
Modern NLP models are trained on large datasets, so ultimately most bias comes from any bias inherent in the dataset. Because the USA is the most significant economy, and the place where many recent NLP technologies have been birthed, there have historically been significantly more language resources for US English than for any other language or dialect. This has meant that most speech and language technologies, to this day, still provide the best accuracy and richest features for US English users. Another bias factor that may be present in data is historical bias, in which older training datasets may reflect outdated societal stereotypes or biases.
In recent years, the bias from training data has been addressed through two main approaches:
- Collecting and improving the availability of more varied and up-to-date language datasets. An example of this is the Common Voice project by Mozilla.
- Focusing research on methods that improve performance for low resource languages, such as so-called self-supervised learning to pre-train NLP models that can be used for a range of end tasks without needing large annotated datasets. This makes it easier to learn models for less common languages where there aren’t many available large annotated datasets. One example is XLM-R.
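The self-supervised idea behind models like XLM-R can be illustrated in miniature: hide a word and learn to predict it from its context, using only raw, unannotated text. The toy bigram predictor below is a sketch of that objective only; real pre-training uses neural networks over billions of multilingual sentences:

```python
from collections import Counter, defaultdict

# Toy illustration of the self-supervised masking objective: learn to
# fill in a hidden word using nothing but raw, unannotated text. The
# three-sentence corpus is an illustrative stand-in for web-scale data.

corpus = [
    "the meeting starts at nine",
    "the meeting ends at five",
    "the call starts at nine",
]

# Count which word follows each word (a simple bigram "model").
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for left, right in zip(words, words[1:]):
        follows[left][right] += 1

def fill_mask(left_word: str) -> str:
    """Predict a masked word from the word to its left."""
    return follows[left_word].most_common(1)[0][0]

# "the meeting starts at [MASK]" -> predict from the left context "at"
print(fill_mask("at"))  # 'nine'
```

The point is that no human annotation was needed: the raw text supplies its own training labels, which is what makes the approach viable for languages without large annotated datasets.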
These approaches can be used to address not just language diversity, but also to reduce bias against any under-represented population.
How does bias manifest itself in speech recognition software? What are the implications?
Bias can occur in speech recognition and NLP within a business in multiple ways, due to natural variations in speech and language usage that arise from factors such as personality, occupation, education, age, gender, etc.
Fundamentally, the most glaring bias issue in current systems is when speech-to-text recognition software simply misunderstands the speaker, and the incorrect words are noted down - or missed completely. For all intents and purposes, if we are trying to do processing and analysis of the spoken word, this speaker has no voice.
There are many reasons that misunderstanding of a speaker can occur:
- If the business environment is multilingual but its speech recognition systems and NLP models do not cover the same breadth of languages. With 7,000+ languages across the globe, to be equitable to all employees, clients, partners, and customers, a business must ensure its NLP solutions provide equitable coverage of all languages spoken within and across the bounds of the organisation.
- Regionally, there are unique dialects and accents - from US and UK English to African-American Vernacular English, English spoken as a second language by, for example, native Spanish speakers, and even differences between states in Australia. The models used must accommodate all of these to accurately understand what was said by every speaker.
- Speakers of different genders, as well as children and elderly speakers, can sound different, so models must account for these voices too.
- Speakers with speech impairments can be misunderstood, so models must be trained to accommodate these people.
What is needed to overcome and regulate bias in NLP technology?
For NLP technology providers, there is a need to improve the diversity in underlying training data, as well as increase research into methods to improve performance for low resource scenarios, such as under-represented languages, dialects, ages or genders.
Businesses that are clients of NLP technology can influence this by insisting performance reporting is done across a range of these demographic factors and by selecting NLP technology vendors based on their accuracy across the board.
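The "performance reporting across demographic factors" mentioned above is usually done by computing word error rate (WER) per speaker group. The sketch below shows the idea; the group labels and sample transcripts are illustrative assumptions:

```python
# Sketch of per-group accuracy reporting: compute word error rate (WER)
# for each speaker group so vendors can be compared across the board.
# Group labels and sample data are illustrative assumptions.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref)

# (reference transcript, system transcript, speaker group)
samples = [
    ("turn on the lights", "turn on the lights", "US English"),
    ("turn on the lights", "turn on the light", "UK English"),
    ("turn on the lights", "torn of the light", "second-language English"),
]

by_group = {}
for ref, hyp, group in samples:
    by_group.setdefault(group, []).append(wer(ref, hyp))

for group, rates in by_group.items():
    print(f"{group}: WER {sum(rates) / len(rates):.2f}")
```

A large gap in WER between groups - as in the made-up figures here - is exactly the kind of evidence a client can put in front of a vendor.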
To reduce bias for their employees and customers, businesses should implement continual monitoring of NLP accuracy to ensure they are always using the best set of NLP technologies to suit their circumstances. This is not just a one-time decision - continual monitoring is required. For instance, the software can be used to raise an alert when a particular speaker's speech-to-text output is not making sense - a sign that bias is occurring. By evaluating that speaker carefully, the business can then implement new models to accommodate them - and other people like them. A simple way this is often achieved in speech recognition is for employees to read a script during onboarding so that the technology can better learn their individual voices.
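One simple way to implement such monitoring is to watch the per-word confidence scores that most speech-to-text engines report, and flag speakers whose transcripts are consistently low-confidence. The data layout and threshold below are illustrative assumptions:

```python
# Sketch of continual monitoring: flag speakers whose transcripts come
# back with unusually low recognition confidence, a possible sign that
# the current models are biased against their voice. The confidence
# values and the 0.75 threshold are illustrative assumptions.

def flag_low_confidence(transcripts, threshold=0.75):
    """Return speaker IDs whose mean word confidence falls below threshold.

    `transcripts` maps speaker ID -> list of per-word confidence scores
    as reported by the speech-to-text engine.
    """
    flagged = []
    for speaker, confidences in transcripts.items():
        mean = sum(confidences) / len(confidences)
        if mean < threshold:
            flagged.append(speaker)  # candidate for a better-suited model
    return flagged

calls = {
    "speaker_a": [0.95, 0.92, 0.97],   # recognised well
    "speaker_b": [0.60, 0.55, 0.70],   # consistently misrecognised
}
print(flag_low_confidence(calls))  # ['speaker_b']
```

Each flagged speaker then becomes a prompt to evaluate and, if needed, deploy a model better suited to their voice.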
Another recent approach to effectively addressing bias is through NLP systems that incorporate active learning loops. This approach allows companies, teams or individual users to provide positive or negative feedback on NLP outputs, allowing the models to adaptively improve for their specific usage over time. This can mean each individual can control and improve the accuracy of systems for their own usage, giving them the power to eliminate negative bias.
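A minimal sketch of such a feedback loop: users mark outputs as correct or incorrect, and the system routes each user's future requests to whichever model variant has worked best for them. The model names and the routing rule are illustrative assumptions, not Dubber's implementation:

```python
# Sketch of an active-learning feedback loop: per-user feedback on NLP
# outputs steers future requests towards the best-performing model
# variant for that user. Model names are illustrative assumptions.

class FeedbackRouter:
    def __init__(self, models):
        self.models = models
        self.stats = {}  # (user, model) -> [positive count, total count]

    def record(self, user: str, model: str, correct: bool):
        """Store one piece of positive or negative feedback."""
        pos, total = self.stats.setdefault((user, model), [0, 0])
        self.stats[(user, model)] = [pos + int(correct), total + 1]

    def best_model(self, user: str) -> str:
        """Pick the model with the highest positive-feedback rate for this user."""
        def score(model):
            pos, total = self.stats.get((user, model), [0, 0])
            return pos / total if total else 0.5  # unseen models get a neutral prior
        return max(self.models, key=score)

router = FeedbackRouter(["general-en", "accent-adapted-en"])
router.record("maria", "general-en", correct=False)
router.record("maria", "general-en", correct=False)
router.record("maria", "accent-adapted-en", correct=True)
print(router.best_model("maria"))  # 'accent-adapted-en'
```

Real systems would also retrain or fine-tune models on the accumulated feedback; routing is just the simplest way the loop pays off for an individual user.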
By building in activities and automation that help remove bias in NLP systems, businesses can ensure they are engaging ethically with all speakers, allowing everyone to be truly heard. Sharing voice data sets publicly, stripped of proprietary information, can also help evolve language models to be more inclusive.