
https://pds.blog.parliament.uk/2025/02/27/whose-english-gets-to-be-default/

Whose English gets to be default?

A collage with well known British celebrities such as Stormzy, Adele, Brian Cox, Meera Syal, Dame Judi Dench, Grace Dent, Gok Wan, Tom Jones, AJ Odudu, and Sean Bean. In the foreground, a large headline is displayed “Whose English gets to be default?”

I’m Michael and I work as a user researcher. Before coming to PDS I studied for a master’s degree in Human-Computer Interaction Design. My dissertation was about accent bias in speech recognition technology. In September 2024, I gave a talk on this subject at the PDS Digital Conference, and I am pleased to share a summarised version in this article. If speech recognition seems distant from your own work, I would urge you to consider the insights below. They can inform how we all work with data.

From science fiction to reality

A screenshot of an all-black screen with a colourful horizontal waveform across the middle underneath the words “What can I help you with?”

Humans have long been fascinated by artificial intelligence – stories of machines and beings that can interact and follow our commands have inspired us for centuries. What was once science fiction is now an everyday reality for many of us.

Automated speech recognition technologies use a sophisticated type of artificial intelligence called natural language processing (NLP). Let’s pause to consider whose English was used to make this technology work. These design choices about the voices and accents used have consequences. So, we should ask: Whose English gets to be default?

Robots

New technologies can sometimes reflect stereotypes. Experimental robot designs from the 1950s were commonly human-like, reflecting a popular vision of a domestic servant.

Modern voice assistants often use female voices to suggest they are efficient and helpful, echoing the gendered stereotype of the secretary.

These are design decisions, whether intentional or unintentional. So, when we use automated speech recognition technologies, we should ask ourselves: Whose identity was used as default?

Data gaps 

The Library of Missing Datasets by Mimi Onuoha showing a light grey filing cabinet filled with labelled empty files. The hand of a Black person is shown taking out one file labelled “Global web user measurements that include VPNs”

One of my interests is conceptual art. I like the way conceptual artists use unusual methods to make us think differently about our relationship to the world around us, including technology.

Mimi Onuoha’s thought-provoking work has made me think very differently about data-driven systems, like speech recognition.

The Library of Missing Datasets is a piece she started in 2016. It is a filing cabinet containing individually labelled files. Each file is empty and represents a data gap: according to her research, these are datasets that have never existed. In the artist’s words, ‘that which we ignore reveals more than what we give attention to’.

Data gaps exist everywhere. If we can identify them, they often give us hints about those with the least agency and most at risk of data harms. In speech recognition, this means identifying the people who are misunderstood: individuals, communities and cultural groups excluded because their speech does not match what a computer determines is “correct”.

Being misunderstood by a computer is annoying when we can’t get our music to stop because it doesn’t recognise our voice. But continued misrecognition can have long-lasting negative outcomes. Life-changing decisions can be assisted by calculations made from people’s faces and voices, influencing judgements about employability, trustworthiness, or nationality. It is critical we consider how harm can be produced as a result of misrecognition and ask: Whose language is used against them?
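
For colleagues who work with data, here is a minimal sketch of how such a gap might be surfaced in practice: comparing word error rate (WER) for transcripts grouped by speaker accent. The jiwer package is one common choice for computing WER; the accent labels and sentences below are invented purely for illustration, not drawn from any real evaluation.

```python
# Hypothetical illustration: compare word error rate (WER) by accent group
# to see who a speech recogniser misunderstands most often.
# Requires the open-source jiwer package (pip install jiwer).
from jiwer import wer

# Invented (reference, recogniser output) pairs, grouped by invented accent labels.
samples = {
    "Received Pronunciation": ("turn the lights off", "turn the lights off"),
    "Scouse": ("turn the lights off", "turn the light of"),
    "Nigerian English": ("turn the lights off", "done the light soft"),
}

for accent, (reference, hypothesis) in samples.items():
    print(f"{accent}: WER = {wer(reference, hypothesis):.2f}")
```

A consistently higher error rate for one group is a signal that the group’s speech is missing or under-represented in the data the system was trained on.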

Nationality 

Take a moment to examine the configuration settings in your voice assistant or transcription software. It is likely you have it set to use British English.
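
That default is often visible in code as well as in settings menus. The sketch below uses the open-source SpeechRecognition package for Python as one illustration; the audio file name is invented, and the exact defaults vary between products and services, so treat this as an assumption-laden example rather than a recipe.

```python
# Hypothetical illustration with the SpeechRecognition package
# (pip install SpeechRecognition). The audio file name is invented.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("command.wav") as source:
    audio = recognizer.record(source)

# If we omit the language argument, this package falls back to
# United States English ("en-US"). Someone's English is always
# chosen for us unless we choose explicitly.
print(recognizer.recognize_google(audio, language="en-GB"))
```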

A sociolect is a way of speaking within a particular social group, for example an age group, region, or city. Computers have more difficulty with sociolects associated with working-class, rural, queer, and minority ethnic communities. These types of speech are often absent from the data used to train automated speech recognition technologies.

As technologists, we often approach problem-solving by collecting more data. You may think, “Why not collect more types of English language data?”

This could help, but only to a point. There are many other considerations.

Our voices constantly change. Few of us spend our whole lives at the address where we were born. Our vocabularies adapt as we migrate within and across cities, counties and countries. We interact with and speak to people who speak differently from us every day. We code-switch as we pivot between social settings to signal our belonging. Our voices change as we age. Language changes as years pass.

There is also the important question of ownership of language data. We can see communities keeping control of their datasets, like Te Hiku Media, which works with Māori communities in Aotearoa New Zealand. Or we can take inspiration from Masakhane, an African-led NLP research project working across a continent where over 2,000 languages are spoken. Their work has increased the representation of African languages in automated speech recognition.

Conclusion 

When we ask ourselves Whose English gets to be default?, we draw attention to the many ways of speaking within a single country, region, or city. Many dialects are not treated or represented equally by speech recognition technologies.

These are knotty questions. Seeking a neat answer may frustrate our solution-driven logic. Perhaps we should consider how to live with the discomfort of not knowing where the results of these questions might lead.

We can apply these lessons to many avenues of technology design and implementation beyond AI and speech recognition.

As we examine the origins of the data underpinning our work, we should ask: what data is missing or skewed? Who might be harmed as a result? And the question we return to, as a reminder of our critical mindset, is “Whose English gets to be default?”

_________________________________

Michael Kibedi is a User Researcher at PDS. His master’s dissertation in Human-Computer Interaction Design (City University of London, 2021) explored how accent bias in automated speech recognition shows that social exclusion can be reconstructed in digital devices, questioning our relationship to nationhood, gender and racial identity.

This article is adapted from a talk of the same name that Michael gave at the 2024 PDS Digital Conference.

You can read “On talking robots”, an extended version of this article, or follow Michael on LinkedIn.

 
