The AI Interview: Alice Xiang, Sony

Alice Xiang, Global Head of AI Governance at Sony, explains why responsible AI demands a deeper focus on training data – not just outputs

Alice Xiang did not set out to become one of the technology sector's leading voices on AI ethics. Her path began with statistics and economics, both of which she studied at Harvard University. She completed a Bachelor's and a Master's degree before earning a further Master's in economics from the University of Oxford.

From early on, Alice’s primary interest was in applying empirical methods to human-centric data. This drew her naturally towards machine learning, a branch of AI in which systems learn patterns from data rather than following explicit instructions.

Working on early ML models, Alice noticed something troubling: few practical frameworks existed for evaluating bias in AI systems.

“Early in my career, I worked on developing machine learning models and quickly saw the lack of standards or guardrails around bias in AI systems,” she reflects. “That realisation made me want to focus on building more responsible technology.”

Today, Alice holds two senior roles at Sony, the Japanese conglomerate known for its electronics, entertainment and gaming businesses, alongside its growing AI research division, Sony AI.

As Global Head of AI Governance at Sony Group, she oversees the policies and frameworks guiding AI use across the company's many business units worldwide. Meanwhile, as Lead Research Scientist at Sony AI, she leads a team focused on responsible AI for creative industries and the protection of creator rights.

Sony AI was established in 2020 to accelerate fundamental AI research and development while supporting human imagination and creativity. Ethics has been central to that mission from the outset. Alice’s research concentrates specifically on the quality and fairness of the data used to train AI models.

Key Figures
  • April 2020 – date Sony AI was established
  • US$83.4bn – Sony’s 2025 revenue
  • 112,000+ – number of Sony Group employees worldwide

Why data matters more than outputs

Public debate around AI ethics often focuses on what a system produces – whether that is a chatbot's response or the output from an image generator. That focus, Alice argues, misses where the real problems begin.

“AI outputs are the most visible part of the system,” she explains. “When something goes wrong, it is the output that users experience directly, so scrutiny naturally begins there.”

Alice contends that the deeper issue lies further upstream – in the datasets used to build these systems in the first place.

“These issues almost always originate in the data,” she continues. “Training data forms the foundation of every AI system, and if that foundation is biased or unrepresentative, those flaws will appear in the model's behaviour.”

Part of the problem, according to Alice, is a lack of clear standards and benchmarks for collecting data responsibly.

“In fields such as computer vision,” she says, “technical innovation has moved faster than ethical guidance. This has resulted in datasets that may lack diversity, reflect societal biases or be collected without adequate consent.”

Alice points to verification systems and facial-recognition technology as areas where fairness and accuracy carry real-world consequences, adding: “Without high-quality, responsibly-sourced datasets, even the most advanced models will reproduce the limitations of the data they were built on.”

The hidden cost of scraped data

Many large AI datasets have historically been assembled through web scraping – in other words, the automated collection of images, text or other content from across the internet.

However, Alice points out that this often takes place “without the consent, awareness or compensation of the individuals whose data is included”. 

She adds: “This raises important ethical concerns, but it also creates technical risks.”

Those technical risks include misrepresentation, embedded cultural stereotypes and structural biases that become baked into a model long before it ever reaches a user.

As AI is deployed in high-stakes settings, from healthcare to law enforcement, the danger of these inherited flaws grows more acute.

Alice notes: “Models trained on scraped data may reinforce discriminatory patterns, amplify privacy violations or produce errors that disproportionately affect marginalised groups.”

And yet, despite mounting regulatory scrutiny and legal challenges over data scraping, many organisations continue the practice – simply because it is “inexpensive and familiar”. 

Building a better alternative

Alice’s team has tried to demonstrate that a different approach is possible, through a project called FHIBE (Fair Human-Centric Image Benchmark).

FHIBE is a dataset designed from the outset around consent, with images collected directly from participants rather than scraped from the web. It also aims for broader demographic representation, allowing researchers to test how AI systems perform across different groups of people.

Alice says: “Benchmarks such as FHIBE allow practitioners to assess bias more systematically and design systems that perform well across a wider range of users and contexts.”
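
To make the idea of testing performance across groups concrete, here is a minimal sketch in Python. It is not FHIBE's actual data format or tooling: the group labels, the evaluation records and the function names are all hypothetical, chosen only to illustrate how a per-group comparison might look.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


def largest_gap(per_group):
    """Difference in accuracy between the best and worst performing groups."""
    scores = per_group.values()
    return max(scores) - min(scores)


# Hypothetical evaluation records: (group label, model prediction, ground truth).
records = [
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "match"),
    ("group_a", "match", "no_match"),
    ("group_b", "match", "no_match"),
    ("group_b", "no_match", "match"),
    ("group_b", "match", "match"),
    ("group_b", "no_match", "no_match"),
]

per_group = accuracy_by_group(records)
gap = largest_gap(per_group)
print(per_group, gap)
```

A single overall accuracy figure would hide the difference between the two groups in this toy data; breaking the score down by group is what makes the disparity visible. Real benchmark evaluations would typically use richer metrics and many more groups than this sketch assumes.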

Building such a dataset is far more demanding than scraping images from the internet, requiring multiple disciplines from within an organisation to join forces.

“It requires close coordination across teams such as legal, privacy, technical and operations, as well as meaningful engagement with data subjects,” Alice continues. “This is inherently far more complex than simply scraping data from the web.”

The payoff, Alice believes, justifies the effort: “Ethical data practices strengthen trust with customers and partners, reduce regulatory and legal risk, and improve model performance through higher quality, more representative and better annotated datasets.”

Why adoption remains slow

So, if FHIBE proves that responsible data collection works, why haven’t more AI companies followed suit? 

Alice identifies two barriers. The first relates to the sheer scale of resources required. For Sony, building FHIBE involved researchers, engineers, legal and privacy specialists, IT staff and quality assurance teams, not to mention substantial time and money.

“For companies accustomed to rapid iteration, the effort required to build consensually-sourced and globally-representative datasets can feel overwhelming,” she adds.

The second barrier concerns visual diversity, which creates what Alice calls a “perceived trade-off” between ethical sourcing and dataset breadth.

“Consensually-collected datasets often have less visual variety than scraped datasets simply because the web contains such a range of images,” she explains.

Ultimately, Alice believes this trade-off is a question of infrastructure and investment rather than a technical limitation.

As regulation tightens amid shifting public expectations of AI, she is certain the pressure to build responsibly-sourced datasets at scale will only intensify.
