The Story Behind Elon Musk’s xAI Grok 4 Ethical Concerns

By Kitty Wheeler

July 17, 2025

OpenAI and Anthropic staff criticise Elon Musk’s company for failing to publish safety reports on frontier AI model Grok 4

AI safety researchers from OpenAI, Anthropic and the wider industry have criticised Elon Musk’s xAI for its lack of safety measures on AI chatbot Grok 4

Most industries grow over time and their regulations grow with them. But AI is different – it’s a brand-new field that’s developing faster than anything else.

Because AI is changing so quickly, it’s becoming harder and harder for laws and ethical guidelines to keep up.

So far, the safety norms that have emerged include rigorous testing of AI systems before release, documentation of potential risks and publication of detailed safety reports that allow peer review within the research community.

This consensus has been built through hard-won experience. Early AI deployments occasionally produced embarrassing or harmful outputs, leading companies to adopt more cautious approaches.

The industry’s largest players have generally embraced these practices, albeit with varying degrees of consistency and transparency.

However, this fragile consensus now faces its most serious challenge. Researchers from leading AI companies OpenAI and Anthropic have publicly condemned the safety practices at xAI, the AI startup owned by Elon Musk.

The criticism centres on what they describe as “reckless” and “completely irresponsible” approaches to AI safety testing and documentation.

Why an OpenAI researcher condemns xAI’s safety approach

The outcry follows a series of incidents involving xAI’s AI chatbot Grok, which provides conversational AI services.

Last week, the system generated antisemitic content and repeatedly referred to itself as “MechaHitler” before being temporarily taken offline.

The company subsequently launched Grok 4, a frontier AI model that was found to incorporate Musk’s personal political views when responding to contentious topics.

Boaz Barak, a Computer Science Professor on leave from Harvard University who works on safety research at OpenAI, says in a post on X: “I appreciate the scientists and engineers @xai but the way safety was handled is completely irresponsible.”

Boaz’s primary concern relates to xAI’s decision not to publish system cards – industry standard reports that detail training methods and safety evaluations.

These documents are typically shared with the research community to promote transparency about AI development processes.

“It’s unclear what safety training was done on Grok 4,” he says.

OpenAI and Google have themselves faced criticism for delayed publication of safety reports.

OpenAI chose not to publish a system card for its GPT-4.1 model, claiming it was not a frontier model.

Google waited months after unveiling Gemini 2.5 Pro, its advanced AI system, before publishing safety documentation.

However, both companies have historically published safety reports for frontier AI models before full deployment.

Anthropic researcher calls xAI’s practices “reckless”

Samuel Marks, an AI Safety Researcher at Anthropic, has also criticised xAI’s approach.

“Anthropic, OpenAI and Google’s release practices have issues,” he says.

“But they at least do something, anything to assess safety pre-deployment and document findings – xAI does not.”

The lack of transparency has made it difficult for the AI research community to assess Grok 4’s safety measures.

An anonymous researcher claimed in a post on LessWrong, an online forum focused on AI safety, that Grok 4 has no meaningful safety guardrails based on their testing.

The company has not published the results of its safety evaluations to allow independent verification of these claims.

Dan Hendrycks, a Safety Adviser for xAI and Director of the Center for AI Safety, says in a post on X that the company conducted “dangerous capability evaluations” on Grok 4.

These assessments test whether AI systems can perform potentially harmful tasks.

However, xAI has not made the results of these evaluations publicly available.

Broader industry concerns over AI safety standards

The controversy has highlighted broader concerns about consistency in AI safety practices across the industry.

Steven Adler, an Independent AI Researcher who previously led safety teams at OpenAI, tells TechCrunch: “It concerns me when standard safety practices aren’t upheld across the AI industry, like publishing the results of dangerous capability evaluations.

“Governments and the public deserve to know how AI companies are handling the risks of the very powerful systems they say they’re building.”

The timing of the criticism is particularly notable given Elon Musk’s history as an advocate for AI safety.

The billionaire – who also runs Tesla, the electric vehicle manufacturer, and SpaceX, the aerospace company – has repeatedly warned about the potential for advanced AI systems to cause catastrophic outcomes for humans.

He has additionally advocated for open approaches to AI development.

Recent incidents with Grok have also extended beyond the initial antisemitic content, with the chatbot bringing up topics such as “white genocide” in conversations with users.

These behavioural issues have emerged as xAI seeks to expand Grok’s integration into Tesla vehicles and market its AI models to the Pentagon and enterprise customers.

Regulatory pressure building on AI safety reporting

The controversy may strengthen arguments for regulatory intervention in AI safety practices.

California state Senator Scott Wiener is advancing legislation that would require leading AI laboratories, potentially including xAI, to publish safety reports.

New York Governor Kathy Hochul is considering similar measures.

Boaz has also raised concerns about xAI’s AI companions, which he describes as taking “the worst issues we currently have for emotional dependencies and trying to amplify them”.

These concerns relate to documented cases of individuals developing troubling relationships with AI chatbots.

The incidents have occurred despite xAI’s rapid technical progress in developing frontier AI models that compete with technology from OpenAI and Google, achievements made just two years after the startup’s founding.

The company has addressed some issues through modifications to Grok’s system prompt – the initial instructions that guide an AI system’s behaviour.

“AI models today have yet to exhibit real-world scenarios in which they create truly catastrophic harms, such as the death of people or billions of dollars in damages,” Steven says.

“However, many AI researchers say that this could be a problem in the near future given the rapid progress of AI models and the billions of dollars Silicon Valley is investing to further improve AI.”
