OpenAI helps spot AI text before it gets used for cheating

OpenAI’s AI Text Classifier aims to spot content generated by AI platforms before it can be used by bad actors, but the company admits it's not perfect

Researchers at OpenAI have developed a classifier to spot content generated by artificial intelligence that could be put to use in disinformation or cybercrime activities.

OpenAI researchers say the classifier has been evaluated on a set of English texts, where it correctly identified 26% of AI-generated samples as "likely AI-written" (its true positive rate), while mislabelling human-written text as AI-generated 9% of the time (its false positive rate). The classifier's reliability improves as the input text gets longer, and it is notably more accurate on text from recent AI systems than its predecessor.
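For intuition, these two figures measure different things: the true positive rate is the share of AI-written samples the detector catches, while the false positive rate is the share of human-written samples it wrongly flags. A minimal sketch of how such rates are computed, using made-up example labels rather than OpenAI's evaluation data:

```python
# Illustrative only: true/false positive rates for a binary detector.
# Labels: 1 = AI-written, 0 = human-written.

def detection_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(y_true)                  # AI-written samples
    negatives = len(y_true) - positives      # human-written samples
    tpr = tp / max(1, positives)             # share of AI text caught
    fpr = fp / max(1, negatives)             # share of human text mislabelled
    return tpr, fpr

# Made-up example: 4 AI-written and 4 human-written samples
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0]
print(detection_rates(y_true, y_pred))  # (0.25, 0.25)
```

A detector can make its false positive rate arbitrarily low by flagging less text, but only at the cost of catching less AI-written text, which is the trade-off OpenAI's reported numbers reflect.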

“We recognise that identifying AI-written text has been an important point of discussion among educators, and equally important is recognising the limits and impacts of AI-generated text classifiers in the classroom,” explain OpenAI’s Jan Hendrik Kirchner, Lama Ahmad, Scott Aaronson, and Jan Leike in a blog post. “We have developed a preliminary resource on the use of ChatGPT for educators, which outlines some of the uses and associated limitations and considerations. While this resource is focused on educators, we expect our classifier and associated classifier tools to have an impact on journalists, mis/dis-information researchers, and other groups.”

OpenAI has made the classifier publicly available to gather feedback and determine its usefulness, but emphasises that it should not be the sole method of determining a text's origin; rather, it should complement other means of identification.

New classifier struggles with shorter texts

The classifier is unreliable on texts shorter than 1,000 characters, and even longer texts are sometimes mislabelled. In some cases it has confidently, but incorrectly, identified human-written text as AI-written. OpenAI advises using the classifier only on English text, as it performs significantly worse in other languages and on code.

It also cannot reliably identify highly predictable text, the researchers say. A list of the first 1,000 prime numbers, for example, is identical regardless of who produces it, so there is no signal to distinguish AI from human authorship.

OpenAI says the classifier is a language model that has been fine-tuned on a dataset consisting of pairs of human-written text and AI-written text that address the same topic. The dataset was gathered from various sources believed to be written by humans, including the pretraining data and human demonstrations on prompts submitted to InstructGPT, say researchers.

The text pairs were divided into prompts and responses, and responses were generated from multiple language models trained by OpenAI and other organisations. In the web app, the confidence threshold has been adjusted to keep the rate of false positives low. This means that text will only be marked as "likely AI-written" if the classifier is very confident in its prediction.
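The thresholding described above can be sketched as a simple banding function: only mark text "likely AI-written" when the model's score clears a high confidence bar, accepting that more AI text slips through in exchange for fewer false alarms. The cut-off values and band names other than "likely AI-written" here are assumptions for illustration, not OpenAI's actual settings:

```python
# Hypothetical sketch of confidence-threshold labelling.
# Cut-offs (0.98 / 0.10) and the non-"likely" band names are
# illustrative assumptions, not OpenAI's published values.

def label_text(ai_probability: float,
               high: float = 0.98,
               low: float = 0.10) -> str:
    """Map a classifier score in [0, 1] to a human-readable verdict."""
    if ai_probability >= high:
        # Only very confident predictions get the strongest label,
        # which keeps the false positive rate low.
        return "likely AI-written"
    if ai_probability <= low:
        return "very unlikely AI-written"
    return "unclear"

print(label_text(0.99))  # likely AI-written
print(label_text(0.50))  # unclear
print(label_text(0.05))  # very unlikely AI-written
```

Raising the `high` cut-off lowers the false positive rate but also lowers the true positive rate, which is consistent with the 26%/9% figures reported earlier in the article.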

