OpenAI helps spot AI text before it gets used for cheating

OpenAI’s AI Text Classifier aims to spot content generated by AI platforms before it can be used by bad actors, but the company admits it's not perfect

Researchers at OpenAI have developed a classifier to spot content generated by artificial intelligence that could be put to use in disinformation or cybercrime activities.

OpenAI researchers say the classifier has been evaluated on a set of English texts, where it correctly identified 26% of AI-written text as "likely AI-written" (true positives) while mislabelling human-written text as AI-written 9% of the time (false positives). In other words, of every 100 AI-written samples only 26 are flagged, while 9 of every 100 human-written samples are wrongly flagged. The classifier's reliability improves as the input text gets longer, and it is more accurate on output from recent AI systems than the previous classifier was.

“We recognise that identifying AI-written text has been an important point of discussion among educators, and equally important is recognising the limits and impacts of AI-generated text classifiers in the classroom,” explain OpenAI’s Jan Hendrik Kirchner, Lama Ahmad, Scott Aaronson, and Jan Leike in a blog post. “We have developed a preliminary resource on the use of ChatGPT for educators, which outlines some of the uses and associated limitations and considerations. While this resource is focused on educators, we expect our classifier and associated classifier tools to have an impact on journalists, mis/dis-information researchers, and other groups.”

OpenAI has made the classifier publicly available to gather feedback and gauge its usefulness, but emphasises it should not be the sole method of determining a text's origin; rather, it should complement other means of identification.

New classifier struggles with shorter texts

The classifier has low reliability on texts below 1,000 characters, and even longer texts may sometimes be mislabelled. There have been instances where human-written text was incorrectly, and confidently, identified as AI-written. The researchers advise using the classifier only on English text, as it performs poorly in other languages and on code.

It also cannot reliably identify highly predictable text, say researchers. For example, a list of the first 1,000 prime numbers reads the same regardless of who produced it, so AI-written and human-written versions cannot be told apart.

OpenAI says the classifier is a language model that has been fine-tuned on a dataset consisting of pairs of human-written text and AI-written text that address the same topic. The dataset was gathered from various sources believed to be written by humans, including the pretraining data and human demonstrations on prompts submitted to InstructGPT, say researchers.

The text pairs were divided into prompts and responses, and responses were generated from multiple language models trained by OpenAI and other organisations. In the web app, the confidence threshold has been adjusted to keep the rate of false positives low. This means that text will only be marked as "likely AI-written" if the classifier is very confident in its prediction.
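The thresholding behaviour described above can be sketched as follows. This is a hypothetical illustration, not OpenAI's implementation: the cutoff values, label names, and minimum-length guard are assumptions chosen to mirror the article's description (a high bar for "likely AI-written" to keep false positives low, and low reliability below 1,000 characters).

```python
# Hypothetical sketch of mapping a classifier's probability score to a
# cautious label. Thresholds and labels are illustrative assumptions,
# not OpenAI's actual values.

def label_text(ai_probability: float, text: str, min_chars: int = 1000) -> str:
    """Map a model's 'AI-written' probability to a conservative label.

    A high threshold for 'likely AI-written' keeps the false positive
    rate low: the strong claim is only made when the model is very
    confident in its prediction.
    """
    if len(text) < min_chars:
        # Short inputs are unreliable, so refuse to classify them.
        return "too short to classify reliably"
    if ai_probability >= 0.98:   # assumed high-confidence cutoff
        return "likely AI-written"
    if ai_probability >= 0.90:
        return "possibly AI-written"
    if ai_probability <= 0.10:
        return "very unlikely AI-written"
    return "unclear"
```

For instance, a score of 0.95 on a long passage would come back as only "possibly AI-written", reserving the strongest label for near-certain predictions.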
