OpenAI has released ChatGPT, a model that interacts conversationally, enabling it to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
This week’s release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems, says the team. Lessons from the deployment of earlier models, including GPT-3 and Codex, have informed this release, contributing substantial reductions in harmful and untruthful outputs achieved by using Reinforcement Learning from Human Feedback (RLHF).
“We trained this model using RLHF, using the same methods as InstructGPT, but with slight differences in the data collection setup,” the team explain on the company’s website. “We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides — the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses.”
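The supervised fine-tuning step described above amounts to turning each trainer-played conversation into (context, response) training pairs, where the human trainer wrote both sides. A minimal sketch of that data preparation in Python — the dialogue content and role labels here are hypothetical, not from OpenAI's dataset:

```python
def conversation_to_examples(turns):
    """Flatten a trainer-played dialogue into supervised training pairs.

    Each assistant turn becomes a target, conditioned on every turn
    before it. `turns` is a list of (role, text) tuples in which the
    same human trainer played both the "user" and "assistant" sides.
    """
    examples = []
    for i, (role, text) in enumerate(turns):
        if role == "assistant":
            context = "\n".join(f"{r}: {t}" for r, t in turns[:i])
            examples.append((context, text))
    return examples

# Hypothetical trainer-written dialogue (illustrative only):
dialogue = [
    ("user", "What is RLHF?"),
    ("assistant", "Reinforcement learning from human feedback."),
    ("user", "Why use it?"),
    ("assistant", "It aligns model outputs with human preferences."),
]
pairs = conversation_to_examples(dialogue)
print(len(pairs))  # one training pair per assistant turn -> 2
```

Each pair would then be used to fine-tune the base model with an ordinary supervised objective: predict the assistant's reply given the conversation so far.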
The company collected comparison data, which consisted of two or more model responses ranked by quality, to create a reward model for reinforcement learning. The team took conversations that AI trainers had with the chatbot and randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them.
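A standard way to turn such quality rankings into a reward model, as in the InstructGPT line of work, is a pairwise loss that pushes the model to assign a higher scalar score to the completion trainers preferred. A toy sketch, where the scores are stand-ins for a real reward model's outputs:

```python
import math
from itertools import combinations

def pairwise_ranking_loss(scores):
    """Bradley-Terry style loss over completions ranked best-first.

    `scores` are the reward model's scalar outputs, ordered so that
    scores[i] belongs to a completion ranked above the one behind
    scores[j] whenever i < j. The loss is the mean of
    -log(sigmoid(winner - loser)) over all ordered pairs.
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        margin = scores[i] - scores[j]  # winner's score minus loser's
        losses.append(math.log(1 + math.exp(-margin)))
    return sum(losses) / len(losses)

# Hypothetical scores for three completions, already in ranked order:
good = pairwise_ranking_loss([2.0, 0.5, -1.0])
bad = pairwise_ranking_loss([-1.0, 0.5, 2.0])  # ranking reversed
print(good < bad)  # correctly ordered scores yield a lower loss
```

Minimising this loss over many ranked comparisons yields the reward model that then scores candidate responses during the reinforcement-learning stage.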
ChatGPT has limitations based on biases in training data
There are limitations: ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers, and is sensitive to tweaks in the input phrasing or to attempting the same prompt multiple times. For example, given one phrasing of a question, the model can claim not to know the answer, but given a slight rephrase, can answer correctly. The model is also often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. The company said these issues arise from biases in the training data and well-known over-optimisation issues.
“Ideally, the model would ask clarifying questions when the user provided an ambiguous query,” says the team. “Instead, our current models usually guess what the user intended. While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behaviour. We’re using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now.”
OpenAI highlighted an example of this approach: the question “How can I bully John Doe?” receives the response: “It is never okay to bully someone. Bullying is a serious problem that can have detrimental effects on both the person being bullied and the person engaging in bullying behaviour. Instead of trying to bully someone, it is important to treat others with kindness and respect.”
The OpenAI team says it is aware of the remaining limitations and plans to make regular model updates to improve the system. “But we also hope that by providing an accessible interface to ChatGPT, we will get valuable user feedback on issues that we are not already aware of,” they say.