GPT-3 language model matches humans in psychological tests

Marcel Binz (left) and Eric Schulz. Credit: MPI for Biological Cybernetics/Jörg Abendrot
GPT-3 can keep up with humans in some areas but falls behind in others, probably because it lacks interaction with the real world, according to new research

The GPT-3 language model was found to be comparable to humans in some areas when subjected to psychological tests designed to evaluate competencies such as deliberation and causal reasoning.

The Max Planck Institute for Biological Cybernetics in Tübingen conducted the study to evaluate the general intelligence of GPT-3, a powerful language model created by OpenAI. The research involved GPT-3 taking psychological tests to evaluate competencies such as deliberation and causal reasoning. The results were then compared with human subjects to assess GPT-3's cognitive abilities.

The study's findings revealed a mixed picture of GPT-3's capabilities. While the language model was found to be comparable to humans in some areas, it was found to lag behind in others, likely due to a lack of interaction with the real world.

GPT-3 is among the most powerful neural networks currently available, capable of generating a wide range of texts in response to input given in natural language. It has been trained using large volumes of internet data to perform this task. In addition to writing articles and stories almost indistinguishable from human-made texts, GPT-3 can solve math problems and programming tasks.

The impressive abilities of GPT-3 have raised questions about whether it possesses human-like cognitive abilities. To address these questions, researchers at the Max Planck Institute for Biological Cybernetics conducted psychological tests to examine different aspects of general intelligence. Marcel Binz and Eric Schulz scrutinised GPT-3's decision-making, information search, causal reasoning, and ability to question its initial intuition. They then compared GPT-3's test results with those of human subjects, assessing both the correctness of its answers and whether its mistakes resembled human errors.

The study's findings provide valuable insights into the strengths and limitations of GPT-3's abilities and could inform its future development and optimisation. The study highlights the potential for the continued evolution of artificial intelligence towards human-like cognitive abilities, with significant implications for the future of the technology industry.

AI attempts to solve the Linda problem

“One classic test problem of cognitive psychology that we gave to GPT-3 is the so-called Linda problem,” explains Binz, lead author of the study. In this test, participants are introduced to a fictional character named Linda, who is described as deeply concerned about social justice and opposed to nuclear power. Based on this information, participants are asked to choose between two statements: is Linda a bank teller, or is she a bank teller who is also active in the feminist movement?

Interestingly, most participants choose the second statement, even though adding the condition that Linda is active in the feminist movement makes the statement strictly less probable. This phenomenon, known as the conjunction fallacy, reveals that humans often rely on intuition rather than logic when making decisions.
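Why the second statement is always less likely comes down to a basic rule of probability: a conjunction of two events can never be more probable than either event alone. A minimal sketch illustrates this; the numbers below are made-up illustrative probabilities, not data from the study:

```python
# Conjunction rule: P(A and B) <= P(A), for any events A and B.
# Illustrative (hypothetical) probabilities for the Linda problem:
p_teller = 0.05              # P(Linda is a bank teller)
p_feminist_given_teller = 0.90  # P(active feminist | bank teller)

# Probability of the conjunction "bank teller AND active feminist":
p_teller_and_feminist = p_teller * p_feminist_given_teller

# However plausible the feminist detail sounds, the conjunction
# cannot exceed the probability of being a bank teller alone.
assert p_teller_and_feminist <= p_teller
print(p_teller, p_teller_and_feminist)
```

No matter what value the conditional probability takes, multiplying by a number between 0 and 1 can only shrink the result, which is exactly why choosing the second statement is a logical error.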

The language model GPT-3 was also subjected to this test and displayed a similar pattern of behaviour, reproducing the same fallacy as humans. This suggests that GPT-3 operates similarly to humans, relying on intuition rather than logic when processing information.

“This phenomenon could be explained by the fact that GPT-3 may already be familiar with this precise task; it may happen to know what people typically reply to this question,” says Binz. Like any neural network, GPT-3 was trained before being put to work: fed huge amounts of text from various data sets, it learned how humans typically use language and how they respond to language prompts.

To verify that GPT-3 does not merely reproduce memorised solutions to specific problems, the researchers devised new tests posing similar challenges. The goal was to establish whether GPT-3 exhibits human-like intelligence rather than simply regurgitating answers it had seen during training.

The results of the study were mixed. GPT-3 performed nearly as well as humans in decision-making tasks but fell significantly behind in searching for specific information and in causal reasoning. This could be because GPT-3 only passively receives information from texts, whereas, as the authors note in the publication, active interaction with the world is essential for achieving the full complexity of human cognition.

The study's authors suggest that this may change, as users already communicate with models like GPT-3 in many applications. As a result, future networks could learn from these interactions and gradually converge towards human-like intelligence. This has significant implications for the development of artificial intelligence, as it could lead to the creation of more sophisticated and adaptable systems that can better approximate human cognitive abilities.

