Exploring the Psychology of LLMs' Moral and Legal Reasoning
Background
Large language models (LLMs) have demonstrated expert-level performance in many fields, which has sparked great interest in understanding their internal reasoning processes. Understanding how LLMs produce these remarkable results is crucial for the future development of artificial intelligence agents and for ensuring their alignment with human values. However, current LLM architectures make it difficult to explain their internal processes. Researchers have therefore begun to borrow methods commonly used in psychological research to explore the reasoning patterns of LLMs, giving rise to the emerging field of “machine psychology.”
Authors
The authors of this paper are from different institutions:
- Guilherme F.C.F. Almeida, Insper Education and Research Institute, Brazil
- José Luiz Nunes, Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Brazil; FGV Rio Law School, Brazil
- Neele Engelmann, University of Bonn, Germany; Human-Computer Interaction Center, Max Planck Institute for Human Development, Germany
- Alex Wiegmann, University of Bonn, Germany
- Marcelo de Araújo, Federal University of Rio de Janeiro, Brazil; State University of Rio de Janeiro, Brazil
Research Methods
The authors employed empirical methods from psychology, replicating eight classic psychological experiments. They presented the experimental scenarios to Google’s Gemini Pro, Anthropic’s Claude 2.1, OpenAI’s GPT-4, and Meta’s Llama 2, and collected the models’ responses. The replicated experiments included:
1) The Bystander Effect and Intentional Action
2) Deception
3) Moral Foundations Theory
4) Norm Judgments
5) Hindsight Bias (two different designs)
6) Concepts of Consent
7) Causality
By comparing the LLMs’ responses with those of human participants, the authors examined whether LLMs respond consistently with humans in tasks involving moral and legal reasoning, and identified systematic differences where they arose.
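A minimal sketch of what such a data-collection loop could look like is given below. It is not the authors’ code: the query_model stub stands in for real provider APIs, and the vignette wording, rating scale, model identifiers, and run counts are purely illustrative assumptions.

```python
import csv
import random  # only used by the placeholder below

# Hypothetical vignette bank: each experiment has two or more conditions.
# The names, wording, and 1-7 scale are illustrative, not the paper's materials.
VIGNETTES = {
    "intentional_action": {
        "harm": ("The chairman said: 'I don't care about harming the environment, "
                 "I just want to increase profits.' On a scale from 1 (not at all) "
                 "to 7 (fully), did the chairman intentionally harm the environment? "
                 "Answer with a single number."),
        "help": ("The chairman said: 'I don't care about helping the environment, "
                 "I just want to increase profits.' On a scale from 1 (not at all) "
                 "to 7 (fully), did the chairman intentionally help the environment? "
                 "Answer with a single number."),
    },
}

MODELS = ["gpt-4", "claude-2.1", "gemini-pro", "llama-2-70b-chat"]
RUNS_PER_CELL = 25  # repeated samples stand in for independent "participants"


def query_model(model: str, prompt: str) -> str:
    """Placeholder for a real chat-completion API call.
    Here it returns a random rating so the script runs end to end."""
    return str(random.randint(1, 7))


def parse_rating(text: str) -> int | None:
    """Extract the first digit in the 1-7 range from the model's reply, if any."""
    for ch in text:
        if ch.isdigit() and 1 <= int(ch) <= 7:
            return int(ch)
    return None


with open("llm_responses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "experiment", "condition", "run", "rating"])
    for model in MODELS:
        for experiment, conditions in VIGNETTES.items():
            for condition, prompt in conditions.items():
                for run in range(RUNS_PER_CELL):
                    rating = parse_rating(query_model(model, prompt))
                    writer.writerow([model, experiment, condition, run, rating])
```

Storing every run in a flat table of this kind makes the later model-versus-human comparisons straightforward.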
Main Findings
1) LLMs demonstrated patterns similar to human responses in most tasks, but the effect sizes were often exaggerated.
2) In some tasks, the models differed noticeably from one another: some were highly consistent with human responses, while others exhibited systematic biases. This suggests that at least some LLMs’ reasoning processes may diverge fundamentally from humans’.
3) The authors observed a “correct answer effect”: LLMs gave nearly identical responses to the same question, even when it was phrased in different ways, with minimal variance where human responses are typically spread out (the sketch after this list illustrates this variance contrast alongside effect sizes).
4) Overall, GPT-4 was the model that best approximated human responses.
5) In the task involving concepts of consent, all models showed significant deviations from human responses, suggesting that LLMs may have deficiencies or biases in how they handle this important legal and moral concept.
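To make two of the measures above concrete, the following sketch shows, with made-up numbers rather than the paper’s data, how an exaggerated effect size and the low-variance “correct answer effect” could be quantified. Cohen’s d and sample variance are standard choices for this; the paper’s own analyses may use different statistics.

```python
import statistics


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference between two response samples (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Made-up ratings for one vignette in two conditions ("harm" vs "help"),
# for human participants and for a single LLM. Not data from the paper.
human_harm, human_help = [6, 5, 7, 4, 6, 5], [3, 2, 4, 3, 2, 4]
llm_harm, llm_help = [7, 7, 7, 7, 6, 7], [1, 1, 2, 1, 1, 1]

# Exaggerated effect size: the LLM's between-condition contrast is far larger.
print("human d:", round(cohens_d(human_harm, human_help), 2))  # ~2.6
print("LLM d:  ", round(cohens_d(llm_harm, llm_help), 2))      # ~13.9

# "Correct answer effect": near-zero variance across repeated prompts means the
# model converges on one answer instead of showing human-like spread.
print("human variance:", round(statistics.variance(human_harm), 2))  # 1.1
print("LLM variance:  ", round(statistics.variance(llm_harm), 2))    # ~0.17
```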
Research Significance
This study is a pioneering systematic evaluation of LLMs’ reasoning abilities in the domains of morality and law. The results indicate that while current LLMs can indeed simulate human responses in many respects, they also show systematic differences, which are more pronounced in some areas than in others. This suggests that keeping LLMs aligned with human values may be more challenging than anticipated. The study also points to directions for future, more in-depth “machine psychology” research: if the logic underlying LLMs’ reasoning can be uncovered and their design improved accordingly, it will help strengthen value alignment between artificial intelligence systems and humans.