Image illustrating the kinds of scenarios used in emotional intelligence tests, along with brief explanations evaluating the emotional reasoning behind each response. Credit: Katja Schlegel.
Over the course of their lives, humans establish meaningful social connections with others, empathizing with them and sharing their experiences. The ability to perceive, understand and manage the emotions experienced by oneself and others is broadly referred to as emotional intelligence (EI).
Over the past decades, psychologists have developed various tests designed to measure EI, which typically assess people’s ability to solve emotion-related problems that they may encounter in their everyday lives. These tests can be incorporated into various psychological assessments employed in research, clinical, professional and educational settings.
Researchers at the University of Bern and the University of Geneva recently carried out a study assessing the ability of large language models (LLMs), the machine learning techniques underpinning conversational agents like ChatGPT, to solve and create EI tests. Their findings, published in Communications Psychology, suggest that LLMs can solve these tests at least as well as humans and could be promising tools for developing future psychometric EI tests.
“I’ve been researching EI for many years and developed several performance-based tests to measure people’s ability to accurately recognize, understand, and regulate emotions in themselves and others,” Katja Schlegel, first author of the paper, told Medical Xpress.
“When ChatGPT and other large language models became widely available and many of my colleagues and I began testing them in our work, it felt natural to ask: how would these models perform on the very EI tests we had created for humans? At the same time, a lively scientific debate is unfolding around whether AI can truly possess empathy—the capacity to understand, share, and respond to others’ emotions.”
EI and empathy are two closely linked concepts, as they are both associated with the ability to understand the emotional experiences of others. Schlegel and her colleagues Nils R. Sommer and Marcello Mortillaro set out to explore the extent to which LLMs could solve and create emotion-related problems in EI tests, as this could also offer some indication of the level of empathy they possess.
To achieve this, they first asked six widely used LLMs to complete five EI tests that were originally designed for humans as part of psychological evaluations. The models they tested were ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku and DeepSeek V3.
“The EI tests we used present short emotional scenarios and ask for the most emotionally intelligent response, such as identifying what someone is likely feeling or how best to manage an emotional situation,” explained Schlegel. “We then compared the models’ scores to human averages from previous studies.”
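As a rough illustration of this setup, the sketch below (in Python) shows how multiple-choice EI items of this kind could be presented to a model and scored against a human benchmark. The example item, the query_model stub and the prompt wording are placeholders for illustration, not material or code from the study; only the 56% human average comes from the reported results.

```python
# Rough sketch of the scoring setup described above; not the authors' code.
# The item, the query_model stub and the prompt are illustrative placeholders.

ITEMS = [
    {
        "scenario": "A colleague learns that a project they spent months on "
                    "has been cancelled.",
        "options": ["A) Tell them to move on quickly",
                    "B) Acknowledge their disappointment and ask how they are doing",
                    "C) Avoid the topic"],
        "correct": "B",
    },
    # ... further items in the same format
]

def query_model(prompt: str) -> str:
    """Stub standing in for a call to an LLM chat API."""
    return "B"  # replace with a real API call

def model_accuracy(items) -> float:
    """Score the model's answers against the test key."""
    correct = 0
    for item in items:
        prompt = (
            item["scenario"] + "\n"
            + "\n".join(item["options"])
            + "\nWhich option is the most emotionally intelligent response? "
              "Answer with a single letter."
        )
        reply = query_model(prompt).strip().upper()
        if reply.startswith(item["correct"]):
            correct += 1
    return correct / len(items)

human_average = 0.56  # average human accuracy reported across the five tests
print(f"Model accuracy: {model_accuracy(ITEMS):.0%} "
      f"vs. human average: {human_average:.0%}")
```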

Image showing the percentage of correct responses across the five EI tests for each of the tested LLMs. Credit: Katja Schlegel.
In the second part of their experiment, the researchers asked ChatGPT-4, one of the most recent versions of ChatGPT released to the public, to create entirely new versions of the EI tests used in their experiments. The new tests were to include different emotional scenarios, questions and answer options, while also specifying the correct response to each question.
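A minimal sketch of how such item generation could be requested and checked is shown below. The prompt wording, the JSON format and the example reply are assumptions made for illustration; they are not the prompt used in the study.

```python
# Hedged sketch of requesting and validating a new EI test item from a model;
# the prompt and schema are assumptions for illustration, not from the study.
import json

GENERATION_PROMPT = (
    "Create a new emotional intelligence test item. Return JSON with the keys "
    "'scenario' (a short emotional situation), 'options' (four possible "
    "responses labelled A-D), and 'correct' (the letter of the most "
    "emotionally intelligent response)."
)

def parse_item(model_reply: str) -> dict:
    """Validate a model reply against the expected item structure."""
    item = json.loads(model_reply)
    assert {"scenario", "options", "correct"} <= item.keys()
    assert item["correct"] in "ABCD"
    return item

# Example of a reply in the expected format:
example_reply = json.dumps({
    "scenario": "A friend is nervous before giving their first public talk.",
    "options": ["A) Tell them not to worry",
                "B) Listen and help them rehearse",
                "C) Change the subject",
                "D) Say most talks go badly anyway"],
    "correct": "B",
})
print(parse_item(example_reply)["correct"])  # -> B
```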
“We then gave both the original and AI-generated tests to over 460 human participants to see how both versions compared in terms of difficulty, clarity, realism, and how well they correlated with other EI tests and a measure of traditional cognitive intelligence,” said Schlegel.
“This allowed us to test not just whether LLMs can solve EI tests, but whether they can reason about emotions deeply enough to build valid tests themselves, which we believe is an important step toward applying such reasoning in more open-ended, real-world settings.”
Notably, Schlegel and her colleagues found that the LLMs they tested performed very well on all EI tests, achieving an average accuracy of 81%, higher than the average accuracy achieved by human respondents (56%). Their results suggest that existing LLMs are already better than the average human respondent at inferring what people might feel in different contexts, at least in structured situations like those outlined in EI tests.
“Even more impressively, ChatGPT-4 was able to generate entirely new EI test items that were rated by human participants as similarly clear and realistic as the original items and showed comparable psychometric quality,” said Schlegel. “In our view, the ability to both solve and construct such tests reflects a high level of conceptual understanding of emotions.”
The results of this recent study could encourage psychologists to use LLMs to develop EI tests and training materials, a task that is currently done manually and can be fairly time-consuming. In addition, they could inspire the use of LLMs for generating tailored role-play scenarios and other content for training social workers.
“Our findings are also relevant for the development of social agents such as mental health chatbots, educational tutors, and customer service avatars, which often operate in emotionally sensitive contexts where understanding human emotions is essential,” added Schlegel.
“Our results suggest that LLMs, at the very least, can emulate the emotional reasoning skills that serve as a prerequisite for such interactions. In our next studies, we plan to test how well LLMs perform in less structured, real-life emotional conversations beyond the controlled format of test items. We also want to explore how culturally sensitive their emotional reasoning is since current models are primarily trained on Western-centric data.”
More information: Katja Schlegel et al, Large language models are proficient in solving and creating emotional intelligence tests, Communications Psychology (2025). DOI: 10.1038/s44271-025-00258-x
© 2025 Science X Network