Study finds ChatGPT eases students’ cognitive load, but at the expense of critical thinking

A study conducted with German university students found that while using ChatGPT to search for information on a scientific topic made their work easier, it also led to more superficial results. Students using ChatGPT reported lower mental effort, or cognitive load, than those using Google search. However, their reasoning was less thorough and their arguments of lower quality. These findings, published in Computers in Human Behavior, shed light on the potential trade-offs between convenience and depth when relying on artificial intelligence for research tasks.

In recent decades, the internet has dramatically changed how we access information, making vast amounts of data easily available at our fingertips. This information comes in many forms—from news articles to blogs, academic papers, and personal opinions—and from countless sources, both credible and questionable. While this abundance is a tremendous resource, it also presents challenges. The ease of access means we are increasingly responsible for evaluating the quality of the information we find, determining what is relevant, and distinguishing reliable sources from those that are inaccurate or misleading.

Recently, large language models (LLMs) like ChatGPT have added a new layer to how people gather information. ChatGPT is an artificial intelligence chatbot designed to respond to user queries in natural, human-like language across a wide range of topics. Unlike traditional search engines, which direct users to websites that might contain relevant information, ChatGPT aims to answer questions directly, drawing on patterns learned from the vast body of text it was trained on.

Study author Matthias Stadler and his colleagues wanted to explore how using ChatGPT compares to traditional web searches when it comes to cognitive load and the quality of arguments that students produce. Cognitive load refers to the mental effort required to process information. They hypothesized that students using ChatGPT would experience lower cognitive load than those using Google, but that this reduction in effort might come at the cost of producing less rigorous conclusions.

The researchers also expected that the recommendations produced by students using ChatGPT would be more uniform compared to those using traditional web search, as ChatGPT tends to provide structured, direct responses rather than exposing users to diverse sources of information.

The study involved 91 university students, with an average age of 22 years, most of whom were female. These students were randomly assigned to one of two groups: one group used ChatGPT to complete their research task, while the other group used the Google search engine.

The research task was designed to reflect a real-world decision-making scenario. Students were asked to help a fictional friend named Paul decide whether he should use sunscreen containing mineral nanoparticles, such as zinc oxide and titanium dioxide. Paul expressed concerns about the potential health risks associated with these nanoparticles, though he was aware of their benefits, such as high sun protection without the chemical agents that can cause allergic reactions. The students had 20 minutes to research the topic and then provide a written recommendation to Paul, complete with justifications for their conclusions.

In addition to completing the research task, students filled out a questionnaire measuring their cognitive load during the exercise. The questionnaire assessed three types of cognitive load: extraneous (mental effort imposed by how the information was presented, such as filtering out irrelevant content), intrinsic (effort stemming from the inherent complexity of the material itself), and germane (effort devoted to understanding and integrating the material). Students' prior knowledge of nanotechnology was also measured using a standardized test.

The researchers evaluated the students’ recommendations for Paul, looking at both the number and quality of arguments used to justify their conclusions. They were particularly interested in whether the students considered both the benefits and potential risks of nanoparticles in sunscreen.

The results confirmed the researchers' expectations in several areas. Students using ChatGPT experienced significantly lower cognitive load than those using Google. This indicates that ChatGPT made the research process easier, likely because it provides direct answers rather than requiring students to sift through and evaluate multiple sources of information. In particular, ChatGPT's ability to summarize complex topics in plain language seemed to reduce the mental effort needed to work through the task.

However, this ease of use came at a cost. The quality of the justifications provided by the ChatGPT group was lower than that of the students who used Google. Those in the traditional search group produced more detailed arguments and cited more relevant pieces of information in their recommendations. This suggests that while ChatGPT can simplify the process of finding answers, it might not encourage the same depth of engagement that comes with searching through a range of diverse sources and critically evaluating them.

One surprising finding was that students using ChatGPT did not show less variation in their final recommendations. The researchers had predicted that because ChatGPT provides more structured and directed responses, students would produce more uniform conclusions. However, this was not the case: students in both the ChatGPT and Google groups offered a wide range of recommendations, reflecting a variety of perspectives on whether Paul should use sunscreen containing nanoparticles.

“While LLMs like ChatGPT offer an efficient way to reduce intrinsic and extraneous cognitive load, they may not always facilitate the deep learning necessary for complex decision-making tasks. Traditional search engines, by necessitating more active engagement, may promote a higher quality of learning, underscoring the need for educational practices that encourage critical engagement with diverse information sources,” the study authors concluded.

The study provides intriguing insight into how cognitive processes differ when people use large language models versus traditional web search. However, it involved a relatively small number of participants, all of them university students, so the results might not generalize to other demographic groups.

Another limitation is that the study focused on a relatively short, 20-minute research task. It’s possible that in longer or more complex tasks, the differences between using ChatGPT and traditional search engines might become even more pronounced. Future research should explore how these tools perform in more extended learning situations and with tasks that require a deeper understanding of the material.

The paper, “Cognitive Ease at a Cost: LLMs Reduce Mental Effort but Compromise Depth in Student Scientific Inquiry,” was authored by Matthias Stadler, Maria Bannert, and Michael Sailer.