The Flaw in Artificial-Intelligence Chatbots: Mistaking Nonsense for Natural Language

The Problem with Large Language Models in Chatbots

A new study published in Nature Machine Intelligence reveals a significant flaw in artificial-intelligence chatbots. Although these chatbots, which rely on large language models, appear to understand and use language as humans do, they often mistake nonsense for natural language. Researchers at Columbia University ran experiments to characterize this flaw and, potentially, to improve chatbot performance. The findings may also shed light on how humans process language.

The Study: Testing Language Models and Human Perception

The Columbia University team challenged nine different language models with pairs of sentences. Human participants were asked to choose which sentence in each pair seemed more natural, and the models rated the same pairs. The researchers then compared the models’ ratings with the participants’ choices to evaluate how closely each model matched human judgment.
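For readers who want a concrete picture of this kind of comparison, here is a minimal Python sketch. It is not the researchers' code or method: the toy corpus, the `toy_score` function (a simple word-frequency scorer standing in for a real language model), the `agreement_rate` helper, and the sentence pairs and human choices are all hypothetical and invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a model's training data.
CORPUS = "the cat sat on the mat . the dog sat on the rug ."
COUNTS = Counter(CORPUS.split())
TOTAL = sum(COUNTS.values())
VOCAB = len(COUNTS)


def toy_score(sentence):
    """Add-one smoothed unigram log-probability (a stand-in for a real model)."""
    words = sentence.lower().split()
    # The extra +1 in the denominator reserves probability for unseen words.
    return sum(math.log((COUNTS[w] + 1) / (TOTAL + VOCAB + 1)) for w in words)


def agreement_rate(pairs, human_choices, score):
    """Fraction of pairs where the scorer prefers the sentence humans chose.

    pairs: list of (sentence_a, sentence_b) tuples
    human_choices: list of 0 or 1, the index of the sentence humans preferred
    score: function mapping a sentence to a number (higher = more natural)
    """
    if not pairs or len(pairs) != len(human_choices):
        raise ValueError("pairs and human_choices must be non-empty and equal in length")
    matches = 0
    for (a, b), human in zip(pairs, human_choices):
        # Ties go to the first sentence.
        model_choice = 0 if score(a) >= score(b) else 1
        matches += int(model_choice == human)
    return matches / len(pairs)


if __name__ == "__main__":
    # Invented sentence pairs and invented human choices, for illustration only.
    pairs = [
        ("the cat sat on the mat", "the mat sat on the cat"),
        ("the dog sat on the rug", "rug zebra the quantum sat"),
    ]
    human_choices = [0, 0]
    print(agreement_rate(pairs, human_choices, toy_score))
```

A scorer like this one ignores word order entirely, so it cannot tell a sentence from a scrambled copy of itself. A low agreement rate on pairs that humans find easy is the kind of blind spot a comparison like this is meant to expose.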

Results: AI Mistakes and the Complexity of Language

The study found that more sophisticated AI models based on transformer neural networks outperformed simpler recurrent neural network models and statistical models. However, all models made mistakes by selecting sentences that made no sense to humans. Despite this, some large language models showed promise, indicating that they capture important aspects of language that simpler models miss.

“That some of the large language models perform as well as they do suggests that they capture something important that the simpler models are missing,” stated Dr. Nikolaus Kriegeskorte, a principal investigator at Columbia’s Zuckerman Institute and coauthor of the study. “That even the best models we studied still can be fooled by nonsense sentences shows that their computations are missing something about the way humans process language.”

The Importance of Understanding Human Language Processing

The study provides an example of a sentence pair that human participants and AI models assessed differently. While humans found the first sentence more natural, BERT, a language model, identified the second sentence as more natural. This discrepancy raises concerns about the reliability of AI systems in making important decisions.

“Every model exhibited blind spots, labeling some sentences as meaningful that human participants thought were gibberish,” said senior author Christopher Baldassano, an assistant professor of psychology at Columbia. “That should give us pause about the extent to which we want AI systems making important decisions, at least for now.”

Dr. Kriegeskorte finds the results intriguing because they show that AI models can perform well and still be imperfect. Understanding why this gap exists, and why some models perform better than others, could lead to further progress in language models.

AI Chatbots and the Study of Human Brains

The researchers are also exploring whether the computations in AI chatbots can inspire new scientific questions and hypotheses about the human brain. Analyzing the strengths and weaknesses of various chatbots and their underlying algorithms could offer insights into human language processing.

“Ultimately, we are interested in understanding how people think,” said Tal Golan, the corresponding author of the paper. “These AI tools are increasingly powerful, but they process language differently from the way we do. Comparing their language understanding to ours gives us a new approach to thinking about how we think.”
