Detecting Problematic Text in Large Language Models: The MIN-K% PROB Method

Jimmy W.

October 30, 2023

Large Language Models (LLMs): Detecting Problematic Training Text

Large Language Models (LLMs) are trained on massive text corpora, ranging from a few hundred gigabytes to terabytes. At that scale, it becomes important to determine whether the training data contains problematic text, such as copyrighted material or personally identifiable information. The question is harder to answer because LLM developers have become less willing to disclose the full composition of their training data.

Introducing WIKIMIA: A Dynamic Benchmark

A group of researchers from the University of Washington and Princeton University have addressed this issue by introducing a benchmark called WIKIMIA. The benchmark includes both pretraining and non-pretraining data, so that detection results can be checked against gold truth. The researchers have also developed a new detection method called MIN-K% PROB, which identifies outlier words with low probabilities under the LLM.

A reliable benchmark is crucial for identifying problematic training text. WIKIMIA is a dynamic benchmark that automatically evaluates detection methods on newly released pretrained LLMs. The MIN-K% PROB method is based on the hypothesis that unseen text is more likely to contain a few words that the LLM doesn’t know well, that is, outlier words to which it assigns low probability. MIN-K% PROB calculates the average log-likelihood of these outlier words.

How MIN-K% PROB Works

The MIN-K% PROB method estimates whether an LLM was trained on a given text. It uses the LLM to calculate the probability of each token in the text, given the tokens that precede it. It then selects the k% of tokens with the lowest probabilities and calculates their average log-likelihood. A higher value indicates that the text is more likely to be in the pretraining data.
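The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: it assumes the per-token log-probabilities have already been obtained from the model, and the function names, the default value of `k`, and the decision threshold are all hypothetical choices made for the example.

```python
import math
from typing import Sequence


def min_k_prob(token_log_probs: Sequence[float], k: float = 0.2) -> float:
    """Average log-probability of the k fraction of least likely tokens.

    token_log_probs: log p(token | preceding tokens) for each token,
        assumed to come from the LLM under test.
    k: fraction of tokens to keep, e.g. 0.2 for the lowest 20%
        (an illustrative default, not a recommended setting).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    # Keep at least one token so the average is always defined.
    n = max(1, math.ceil(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


def is_likely_seen(token_log_probs: Sequence[float],
                   k: float = 0.2,
                   threshold: float = -5.0) -> bool:
    """Flag a text as probably seen if its score exceeds a threshold.

    The threshold is hypothetical; in practice it would have to be
    calibrated on texts whose membership is known.
    """
    return min_k_prob(token_log_probs, k) > threshold


# Toy log-probabilities: the second text has two very unlikely tokens.
familiar_text = [-0.1, -0.2, -0.4, -0.3]
unfamiliar_text = [-0.1, -6.0, -0.2, -8.0]

print(min_k_prob(familiar_text, k=0.5))
print(min_k_prob(unfamiliar_text, k=0.5))
```

With these toy numbers the familiar text scores higher than the unfamiliar one, because its least likely tokens are still reasonably probable. Averaging only the lowest-probability tokens, rather than all tokens, keeps the score focused on the outlier words that the hypothesis is about.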

Real-Life Scenarios and Findings

The researchers applied the MIN-K% PROB method to three real-life scenarios, two of which are summarized here. In the first scenario, they detected copyrighted books by analyzing a test set of 10,000 text snippets drawn from 100 copyrighted books. They found that approximately 90% of the books had a contamination rate of over 50%, where a book's contamination rate is the share of its snippets flagged as pretraining data. Specifically, the results point to text from 20 copyrighted books in the GPT-3 model's pretraining data.
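To make the contamination figures concrete, here is a small sketch of how such a rate could be computed. It assumes that a book's contamination rate is the fraction of its snippets that the detector flags as seen; that definition, the function names, and the toy data are illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def contamination_rates(
    snippet_flags: Iterable[Tuple[str, bool]]
) -> Dict[str, float]:
    """Per-book fraction of snippets flagged as pretraining data.

    snippet_flags: (book_title, flagged) pairs, one per snippet, where
        flagged is the detector's yes/no decision for that snippet.
    """
    totals: Dict[str, int] = defaultdict(int)
    flagged_counts: Dict[str, int] = defaultdict(int)
    for book, flagged in snippet_flags:
        totals[book] += 1
        flagged_counts[book] += int(flagged)
    return {book: flagged_counts[book] / totals[book] for book in totals}


def share_of_books_above(rates: Dict[str, float], cutoff: float = 0.5) -> float:
    """Fraction of books whose contamination rate exceeds the cutoff."""
    if not rates:
        return 0.0
    return sum(rate > cutoff for rate in rates.values()) / len(rates)


# Toy data: two hypothetical books with four snippets each.
flags = [
    ("book_a", True), ("book_a", True), ("book_a", True), ("book_a", False),
    ("book_b", True), ("book_b", False), ("book_b", False), ("book_b", False),
]
rates = contamination_rates(flags)
print(rates)
print(share_of_books_above(rates))
```

In this toy example one of the two books has more than half of its snippets flagged, so the share of books above the 50% cutoff is one half.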

In the second scenario, the researchers used the MIN-K% PROB method to audit machine unlearning, that is, attempts to remove personal information and copyrighted data from LLMs. They discovered that LLMs could still generate similar copyrighted content even after unlearning copyrighted books.

Conclusion: A Step Towards Transparency and Accountability

The MIN-K% PROB method provides a new approach to detecting problematic training text in LLMs. The researchers validated its effectiveness through real-world case studies and found strong evidence that the GPT-3 model may have been trained on copyrighted books. The method offers a consistent and effective way to identify problematic training text, contributing to better model transparency and accountability.

If you’re interested in learning more about this research, you can read the paper and check out the GitHub repository and project page. All credit goes to the researchers involved in this project.
