How Language Models are Learning Annotation Tasks in Context
Language models (LMs) are being utilized to perform basic data analysis on drug and medical histories. However, creating large labeled datasets for training machine-learning models requires costly manual annotation by domain experts. To address this issue, researchers from Stanford University, Anthropic, and the University of Wisconsin-Madison have developed a method that lets language models learn annotation tasks in context, eliminating the need for large-scale manual labeling.
In-context learning lets LMs pick up a task from a prompt's description alone, which makes it possible to correct a prompt's predictions rather than the prompt itself. This matters because even small changes in a prompt's wording can noticeably change prediction accuracy. The researchers propose a method called “Embroid,” which computes multiple representations of a dataset under different embedding functions and uses the consistency of LM predictions among neighboring samples to identify likely mispredictions. Embroid then generates additional synthetic predictions for each sample and combines them with a simple latent variable graphical model to produce the final, corrected prediction.
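As a rough illustration of the idea (not the paper's actual implementation), the sketch below smooths an LM's predictions using majority votes from each sample's nearest neighbors in two hypothetical embedding spaces. A plain majority vote stands in for Embroid's graphical model, and all function names and data are illustrative assumptions:

```python
import numpy as np

def neighbor_votes(embeddings, preds, k=2):
    """For each sample, return the majority label among its k nearest
    neighbors (excluding itself) in the given embedding space."""
    # Pairwise Euclidean distances between all samples.
    dists = np.linalg.norm(
        embeddings[:, None, :] - embeddings[None, :, :], axis=-1
    )
    np.fill_diagonal(dists, np.inf)  # exclude each sample itself
    votes = []
    for i in range(len(preds)):
        nn = np.argsort(dists[i])[:k]           # k nearest neighbors
        votes.append(np.bincount(preds[nn]).argmax())
    return np.array(votes)

def embroid_sketch(preds, spaces, k=2):
    """Combine the LM's own predictions with per-space neighbor votes
    by simple majority (a stand-in for the paper's graphical model)."""
    all_votes = [preds] + [neighbor_votes(E, preds, k) for E in spaces]
    stacked = np.stack(all_votes)               # (num_voters, num_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), 0, stacked
    )
```

In this toy setting, a sample whose LM prediction disagrees with its neighbors in both embedding spaces gets flipped to the neighbors' label, which is the intuition behind Embroid's correction step.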
One important question is how Embroid's performance changes with dataset size. Because Embroid relies on nearest neighbors in different embedding spaces, its performance may degrade on very small annotated datasets. The researchers also measured how performance varies with the domain specificity of the embeddings and with the quality of the embedding space. In both cases, Embroid outperformed the baseline language models.
Embroid also draws on statistical techniques developed in weak supervision, where the objective is to generate probabilistic labels for unlabeled data by combining the predictions of multiple noisy sources. By using embeddings to construct additional synthetic predictions, Embroid improves on the LM's original predictions.
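A minimal sketch of the weak-supervision step, under a deliberately simplified assumption: probabilistic labels are formed from the raw agreement of several noisy vote sources. Real label models additionally estimate per-source accuracies, which this illustration omits:

```python
import numpy as np

def probabilistic_labels(votes, num_classes=2):
    """Turn the votes of several noisy sources into probabilistic labels
    by counting the fraction of sources that picked each class.

    votes: (num_sources, num_samples) integer array of class predictions.
    Returns: (num_samples, num_classes) array of label probabilities."""
    num_sources, num_samples = votes.shape
    probs = np.zeros((num_samples, num_classes))
    for c in range(num_classes):
        # Fraction of sources voting for class c, per sample.
        probs[:, c] = (votes == c).sum(axis=0) / num_sources
    return probs
```

For example, three sources that unanimously vote class 1 on a sample yield probability 1.0 for that class, while a 2-to-1 split yields roughly 0.67 for the majority class.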
The researchers evaluated Embroid on six different language models across a variety of tasks and found that it improved performance by an average of 7.3 points per task on GPT-JT and 4.9 points per task on GPT-3.5.
To learn more about Embroid, you can read the paper and blog linked below. All credit for this research goes to the researchers on the project.