Introducing Persimmon-8B: Revolutionizing Language Models with Enhanced Context and Performance

Jimmy W.

September 10, 2023

Persimmon-8B: The New Language Model Making Waves in AI

Artificial intelligence has made tremendous strides in recent years, especially in the development of language models. Adept AI Labs has now released Persimmon-8B, an open-source language model intended for a wide range of computer-related tasks. The model can produce toxic outputs, so its responses should be evaluated carefully before it is used in an application.

The Power of Persimmon-8B

Persimmon-8B stands out among language models of its size. With a 16K-token context window, four times that of LLaMA 2 and eight times that of GPT-3, it is well suited to tasks that depend on long context. Despite being trained on less data, Persimmon-8B performs as well as other models of its size, if not better.

To evaluate Persimmon-8B, the Adept team takes a different approach from the usual one. Instead of relying solely on the probabilities the model assigns to candidate answers, they ask the model questions directly and judge the answers it generates. This gives a more accurate picture of how the model behaves when it is actually used.
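The difference between the two evaluation styles can be sketched in a few lines of Python. This is only an illustration, not Adept's evaluation code: the functions `pick_by_likelihood`, `answer_by_generation`, and `is_correct`, and the toy stand-ins for the model, are all hypothetical names introduced here.

```python
from typing import Callable, Sequence


def pick_by_likelihood(
    question: str,
    choices: Sequence[str],
    log_prob: Callable[[str, str], float],
) -> str:
    """Probability-based scoring: return the candidate answer that the
    model scores highest. The model never writes any text itself."""
    return max(choices, key=lambda choice: log_prob(question, choice))


def answer_by_generation(question: str, generate: Callable[[str], str]) -> str:
    """Generation-based scoring: ask the question and take whatever
    text the model produces as its answer."""
    return generate(question).strip()


def is_correct(predicted: str, reference: str) -> bool:
    """Case-insensitive exact match, a deliberately simple grading rule."""
    return predicted.strip().lower() == reference.strip().lower()


# Toy stand-ins for a real model (hypothetical, for illustration only).
toy_scores = {"Paris": -0.1, "Rome": -2.0}
toy_log_prob = lambda question, choice: toy_scores[choice]
toy_generate = lambda question: " paris \n"

question = "What is the capital of France?"
print(pick_by_likelihood(question, ["Rome", "Paris"], toy_log_prob))
print(is_correct(answer_by_generation(question, toy_generate), "Paris"))
```

In the first style the model only has to rank answers it is handed. In the second it has to produce the answer on its own, which is closer to how a user would interact with it.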

Persimmon-8B Outshines Competitors

Compared with other models in its size range, such as LLaMA 2 and MPT-7B-Instruct, the fine-tuned Persimmon-8B-FT emerges as the top performer across various metrics. Even the base model, Persimmon-8B-Base, performs on par with LLaMA 2 despite being trained on less data. This highlights the model’s efficiency and effectiveness.

Technical Features of Persimmon-8B

Persimmon-8B is a decoder-only transformer with several architectural modifications, including a squared ReLU activation function and rotary positional encodings. These choices are reported to make it more efficient than conventional alternatives. The model has approximately 9.3 billion parameters, and its input and output embeddings are decoupled, which streamlines the training process.
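For readers unfamiliar with the two components named above, here is a minimal pure-Python sketch of each. It is not taken from the Persimmon-8B code: the function names are hypothetical, the rotary encoding uses the interleaved-pair formulation, and the base of 10000 is the common default from the rotary embedding literature, assumed here for illustration.

```python
import math
from typing import List, Sequence


def squared_relu(x: float) -> float:
    """Squared ReLU: zero for negative inputs, x squared otherwise."""
    return max(0.0, x) ** 2


def rotate_pairs(vec: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Apply a rotary positional encoding to one even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d), so early pairs rotate quickly and
    later pairs rotate slowly. The base value is an assumption.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used to compare rotated vectors."""
    return sum(p * q for p, q in zip(a, b))


query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.25]

# The two scores agree up to rounding because both pairs of positions
# are two steps apart (5 and 3, then 2 and 0).
print(dot(rotate_pairs(query, 5), rotate_pairs(key, 3)))
print(dot(rotate_pairs(query, 2), rotate_pairs(key, 0)))
```

The final two lines show the property that makes rotary encodings attractive: the score between a rotated query and a rotated key depends only on how far apart their positions are, not on the absolute positions.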

Fast and Efficient Inference

Persimmon-8B delivers impressive performance in terms of inference speed. With optimized code, it can generate around 56 tokens per second on a single 80GB A100 GPU. This makes it ideal for real-time applications.
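To put the throughput figure in practical terms, the short sketch below converts it into approximate generation time for a reply of a given length. The helper name `generation_seconds` is hypothetical, and the estimate assumes a constant 56 tokens per second while ignoring the time spent processing the prompt.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 56.0) -> float:
    """Estimate the time to generate num_tokens at a steady decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


for n in (56, 256, 1024):
    print(f"{n} tokens: about {generation_seconds(n):.1f} s")
```

At this rate a 256-token reply would take roughly 4.6 seconds, and a 1,024-token reply a little over 18 seconds.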

Unlocking the Future of Language Models

Persimmon-8B represents a significant milestone in the field of language models. Its capabilities and innovative evaluation approach open up new possibilities for interactive AI applications. By open-sourcing the model, Adept AI Labs invites the community to build upon its foundation and drive further innovation. As the model gains traction, it is poised to revolutionize human-computer interactions across various domains.

For more information on Persimmon-8B, see the Adept Blog and the project’s GitHub repository. Credit goes to the researchers involved in this project.
