A team of researchers from Mistral AI has introduced Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral is a powerful decoder-only model released with open weights.
In Mixtral, each layer’s feedforward block draws on eight distinct groups of parameters (the “experts”): for every token, a router network selects two of these groups to process it and combines their outputs. This efficiently increases the model’s parameter count while keeping cost and latency under control, since each token only uses a fraction of the total parameters. A schematic sketch of this routing is shown below. Mixtral is pre-trained on multilingual data with a 32k-token context window and has outperformed Llama 2 70B and GPT-3.5 on various benchmarks.
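To make the routing concrete, here is a minimal, illustrative sketch of a top-2-of-8 mixture-of-experts feedforward layer in PyTorch. The dimensions, the plain SiLU MLP experts, and the class name `MoEFeedForward` are assumptions for the example, not Mistral’s actual implementation (the released model uses much larger SwiGLU experts).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Illustrative sparse MoE feed-forward layer: a router picks the top-2
    of 8 expert MLPs for every token and mixes their outputs with softmax
    gate weights. Sizes and expert structure are assumptions, not Mixtral's."""

    def __init__(self, dim=512, hidden_dim=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (num_tokens, dim)
        logits = self.router(x)                  # (num_tokens, num_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # gate weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # each token's k-th chosen expert
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx    # tokens routed to this expert in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out                               # same shape as x

tokens = torch.randn(16, 512)                    # a batch of 16 token embeddings
print(MoEFeedForward()(tokens).shape)            # torch.Size([16, 512])
```

The key point the sketch captures is that only the two selected experts run for each token, which is why the total parameter count can grow without a proportional increase in per-token compute.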
Mixtral performs particularly well in multilingual understanding, code generation, and mathematics. It can also retrieve information from anywhere in its 32k-token context window, regardless of the sequence length or where in the sequence the information appears.
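Long-context retrieval of this kind is typically probed with a passkey-style test: a short key is hidden somewhere in a long stretch of filler text and the model is asked to repeat it. The sketch below shows one way such a prompt could be constructed; the filler sentence, the key format, and the `build_passkey_prompt` helper are illustrative assumptions rather than the authors’ exact setup.

```python
import random

def build_passkey_prompt(num_filler_lines=2000, depth=0.5, passkey=None):
    """Hide a random passkey at a chosen relative depth inside long filler
    text, then ask for it back (illustrative setup, not the paper's)."""
    passkey = passkey or str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is bright."
    lines = [filler] * num_filler_lines
    lines.insert(int(len(lines) * depth), f"The passkey is {passkey}. Remember it.")
    prompt = "\n".join(lines) + "\nWhat is the passkey? Answer with the number only."
    return prompt, passkey

prompt, expected = build_passkey_prompt(depth=0.25)
print(len(prompt.split()), "words; expected answer:", expected)
```

Varying `depth` and `num_filler_lines` checks that retrieval quality holds no matter where the key sits or how long the context grows, which is the behaviour the paper reports for Mixtral up to 32k tokens.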
The team conducted a wide range of benchmarks to evaluate Mixtral’s performance. In these tests, Mixtral 8x7B – Instruct, a chat model fine-tuned to follow instructions, outperformed GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B chat model. Both Mixtral 8x7B and Mixtral 8x7B – Instruct are released under the Apache 2.0 license.
The study confirms Mixtral’s superior performance compared to the Llama models across benchmarks covering code, mathematics, reading comprehension, commonsense reasoning, world knowledge, and aggregated results. Visit the Paper and Code for a detailed look at Mixtral’s features.
By Tanya Malhotra, University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.