Large language models such as GPT-3/4, PaLM, and LaMDA exhibit general-purpose capabilities and emergent skills on tasks like language translation and arithmetic, even though their next-token-prediction training objective does not directly encode those skills. To understand which factors promote these abilities, researchers from UW-Madison trained small transformer models to learn basic arithmetic operations.

They found that data format and sample size matter: reversing the digits of the output and balancing the mix of addition examples both improve learning. They also explored the advantages of chain-of-thought data, which decomposes each problem into intermediate steps and enables step-by-step learning. Text and numeric data were found to interact during training, with prior text knowledge enhancing arithmetic performance. Model size and pretraining also play a role, with pretrained models reaching acceptable performance on arithmetic tasks. However, the models failed to generalize beyond the digit lengths seen during training, suggesting they learn a mapping function rather than a comprehensive understanding of arithmetic.

The research builds on earlier work and provides in-depth investigations of these factors, showing that proper data formatting greatly improves both performance and sample efficiency. For more details, refer to the linked paper and GitHub.

Aneesh Tickoo, a consulting intern at MarktechPost, is pursuing a degree in Data Science and AI from IIT Bhilai and is passionate about machine learning and image processing.
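The reversed-output formatting described above can be illustrated with a minimal sketch (this is an illustration, not the authors' exact data pipeline): writing the sum's digits least-significant first lets a left-to-right model emit digits in the same order the carries are computed.

```python
# Sketch: formatting addition samples in plain vs. reversed-output form.
# Function names and the exact sample syntax are illustrative assumptions.

def plain_sample(a: int, b: int) -> str:
    """Standard format: the sum is written most-significant digit first."""
    return f"{a}+{b}={a + b}"

def reversed_sample(a: int, b: int) -> str:
    """Reversed format: the sum's digits are written least-significant first."""
    return f"{a}+{b}={str(a + b)[::-1]}"

print(plain_sample(152, 79))     # 152+79=231
print(reversed_sample(152, 79))  # 152+79=132
```

A model trained on the reversed format can resolve each output digit from the low-order digits it has already seen, instead of having to anticipate all carries before emitting the leading digit.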