Last month, OpenAI launched its newest AI chatbot, GPT-4. The new model has proven impressive: it scored in the 90th percentile on the bar exam, passed 13 of 15 AP exams, and nearly aced the GRE Verbal test.
A group of researchers from BYU and 186 other universities decided to put OpenAI’s technology to the test specifically in the field of accounting. While the original version, ChatGPT, still has some room for improvement in accounting, it has the potential to revolutionize the way teaching and learning take place.
David Wood, a BYU professor of accounting and lead study author, said, “When this technology first came out, everyone was worried that students could now use it to cheat. But cheating opportunities already exist. So, instead, we’re focusing on how this technology can enhance the teaching and learning processes. Experimenting with it has been eye-opening.”
Since its launch in November 2022, ChatGPT has become the fastest-growing technology platform ever, gaining 100 million users in less than two months. Due to the ongoing debate about integrating models like ChatGPT into education, Wood decided to recruit as many professors as possible to test the AI against actual university accounting students.
His social media recruiting pitch went viral, and 327 co-authors from 186 educational institutions across 14 countries joined the research. They contributed 25,181 accounting exam questions from their classrooms. Undergraduates at BYU, including Wood’s daughter Jessica, supplied another 2,268 questions from textbook test banks. The questions spanned a range of accounting topics, question types (true/false, multiple choice, short answer, etc.), and difficulty levels.
ChatGPT performed impressively, but the students still outperformed it, averaging 76.7% to ChatGPT’s 47.4%. ChatGPT did excel in accounting information systems (AIS) and auditing, but it struggled with tax, financial accounting, and managerial accounting, likely because those areas demand more mathematical work.
When it came to question types, ChatGPT performed better with true/false questions (68.7% correct) and multiple-choice questions (59.5%), but struggled with short-answer questions (28.7% to 39.1% correct). Higher-order questions proved to be more challenging for ChatGPT, sometimes leading to incorrect answers or inconsistent responses.
Jessica Wood, a freshman at BYU, remarked, “It’s not perfect; you’re not going to be using it for everything. Trying to learn solely by using ChatGPT is a fool’s errand.”
During the study, the researchers discovered other interesting patterns, including:
- ChatGPT occasionally fails to recognize math problems and provides nonsensical answers, such as adding numbers in a subtraction problem or dividing numbers incorrectly.
- ChatGPT often offers explanations for its answers, even when they are incorrect. In some cases, it accurately describes the answer but selects the wrong multiple-choice option.
- ChatGPT sometimes fabricates facts. For example, it can produce a real-looking reference that does not actually exist.
Despite these issues, the authors anticipate that GPT-4 will markedly improve performance on accounting questions and address many of the problems above. Most promising is the chatbot’s potential to enhance teaching and learning, for example by drafting and testing assignments and projects.
Melissa Larson, a coauthor and fellow BYU accounting professor, reflected, “This is a disruption, and we need to evaluate where we go from here. It forces us to reconsider whether we are teaching valuable information or not. Of course, I’ll still have teaching assistants, but this will redefine how we utilize them.”