Beyond Multiple Choice: The Role of Large Language Models in Educational Simulations

This case study investigates the use of Large Language Models, such as GPT-3 and GPT-4, in enhancing real-time feedback and improving educational simulations for various student cohorts at Wharton.

Lennart Meincke and Andrew M. Carton¹

May 26, 2024

Abstract

This case study explores the potential of Large Language Models (LLMs), specifically GPT-3 and GPT-4, to enhance the educational experience through real-time simulation feedback. Utilizing a custom-built educational simulation for multiple classes at Wharton, we compared the real-time feedback generated by LLMs against that provided by a human instructor. In addition, we compared the LLM results against the first iteration of the simulation, which utilized traditional natural language processing (NLP) techniques. The evaluation was conducted across three distinct cohorts at Wharton – undergraduate students, Daytime MBA students, and Executive MBA students – with multiple iterations and improvements. Our results show that LLMs dramatically improved real-time feedback provided to students when compared to traditional NLP methods, at very low cost. In addition, the leap from GPT-3 to GPT-4 is significant, boosting correlations between model and instructor ratings from 0.33 to 0.77. Students commented on how real-time feedback to their open-ended responses was a major improvement over traditional simulations, which typically involve students responding to multiple choice questions or otherwise making decisions according to a fixed set of options. The simulation was the highest rated out of a dozen exercises in a midterm poll of undergraduates taking a core management class, outperforming other well-received exercises, such as Harvard’s Everest Simulation. We discuss the implications of these findings for educational simulations, the associated risks of deploying LLMs, and the student classroom experience.
Keywords: educational simulations, educational technology, edtech, teaching, LLM, large-scale language models, AI, artificial intelligence, ChatGPT, GPT-3, GPT-4

Acknowledgments: We are greatly indebted to Diana OuYang for her work on the design of the simulation as well as her input throughout its development. We also would like to thank Chris Callison-Burch for his expertise and guidance in leveraging large language models.

¹ The Wharton School, 2000 Steinberg-Dietrich Hall, 3620 Locust Walk, Philadelphia, PA 19104, lennart@seas.upenn.edu, carton@wharton.upenn.edu
