countermeasures such as user authentication could reduce this risk even further. We also believe that the classroom setting in general, paired with students’ eagerness to learn through an immersive simulation, makes them less likely to attempt to trick the system.

As noted above, and most visibly in Figure 2, the correlation between LLM-assisted assessment and instructor assessment is generally high. No human could assess hundreds of students in such a short amount of time with such high accuracy while also providing customized feedback to each student, let alone at such minimal cost.

It must also be noted that the model might make student assessment fairer overall. Most of us like to believe we are consistent in how we assess student performance, but we have found over the years that our ratings for the individual dimensions vary over time. A benevolent interpretation might be that our thinking advances and we adjust our scoring accordingly. It is, however, equally possible that it is simply very difficult to consistently distill the strengths and weaknesses of student-generated text into a number from 0 to 100, and harder still to do so without a reference point. Large language models also offer the advantage of (generally) providing the same score for the same vision. However, it is possible that students with specific writing styles, or non-native speakers, are rated differently due to biases in the model that are not immediately apparent. That said, human raters may well show similar biases in their ratings.

Overall, leveraging large language models can improve educational simulations by providing students with real-time feedback on their natural patterns of speech and decision making. We believe this is the logical next step for any new or continued simulation development.
An important transformation of education is already underway, and AI-supported simulations can be an important catalyst as we enter this new era in student learning.