9 Figure 2: Different NLP approaches’ correlation with professorial ratings for vision statement task. Implementation Details For educators interested in integrating GPT-based text assessment into simulations, we will provide a few observations and technical details here. Our simulation is developed in Unity and we use the Com.Openai.Unity package (“RageAgainstThePixel/Com.Openai.Unity” 2024) to make API calls to OpenAI. A small proxy forwards the front-end requests coming from our Unity WebGL app to prevent the API keys from being visible. The student input is appended to the prompt to serve as the “vision input” to grade. For all requests, the temperature6 was set to 0 to ensure that we get the most consistent responses. However, we would still 6 The temperature affects the randomness of the token selection during text generation. A temperature of 0 greatly restricts the model's output, ideally limiting it to the most likely token at each step, which generally results in very deterministic and repetitive outputs. However, even with a temperature of 0, minor variations in output can still occur due to other factors in the model's architecture and implementation.
Beyond Multiple Choice: The Role of Large Language Models in Educational Simulations Page 8 Page 10