The RAI Toolkit provides developers with a structured project form to “assess risks throughout the implementation of AI projects,” and to ensure that tools under development align with DIU and CDAO’s best practices.40 Outside of the DOD, and for Gen-AI-specific technologies, many frameworks exist to ensure Gen-AI models perform with the accuracy and reliability needed for responsible implementation. While CDAO has existing documentation outlining its specific approach to testing and evaluating models,41 additional methodologies offer established practices to integrate into any project. One prominent example is Human-Calibrated Automated Testing and Validation of Generative Language Models (HCAT), a white paper that offers structured approaches to evaluating Gen-AI models. The paper covers many approaches similar to those that CDAO already integrates for assessing model performance, such as measuring robustness; however, HCAT’s focus on “a calibration process that aligns machine evaluations” offers an example of evaluating AI in a risk-prone industry that specifically addresses the issue of human-machine teaming. The paper describes “Calibration with Human Judgements,” in which samples of both human and machine evaluations are compared using regression techniques.42 Consistently comparing machine performance on high-stakes tasks against a ‘gold standard’ data set, which typically involves human judgement, is critical because it allows models to improve accuracy when collaborating with human operators. In fact, human-in-the-loop is a common factor across most frameworks because it is essential to ensuring both accountability and contextual accuracy. Building on this critical factor, CDAO further outlines that “DOD personnel are accountable for outcomes and decisions made with Gen-AI’s assistance.”43

Execution

The final stage in a Gen-AI implementation is to deliver the tool to the enterprise.
Delivery is primarily the responsibility of the vendors, and thus approaches will vary, but enterprises still need to manage the assimilation of the tools into the workforce. This includes developing effective change-management principles to socialize the tools and then train the workforce on using them. We discuss each below.

Change Management

Integrating Gen-AI into any enterprise will require the workforce to retool. Enterprise inertia, including negative perceptions of the technology and aversion to change, will need to be addressed in ways that encourage adoption organically rather than mandating it. We see two approaches as critical to this process: early socialization and stakeholder integration during development. For socialization, communication with stakeholders across the organization chart is critical in the early stages of any integration. Developing the tools in a black box without stakeholder involvement throughout the process risks significantly under-serving the user base in the long run. We recommend addressing this by incorporating end-users into tool development through on-premises working sessions, fine-tuning, formal discussions, and direct co-development when possible. These practices ensure that the tools are developed alongside end-users, building familiarity and buy-in. The format of communication is as important as the method; avoiding jargon and plainly describing technical concepts will make the tools more relatable and enable faster understanding of how to apply them. Further, consistent demonstrations of the product and informal discussions with stakeholders can build individualized understanding of the tools before rolling them out. These practices collectively increase familiarity with the tools before formal training and integration begin, thereby investing the workforce in the outcome.

Generative AI Adoption in the US Military