medication, a more deterministic approach is likely to be preferred. To guide chatbot behavior, the most useful tool is the “system prompt,” which specifies the rules and persona of the bot and what it can and cannot do. A teaching assistant bot, for example, should perhaps only guide a student through a problem, but not reveal solutions too quickly. Customizing a bot’s abilities via the system prompt is straightforward and can be done entirely with free-text instructions. While results are often impressive, extensive testing is necessary to evaluate instruction adherence and whether the model performs well enough without any special knowledge.

If more specialized knowledge is needed, a retrieval-augmented generation (RAG) approach can help integrate specific data sources, such as product information or operating policies, into the bot’s knowledge base. RAG works by taking a set of documents, breaking them into smaller pieces (chunks), and storing them in a database. When processing a user question, the query is compared against the stored chunks to determine which are most relevant to answering it. Ideally, the bot then considers only highly relevant information, reducing the amount of text it has to process; this improves response time and can help with accuracy, as information is less likely to be overlooked. For example, a customer support chatbot for a retailer should probably be aware of return policies (“Is there a longer return window for purchases during the holiday season?”). A “plain vanilla” GPT model is of little value in such cases, just as in the Air Canada example. However, overloading the system prompt with this type of information can degrade response performance and is often impractical due to context window limitations. Allowing the bot to refer to external information can therefore vastly improve its knowledge base.
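The system-prompt customization described above can be sketched in a few lines of code. The following is a minimal illustration, not a definitive implementation: the dict-based "role"/"content" message format follows the convention used by most chat-completion APIs, and the teaching-assistant wording is invented for this example.

```python
# Minimal sketch of constraining a bot via a system prompt.
# The "role"/"content" message format follows the convention used by
# most chat-completion APIs; the prompt text itself is illustrative.

SYSTEM_PROMPT = (
    "You are a teaching assistant for an introductory statistics course. "
    "Guide students toward solutions with hints and questions. "
    "Never reveal a full solution until the student has made at least "
    "two genuine attempts."
)

def build_messages(user_question: str, history: list[dict] | None = None) -> list[dict]:
    """Prepend the system prompt to the conversation before each model call."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + (history or [])
        + [{"role": "user", "content": user_question}]
    )

messages = build_messages("What is the answer to problem 3?")
```

Because the system prompt is prepended on every call, the behavioral rules apply to the whole conversation, which is exactly what makes free-text customization so convenient and why its adherence must be tested extensively.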
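The chunk-and-retrieve loop described above can be sketched as follows. This is a toy illustration under stated assumptions: production systems use learned embeddings and a vector database, whereas here a simple bag-of-words cosine similarity stands in for both, and the return-policy text is invented.

```python
import math
import re
from collections import Counter

# Toy RAG sketch: split documents into overlapping chunks, then score
# each chunk against the query. Bag-of-words cosine similarity stands in
# for real embeddings; the policy text is invented for illustration.

def chunk(text: str, size: int = 15, overlap: int = 5) -> list[str]:
    """Split text into word-based chunks with some overlap between them."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]

policies = (
    "Items may be returned within 30 days of purchase with a receipt. "
    "Purchases made in November and December may be returned until January 31. "
    "Shipping is free on orders over 50 dollars. "
    "Gift cards are not refundable."
)
chunks = chunk(policies)
top = retrieve("Is there a longer return window for holiday purchases?", chunks)
```

Only the top-scoring chunks, not the full policy corpus, would then be passed to the LLM alongside the user's question, which is what keeps the processed text small.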
RAG is especially useful when external information changes rapidly, as the data store can be updated quickly and independently of the LLM. Depending on the data sources needed for RAG, e.g., customer records in Salesforce, additional engineering work is necessary to translate LLM requests into meaningful lookups in the respective databases. Many intricate details can greatly impact RAG performance. The original documents need to be split into parts that are neither so large that they become too general nor so small that they lack sufficient detail.

Dimension 2: Isolated Interactions vs. Long-Term Relationship

Does the bot remember users across multiple episodes? Most bots start an interaction with a user with a “clean slate” (“Hi, I am a virtual assistant; what can I do for you today?”). Each interaction stands alone, and no memory is carried from one interaction to the next. This works well for a myriad of inquiries, such as asking about historic events or gift ideas. In contrast, a relationship-focused bot can offer personalized assistance based on past interactions (“Great to see you today! I know you were wondering about opening an account last week. We just increased our interest rate for new customers. Do you want to learn more?”).

Reimagining Customer Service Journeys with LLMs: A Framework for Chatbot Design and Workflow Integration - Page 3