This suggests that the hybrid practice of running multiple strategies and combining their ideas is indeed a good option. The most common ideas can be found in Appendix B.

Limitations

While we feel that our findings convincingly demonstrate the potential for LLMs to augment and even automate the process of idea generation, we want to be careful not to claim more than what is supported by the design of our study and the data it produced. In this section, we discuss two types of limitations to our research: methodological concerns and limitations to our study’s generalizability.

On the methodological side, we have to acknowledge that the design, execution, and analysis of our study can be criticized along a number of dimensions. In particular, we see the following limitations:

• Cosine similarity is a commonly used measure of idea similarity, yet it is not perfect. One weakness is that it does not consider the similarity to ideas that already exist in the world. It is also not clear how it empirically links to human-scored measures of similarity. Future research needs to empirically connect this measure with traditional constructs, including pairwise comparisons and idea novelty.

• More work also needs to be done to better understand the impact of language, style, and text length on cosine similarity. Two ideas that are identical, yet expressed in different languages or with redundant information in their descriptions, should in theory obtain a cosine similarity score of 1, which they do not get in practice.

• For computational reasons, we are working with a limited sample size for each prompt. Given the stochastic nature of LLMs, some of our results might be driven by statistical noise.

• Diversity can be obtained by sacrificing idea quality. There exist countless ideas that have no real user need or that are obviously infeasible (for example, a pill that makes college students healthy, good looking, and smart). We focus on diversity as an end goal. Future research needs to show that this diversity indeed leads to a better best idea.

• Our “team human” was represented as an aggregation of ideas from humans working individually. This method is likely to have created a much more diverse idea pool than any human individual or brainstorming team would have been capable of producing.
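To make the similarity measure discussed above concrete, the following is a minimal sketch of cosine similarity and of one possible pool-level summary, the mean similarity over all pairs of ideas. The function names, the toy three-dimensional vectors, and the choice of the mean as the aggregate are illustrative assumptions, not necessarily the procedure used in this study. In practice the vectors would come from a text embedding model.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings.

    A lower value indicates a more diverse pool (assumed summary statistic).
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hand-made toy "embeddings" (hypothetical, for illustration only).
idea_a = [1.0, 0.0, 0.0]
idea_b = [2.0, 0.0, 0.0]  # same direction as idea_a, different magnitude
idea_c = [0.0, 1.0, 0.0]  # orthogonal to idea_a and idea_b

pool = [idea_a, idea_b, idea_c]
pool_similarity = mean_pairwise_similarity(pool)
```

The sketch also illustrates the first limitation: the score depends only on the vectors within the pool, so it carries no information about how similar an idea is to ideas that already exist in the world.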