By comparing these 35 idea pools created with the help of GPT-4, we set up a competition to determine which prompting strategy is most effective at increasing idea diversity. The ideas generated by the human group serve as a baseline in this competition. Note that the human group idea pool is an aggregation of ideas submitted by separate individuals rather than a list of 100 ideas generated by a single group. This aggregation of independently generated ideas gives an edge to the human innovators and sets a high bar.

Methodology

In this section, we define our key outcome metrics, which are cosine similarity, number of unique ideas, and speed of exhaustion. We also describe our technical set-up.

Outcome Measures

We have three main outcome measures: cosine similarity, number of unique ideas, and speed of exhaustion. They are discussed in detail below.

Cosine Similarity

Cosine similarity is a measure of similarity between two ideas (or other forms of text). Since LLMs translate text into embeddings (vectors), it is possible to mathematically measure the cosine of the angle between two vectors in a multi-dimensional space. Cosine similarity is particularly useful for comparing texts because it is less sensitive to the overall size of the documents; it focuses on their orientation in the vector space. Just as in geometry, the cosine of the angle between the vectors is calculated, and it ranges from -1 to 1. A cosine similarity of 1 (an angle of 0 degrees) implies that the two ideas point in the same direction and are therefore very similar, while a cosine similarity of 0 (an angle of 90 degrees) implies that they are orthogonal to each other.

While cosine similarity is an accepted measure of text similarity, it is not without problems. For instance, changing the embeddings model could yield dramatically different results depending on its training, even if the model was optimized for the same purpose. In addition, cosine similarity might not capture all the dimensions of idea similarity that a human might consider.
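To make the definition concrete, the following is a minimal sketch in Python and not the implementation used in this study. The three-dimensional vectors are hypothetical toy values standing in for real embeddings, and the function names `cosine_similarity` and `mean_pairwise_similarity` are illustrative. Averaging over all pairs is one plausible way to summarize the similarity of an idea pool; it is an assumption here, not a procedure stated above.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors.

    Assumption: this is one possible pool-level summary, used here
    only for illustration.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy "embeddings"; real embedding vectors have many more dimensions.
idea_a = [1.0, 0.0, 0.0]
idea_b = [2.0, 0.0, 0.0]   # same direction as idea_a, twice the length
idea_c = [0.0, 1.0, 0.0]   # orthogonal to idea_a
idea_d = [-1.0, 0.0, 0.0]  # opposite direction to idea_a

same_direction = cosine_similarity(idea_a, idea_b)   # 1.0
orthogonal = cosine_similarity(idea_a, idea_c)       # 0.0
opposite = cosine_similarity(idea_a, idea_d)         # -1.0
pool_average = mean_pairwise_similarity([idea_a, idea_b, idea_c])
```

Because `idea_b` is a scaled copy of `idea_a`, the pair scores 1 regardless of vector length, which illustrates why the measure is insensitive to document size. A lower pool average would indicate a more diverse set of ideas under this summary.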
The alternative to using cosine similarity is to measure idea diversity by relying on human raters. However, rating idea similarity is not only a very subjective task, it also does not scale well, i.e.,
Prompting Diverse Ideas: Increasing AI Idea Variance