Respondents were asked to express purchase intent using the standard “five-box” options: definitely would not purchase, probably would not purchase, might or might not purchase, probably would purchase, and definitely would purchase. Jameson and Bass (1989) recommend weighting the five possible responses as 0, 0.25, 0.50, 0.75, and 1.00, respectively, to develop a single measure of purchase probability, which we use as our measure of idea quality. Of course, many other weightings are possible. We report results using the Jameson and Bass weights, but the results are robust to other convex weighting schemes.

Results

The full quality distribution of ideas generated by the three pools is shown in Figure 1.

Figure 1 - Distribution of idea quality for three sets of ideas. Purchase intent is the weighted average of the five-box response scale per Jameson and Bass (1989).

The average quality of ideas generated by ChatGPT is higher than the average quality of ideas generated by humans, as measured by purchase intent. The average purchase probability of a human-generated idea is 40.4%, that of vanilla GPT-4 is 46.8%, and that of GPT-4 seeded with good ideas is 49.3%. The difference in