Second, what is the quality distribution of the ideas generated? We are particularly interested in the extreme values: the quality of the best ideas in each of the three pools. We measure idea quality using a standard market research technique, eliciting consumer purchase intent in a survey. With a quality estimate for each idea, we can then compare the quality distributions of the three pools. Third, given ChatGPT-4's performance in generating new product ideas, how can LLMs be used effectively in practice, and what are the implications for the management of innovation?

Approach

We have over 20 years of experience teaching product design and innovation courses at Wharton, Cornell Tech, and INSEAD, and we have run similar innovation challenges dozens of times with thousands of students. Most of our courses follow the innovation tournament format (Terwiesch and Ulrich 2009, 2023): individuals first generate many ideas independently, and these are then combined into a pool of several hundred ideas that is evaluated by others in the group (i.e., "crowdsourced" evaluation). As a result, we have access to a large set of ideas generated by humans before AI tools became available to assist with ideation.

We randomly selected 200 ideas from the pool generated in our 2021 class (i.e., before ChatGPT and other LLMs became widely available). Each idea consists of a descriptive title and a paragraph of text. All were generated in response to the same challenge: create a new physical product for the college student market that would likely retail for less than USD 50. (This price cap limits the complexity of the projects in a one-semester course.) Here is an example of a submitted idea:

Convertible High-Heel Shoe
Many prefer high-heel shoes for dress-up occasions, yet walking in high heels for more than short distances is very challenging.
Might we create a stylish high-heel shoe that easily adapts to a comfortable walking configuration, say by folding down or removing a heel portion of the shoe?

The set of 200 ideas forms the baseline for comparison with the ideas generated using LLMs. The average description is 63 words long, with a standard deviation of 34 words.

We access ChatGPT-4 through OpenAI's GPT-4 API and prompt it with essentially the same prompt we gave the students. No LLM yet acts fully autonomously; rather, LLMs are tools that humans use to complete tasks. Still, for the purposes of this study, we deliberately keep prompt engineering to a minimum, emulating a novice user. We use the system prompt to provide contextual information and subsequent user prompts to request ideas, ten at a time. The user prompt also asks that each description be 40 to 80 words long, comparable to the student ideas.

System Prompt
"You are a creative entrepreneur looking to generate new product ideas. The product will target college students in the United States. It should be a
Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation