Data Collection and Initial Validation

The construction of the dataset began with searching the companies' annual reports and websites for evidence of corporate venturing activities, selecting sentences that include at least one keyword from the following dictionary: entrepreneur(s), entrepreneurship, entrepreneurial, small business, SME, small and medium-sized enterprises, micro business, new business, startup(s), startupper, accelerator(s), incubator(s), spin-off, corporate venture capital, and CVC. After removing duplicates and combining sentences from the same webpage into one excerpt, we obtained 1,207 excerpts from the annual reports and 36,958 excerpts from the Web.

Next, we drew a random sample of 200 excerpts to identify the corporate venturing practices they represent. We read all the examples and identified 24 kinds of practices, which we aggregated into the eight categories of corporate venturing practices analyzed in the report, listed here in alphabetical order: Accelerators and Incubators, Business Services, Corporate Venture Capital (CVC), Events, Mentorship, Shared Workspace, Venture Building, and Venture Clienting. The same excerpt might describe multiple practices and thus be assigned to more than one category.

To apply the categories to all 38,165 excerpts in our sample, we created a description for each category, accompanied by a couple of examples from the records we had previously coded manually. We used these descriptions to prompt the large language model ChatGPT (4o) to review the sample and code each excerpt with the applicable categories. Specifically, ChatGPT assigned to each excerpt the probability of belonging to each of the eight categories of corporate venturing practices.
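The keyword screen described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual pipeline: the dictionary terms come from the text, but the function name, the case-insensitive whole-word matching, and the deduplication logic are assumptions.

```python
import re

# Dictionary terms taken from the methodology text; matching rules are assumed.
KEYWORDS = [
    "entrepreneur", "entrepreneurs", "entrepreneurship", "entrepreneurial",
    "small business", "micro business", "new business",
    "startup", "startups", "startupper",
    "SME", "small and medium-sized enterprises",
    "accelerator", "accelerators", "incubator", "incubators", "spin-off",
    "corporate venture capital", "CVC",
]

# Whole-word, case-insensitive match for any dictionary term.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def select_excerpts(sentences):
    """Keep sentences containing at least one keyword, dropping exact duplicates."""
    seen = set()
    selected = []
    for s in sentences:
        if PATTERN.search(s) and s not in seen:
            seen.add(s)
            selected.append(s)
    return selected
```

In practice, one would run this over every sentence scraped from the annual reports and websites and then merge the surviving sentences from the same webpage into a single excerpt, as the text describes.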
For the statistical analysis, we aggregated the excerpts by company and took the maximum probability across a company's excerpts in each category to indicate the company's likelihood of adopting the corresponding venturing practice. We treated an aggregate probability higher than 0.5 as evidence of the company's adoption of the corresponding practice.

To assess the accuracy of ChatGPT's coding, we drew a random sample of 30 companies for each corporate venturing practice and verified the evidence in our database by accessing the corresponding web pages and other online sources. The results of this exercise are summarized in the table below:

Practice                      ChatGPT Prediction Accuracy (%)
Corporate Venture Capital     86.7
Accelerators & Incubators     96.7
Venture Building              93.3
Venture Clienting             93.3
Mentorship                    90.0
Events                        93.3
Business Services             86.8
Shared Workspace              83.3

The data was categorized by industry, region, and country and can be further analyzed through the report's website*. The following page lists summary statistics for the study categories by continent and industry. In most of the report, categories with sample sizes of fewer than five entries were excluded to mitigate the risk of misrepresenting the reality of corporate venturing within specific industries, regions, or countries.

The methodology and the final report are not without limitations. Because we rely solely on public disclosures, we do not attempt an objective evaluation of the effectiveness of individual practices or their combinations, nor do we verify whether companies fully implemented these practices as communicated.
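The company-level aggregation rule described in this section (take the maximum excerpt probability per category, then apply the 0.5 adoption threshold) can be sketched as below. Company names, the triple-based input format, and the function name are hypothetical; only the max-then-threshold rule comes from the text.

```python
from collections import defaultdict

THRESHOLD = 0.5  # aggregate probability above which a practice counts as adopted

def company_adoption(excerpt_scores):
    """excerpt_scores: iterable of (company, category, probability) triples,
    one per excerpt-category pair as coded by the language model.
    Returns {company: {category: adopted_bool}} using the max-probability rule."""
    # Step 1: aggregate by company and category, keeping the maximum probability.
    max_prob = defaultdict(float)
    for company, category, p in excerpt_scores:
        key = (company, category)
        max_prob[key] = max(max_prob[key], p)
    # Step 2: flag adoption where the aggregate probability exceeds the threshold.
    result = defaultdict(dict)
    for (company, category), p in max_prob.items():
        result[company][category] = p > THRESHOLD
    return dict(result)
```

For example, a company with excerpt probabilities 0.3 and 0.7 for CVC would be coded as adopting CVC, since the maximum (0.7) exceeds 0.5.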
When Goliath Needs David: Redefining Corporate Venturing