EA - Understanding the diffusion of large language models: summary by Ben Cottier

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Understanding the diffusion of large language models: summary, published by Ben Cottier on December 21, 2022 on The Effective Altruism Forum.

5-minute summary

Time from GPT-3's publication (May 2020) to:
- A better model being produced: 7 months (Dec 2020), Gopher, DeepMind
- A better or equally good model being open-sourced AND a successful, explicit attempt at replicating GPT-3 being completed: 23 months for the equally good model (May 2022), OPT-175B, Meta AI Research

Table 1: Key facts about the diffusion of GPT-3-like models

I present my findings from case studies on the diffusion of nine language models that are similar to OpenAI's GPT-3 model, including GPT-3 itself. By "diffusion", I mean the spread of artifacts among different actors, where artifacts include trained models, code, datasets, and algorithmic insights. Diffusion can occur through different mechanisms, including open publication and replication. Seven of the models in my case studies are "GPT-3-like" according to my definition, which basically means they are similar to GPT-3 in design and purpose, and have similar or better capabilities. Two models have clearly worse capabilities but were of interest for other reasons. (more)

I think the most important effects of diffusion are effects on (1) AI timelines: the leading AI developer can get to transformative AI (TAI) sooner by using knowledge shared by other developers; (2) who leads AI development; (3) by what margin they lead; and (4) how many actors will plausibly be contenders to develop TAI. The latter three effects in turn affect AI timelines and the competitiveness of AI development. Understanding cases of diffusion today may improve our ability to predict and manage the effects of diffusion in the lead-up to TAI being developed.
(more)

See Table 1 for key facts about the timing of GPT-3-like model diffusion. Additionally, I'm 90% confident that no model exists which is (a) uncontroversially better than GPT-3 and (b) has its model weights immediately available for download by anyone on the internet (as at November 15, 2022). However, GLM-130B (Tsinghua University, 2022), publicized in August 2022 and developed by Tsinghua University and the Chinese AI startup Zhipu.AI, comes very close to meeting these criteria: it is probably better than GPT-3, but still requires approval to download the weights. (more)

I'm 85% confident that in the two years since the publication of GPT-3 (in May 2020), publicly known GPT-3-like models have only been developed by (a) companies whose focus areas include machine learning R&D and that have more than $10M in financial capital, or (b) a collaboration between one of these companies and academia, a state entity, or both. That is, I'm 85% confident that no publicly known GPT-3-like model has been developed solely by actors in academia, very small companies, independent groups, or state AI labs. (more)

In contrast, I think that hundreds to thousands of people have enough resources and talent to run a GPT-3-like model through their own independent setup (rather than just an API provided by another actor). This is due to wider access to the model weights of GPT-3-like models such as OPT-175B and BLOOM since May 2022. (more)

I estimate that the cost of doing the "largest viable deployment" of a GPT-3-like model would be 20% of the cost of developing the model (90% CI: 10 to 68%), in terms of the dollar cost of compute alone. This means that deployment is most likely much less prohibitive than development. For people aiming to limit or shape diffusion, this analysis lends support to targeting interventions at the development stage rather than the deployment stage.
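The deployment-cost claim above is a simple ratio applied to development compute cost. As a hedged sketch of that arithmetic (the function name and the $10M development cost are hypothetical placeholders, not figures from the report; only the 20% central estimate and the 10 to 68% interval come from the text):

```python
# Sketch of the deployment-vs-development compute cost comparison above.
# The 20% central estimate and the 10-68% CI are from the text; the
# development cost figure below is a hypothetical placeholder.

def deployment_cost(dev_cost_usd: float, ratio: float = 0.20) -> float:
    """Dollar compute cost of the "largest viable deployment",
    modeled as a fraction of development (training) compute cost."""
    return dev_cost_usd * ratio

dev_cost = 10_000_000  # hypothetical development compute cost, USD
central = deployment_cost(dev_cost)
low, high = deployment_cost(dev_cost, 0.10), deployment_cost(dev_cost, 0.68)
print(f"central: ${central:,.0f}; 90% CI: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the central deployment estimate is a fifth of the development cost, which is why the text concludes that deployment is much less prohibitive than development.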
(more)

Access to compute appears to have been the main factor hindering the development of GPT-3-like models. The next biggest hindering factor appears t...
