“DeepMind’s ‘Frontier Safety Framework’ is weak and unambitious” by Zach Stein-Perlman
EA Forum Podcast (All audio) - A podcast by EA Forum Team
FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP.

DeepMind's FSF has three steps:

1. Create model evals for warning signs of "Critical Capability Levels"
   - Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals
   - They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D"
   - E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
2. Do model evals every 6x effective compute and every 3 months of fine-tuning
   - This is an "aim," not a commitment
   - Nothing about evals during deployment
3. "When a model reaches evaluation thresholds (i.e. passes a set of early warning evaluations), we [...]"

---
First published: May 18th, 2024
Source: https://forum.effectivealtruism.org/posts/LahLysfvsWGWAcNaz/deepmind-s-frontier-safety-framework-is-weak-and-unambitious
---
Narrated by TYPE III AUDIO.