“A framework for thinking about AI power-seeking” by Joe_Carlsmith

EA Forum Podcast (All audio) - A podcast by EA Forum Team

This post lays out a framework I’m currently using for thinking about when AI systems will seek power in problematic ways. I think this framework adds useful structure to the too-often-left-amorphous “instrumental convergence thesis,” and that it helps us recast the classic argument for existential risk from misaligned AI in a revealing way. In particular, I suggest, this recasting highlights how much classic analyses of AI risk load on the assumption that the AIs in question are powerful enough to take over the world very easily, via a wide variety of paths. If we relax this assumption, I suggest, the strategic trade-offs that an AI faces, in choosing whether or not to engage in some form of problematic power-seeking, become substantially more complex.

Prerequisites for rational takeover-seeking

For simplicity, I’ll focus here on the most extreme type of problematic AI power-seeking – namely, an AI or set of [...]

---

Outline:

(00:50) Prerequisites for rational takeover-seeking
(02:48) Agential prerequisites
(06:40) Goal-content prerequisites
(09:10) Takeover-favoring incentives
(13:29) Recasting the classic argument for AI risk using this framework
(26:04) What if the AI can’t take over so easily, or via so many different paths?

The original text contained 16 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI.

---

First published: July 24th, 2024

Source: https://forum.effectivealtruism.org/posts/dZ2WvJierisi8jzFi/a-framework-for-thinking-about-ai-power-seeking

---

Narrated by TYPE III AUDIO.