AI Control: Improving Safety Despite Intentional Subversion

AI Safety Fundamentals: Alignment - A podcast by BlueDot Impact

We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post, we summarize the paper and compare our methodology to the methodology of other safety papers.

Source: https://www.alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion

Narrated for AI Safety Fundamentals by Perrin Walker.

A podc...