ML Systems Will Have Weird Failure Modes

AI Safety Fundamentals: Alignment - A podcast by BlueDot Impact

Try Bookbeat 60! days for free, click here

Enjoy a whole world of audiobooks and e-books, everything from new releases to the classics

Categories:

Previously, I've argued that future ML systems might exhibit unfamiliar, emergent capabilities, and that thought experiments provide one approach towards predicting these capabilities and their consequences. In this post I’ll describe a particular thought experiment in detail. We’ll see that taking thought experiments seriously often surfaces future risks that seem "weird" and alien from the point of view of current systems. I’ll also describe how I tend to engage with these thought experimen...