AI Safety via Debate

AI Safety Fundamentals: Alignment - A podcast by BlueDot Impact

Try Bookbeat 60! days for free, click here

Enjoy a whole world of audiobooks and e-books, everything from new releases to the classics

Categories:

Abstract:To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self play on a zero sum debate game. Given a question or proposed action, two agents take ...