AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact

83 Episodes

Is Power-Seeking AI an Existential Risk?
Published: 13/05/2023
Where I Agree and Disagree with Eliezer
Published: 13/05/2023
Supervising Strong Learners by Amplifying Weak Experts
Published: 13/05/2023
Measuring Progress on Scalable Oversight for Large Language Models
Published: 13/05/2023
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Published: 13/05/2023
Summarizing Books With Human Feedback
Published: 13/05/2023
Takeaways From Our Robust Injury Classifier Project [Redwood Research]
Published: 13/05/2023
Red Teaming Language Models With Language Models
Published: 13/05/2023
High-Stakes Alignment via Adversarial Training [Redwood Research Report]
Published: 13/05/2023
AI Safety via Debate
Published: 13/05/2023
Robust Feature-Level Adversaries Are Interpretability Tools
Published: 13/05/2023
Introduction to Logical Decision Theory for Computer Scientists
Published: 13/05/2023
Debate Update: Obfuscated Arguments Problem
Published: 13/05/2023
Discovering Latent Knowledge in Language Models Without Supervision
Published: 13/05/2023
Feature Visualization
Published: 13/05/2023
Toy Models of Superposition
Published: 13/05/2023
Understanding Intermediate Layers Using Linear Classifier Probes
Published: 13/05/2023
Acquisition of Chess Knowledge in AlphaZero
Published: 13/05/2023
Careers in Alignment
Published: 13/05/2023
Embedded Agents
Published: 13/05/2023

Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment