AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact

83 Episodes

Constitutional AI: Harmlessness from AI Feedback
Published: 19/07/2024
Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Published: 19/07/2024
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Published: 19/07/2024
Chinchilla’s Wild Implications
Published: 17/06/2024
Deep Double Descent
Published: 17/06/2024
Intro to Brain-Like-AGI Safety
Published: 17/06/2024
Eliciting Latent Knowledge
Published: 17/06/2024
Toy Models of Superposition
Published: 17/06/2024
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Published: 17/06/2024
Discovering Latent Knowledge in Language Models Without Supervision
Published: 17/06/2024
ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation
Published: 17/06/2024
Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions
Published: 17/06/2024
Imitative Generalisation (AKA ‘Learning the Prior’)
Published: 17/06/2024
An Investigation of Model-Free Planning
Published: 17/06/2024
Low-Stakes Alignment
Published: 17/06/2024
Gradient Hacking: Definitions and Examples
Published: 17/06/2024
Empirical Findings Generalize Surprisingly Far
Published: 17/06/2024
Compute Trends Across Three Eras of Machine Learning
Published: 13/06/2024
Worst-Case Thinking in AI Alignment
Published: 29/05/2024
Public by Default: How We Manage Information Visibility at Get on Board
Published: 12/05/2024

Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment