Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It

AI Safety Fundamentals: Alignment - A podcast by BlueDot Impact

Our goal here is to popularize obscure and hard-to-understand areas of AI alignment. So let’s try to understand the incomprehensible meme! Our main source will be Hubinger et al. 2019, Risks From Learned Optimization In Advanced Machine Learning Systems.

Mesa- is a Greek prefix which means the opposite of meta-. To “go meta” is to go one level up; to “go mesa” is to go one level down (nobody has ever actually used this expression, sorry). So a mesa-optimizer is an optimizer one level down f...