Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It
AI Safety Fundamentals: Alignment - A podcast by BlueDot Impact
Our goal here is to popularize obscure and hard-to-understand areas of AI alignment. So let’s try to understand the incomprehensible meme! Our main source will be Hubinger et al. 2019, Risks From Learned Optimization In Advanced Machine Learning Systems. Mesa- is a Greek prefix which means the opposite of meta-. To “go meta” is to go one level up; to “go mesa” is to go one level down (nobody has ever actually used this expression, sorry). So a mesa-optimizer is an optimizer one level down f...