How difficult is AI alignment? | Anthropic Research Salon


Author: Anthropic – Duration: 00:28:06
At an event hosted by the Anthropic Research Salon in San Francisco, four of our researchers (Alex Tamkin, Jan Leike, Amanda Askell, and Josh Batson) discussed the science of alignment, interpretability, and the future of AI research.

Further reading:
Anthropic Research: https://anthropic.com/research
The character of Claude: https://www.anthropic.com/news/claude-character
Evaluating feature steering: https://www.anthropic.com/research/evaluating-feature-steering
Chapters:
0:00 Introduction
0:30 An overview of alignment
4:48 The challenges of scaling
8:08 The role of interpretability
12:02 How models can help
14:31 Signs of whether alignment is easy or hard
18:28 Q&A – Multi-agent deliberation
20:38 Q&A – Is model alignment an epiphenomenon?
23:43 Q&A – What solving alignment could look like
