Interpretability: Understanding how AI models think


https://www.youtube.com/watch?v=fgknuvivn
Author: Anthropic – Duration: 00:59:03
What happens inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models simply "glorified autocomplete," or is something more complicated going on? How do we study these questions scientifically? Join Anthropic's Josh Batson, Emmanuel Ameisen, and Jack Lindsey as they discuss the latest research on AI interpretability. Learn more about Anthropic's interpretability research: https://www.anthropic.com/news/tracing-thoughts-language-model
Sections:
Introduction [00:00]
The biology of AI models [01:37]
Scientific methods to open the black box [06:43]
Some surprising features in Claude's mind [10:35]
Can we trust what a model claims it thinks? [20:39]
Why do models hallucinate? [25:17]
AI models planning ahead [34:15]
Why interpretability matters [38:30]
The future of interpretability [53:35]
