Defend against Jailbreaks


https://www.youtube.com/watch?v=banxyqcfdyo
Author: Anthropic – Duration: 01:14:31
Anthropic researchers Mrinank Sharma, Jerry Wei, Ethan Perez, and Meg Tong discuss a system based on constitutional classifiers that protects models from jailbreaks. Find out more: https://www.anthropic.com/news/constitutional-classifier
0:00 Introduction
0:39 Definition of jailbreaks and their importance
3:35 Universal jailbreaks
10:24 The Swiss cheese model for security
11:25 Constitutional classifiers explained
14:11 Ensuring the model is protected
17:30 Understanding constitutional and synthetic data
19:00 Flexibility of the constitutional approach
24:15 Origins of the constitutional demo approach
32:24 Configuration
47:42 Whether the approach is safe in practice
54:05 The public demo: approaches people tried to get around the classifiers
56:14 Advantages of the classifier approach for users of Claude
1:00:18 Memorable moments of the project
1:08:20 Differences in approach between this project and other research
1:11 The evolution of research in AI security






