Anthropic has a new security system it says can stop almost all AI jailbreaks
Anthropic unveils new proof-of-concept security measure tested on Claude 3.5 Sonnet
“Constitutional classifiers” are an attempt to teach LLMs value systems
Tests resulted in more than an 80% reduction in successful jailbreaks
In…