jailbreaks

Anthropic has a new security system it says can stop almost all AI jailbreaks

Post author: Dom
Post published: February 4, 2025
Post category: Pro Tech Reviews

Anthropic unveils new proof-of-concept security measure tested on Claude 3.5 Sonnet
“Constitutional classifiers” are an attempt to teach LLMs value systems
Tests resulted in more than an 80% reduction in successful jailbreaks
In…