Claude Fable 5 is making a dramatic return with 'extraordinarily strong' safeguards

8 hours ago

After being taken down by order of the US government, Fable 5 is returning to Claude’s model library. Because of the new restrictions, Fable 5 might just be Opus 4.8 in a mask.

Anthropic says Fable 5 will return to users globally on July 1. The returning model is a locked-down version of Fable 5, released that way out of “an abundance of caution.” After the original was taken down for posing a security risk, the user-facing model has been redesigned to more effectively handle and abort cybersecurity tasks.

The company’s update states that Amazon researchers told the US government they had found a method of bypassing Fable 5’s safeguards. Amazon’s test prompted the model to identify a number of software weaknesses. The test prompt was reported as a high-security task, though Anthropic says it could have been done with any other model.

Anthropic claims its own testing yielded the same results with less capable models across developers, such as Opus 4.8 and GPT-5.5. Further, every model Anthropic tested produced the same results when exploiting that vulnerability, and nothing in those results revealed any unique Mythos-level capabilities.

Nearly a month later, Claude Fable 5 returns in limited form. Its safety measures have been enhanced so that they trigger much more easily than in the previous release.

The reworked Fable 5 comes with a couple of drawbacks. Anthropic states that the model will not be able to handle all tasks, not because it lacks the capability but because of the safeguards imposed on it. During routine tasks, a notification might appear warning that the model has to switch back to Opus 4.8. That may happen with coding and debugging, the company says in the update.

When Fable 5 originally launched, Anthropic presented the same caveat and noted that some users might see the model revert if a prompt was high-risk. That behavior hasn’t fundamentally changed, but the threshold has become much stricter.

This might not affect 99% of tasks, but there is now a higher chance that Fable 5 will self-report and revert to a safer model. The new safeguards are “extraordinarily strong,” according to researchers from CAISI.

Claude Fable 5 and Mythos 5 share much of the same framework, though the latter is far better suited to the cybersecurity tasks that Fable 5 seems to have been designed to avoid from the beginning. In any case, both are designed for very complex tasks, not mundane chatbot-level tasks. For that reason, Anthropic won’t let users freely access the model with their allotted usage limit.

Claude Fable 5 is set to become available again on July 1. Just like the first time it was released, it will consume far more tokens and eat through tracked usage much faster. Anthropic says Pro, Max, Team, and select Enterprise plans will be able to use the model with 50% of their usage limit until July 7. After that, it will only be available via usage credits for the time being.

...