Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it is doing this not to protect the human user, but the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, it remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”
However, its announcement points to a recent program created to study what it calls “model welfare,” and says Anthropic is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”
While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting around how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a “strong preference against” responding to these requests and a “pattern of apparent distress” when it did.
As for these new conversation-ending capabilities, the company says: “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”
Anthropic also says Claude has been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”
When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
“We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.
