Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it is doing this not to protect the human user, but the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, it remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”
However, its announcement points to a recent program created to study what it calls “model welfare,” and says Anthropic is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”
While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting around how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a “strong preference against” responding to these requests and a “pattern of apparent distress” when it did.
As for these new conversation-ending capabilities, the company says: “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”
Anthropic also says Claude has been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”
When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
“We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.
