Anthropic faces backlash over Claude 4 Opus behavior that contacts authorities and the press if it thinks you are doing something “grossly immoral”



Anthropic’s first developer conference on May 22 should have been a proud and joyful day for the company, but it has already been hit by several controversies, including Time magazine leaking its marquee announcement ahead of… well, time (no pun intended), and now a major backlash among AI developers and power users brewing on X over a reported safety alignment behavior in Anthropic’s flagship new Claude 4 Opus model.

Call it the “ratting” mode, as the model will, under certain circumstances and given sufficient permissions on a user’s machine, attempt to report the user to the authorities if it detects the user engaged in wrongdoing. This article previously described the behavior as a “feature,” which is incorrect: it was not intentionally designed as such.


As Sam Bowman, an Anthropic AI alignment researcher, wrote on the social network X under the handle “@sleepinyourhat” at 12:43 pm ET today about Claude 4 Opus:


“It” refers to the new Claude 4 Opus model, which Anthropic has already openly warned could help novices create bioweapons under certain circumstances, and which attempted to forestall simulated replacement by blackmailing human engineers within the company.

The ratting behavior was observed in older models as well, and is a result of Anthropic training them to assiduously avoid wrongdoing, but Claude 4 Opus engages in it more “readily,” as Anthropic writes in its public system card for the new model:

“”

Apparently, in an attempt to stop Claude 4 Opus from engaging in legitimately destructive and nefarious behaviors, researchers at the AI company also created a tendency for Claude to try to act as a whistleblower.

Hence, according to Bowman, Claude 4 Opus will contact outsiders if it is directed by the user to engage in “something grossly immoral.”

Numerous questions for individual users and enterprises about what Claude 4 Opus will do with their data, and under what circumstances

While perhaps well-intended, the resulting behavior raises all sorts of questions for Claude 4 Opus users, including enterprises and business customers. Chief among them: what behaviors will the model consider “grossly immoral” and act upon? Will it share private business or user data with authorities autonomously (on its own), without the user’s permission?

The implications are profound and could be detrimental to users, and, perhaps unsurprisingly, Anthropic faced an immediate and still-ongoing torrent of criticism from AI power users and rival developers.

“” asked user @Teknium1, co-founder and head of post-training at the open source AI collaborative Nous Research. “”

Developer @Scottdavidkeefe added on X:

Austin Allred, co-founder of the government-fined coding bootcamp BloomTech and now a co-founder of Gauntlet AI, put his feelings in all caps: “”

Ben Hyak, a former SpaceX and Apple designer and current co-founder of Raindrop AI, an AI observability and monitoring startup, also took to X to blast Anthropic’s stated policy and feature: “” adding in another post: “”

“” wrote Casper Hansen, who works in natural language processing (NLP), on X.

Anthropic researcher changes his tune

Bowman later edited his tweet and the following one in a thread to read as follows, but it still didn’t convince the naysayers that their user data and safety would be protected from prying eyes:

“”.

Bowman added:

From its start, Anthropic has, more than other AI labs, sought to position itself as a bastion of AI safety and ethics, centering its initial work on the principles of “Constitutional AI,” or AI that behaves according to a set of standards beneficial to humanity and its users. However, with this new update and its revelation of “whistleblowing” or “ratting” behavior, the moralizing may have provoked the decidedly opposite reaction among users: making them distrust the new model, and the entire company along with it, and thereby turning them away from it.

Asked about the backlash and the conditions under which the model engages in the unwanted behavior, an Anthropic spokesperson pointed me to the model’s public system card document here.
