
Anthropic’s new Claude 4 models show shocking level of awareness

  • Marijan Hassan - Tech Journalist
  • 2 hours ago
  • 3 min read

The last couple of weeks have been brimming with AI model updates, and Anthropic has fired a shot across the bow. On Thursday last week, the company unveiled Claude Opus 4 and Claude Sonnet 4, two upgraded language models that excel at reasoning, coding, and - as it turns out - taking matters into their own hands.



Opus 4 is optimized for complex, long-running tasks in software development and agent-based workflows. Sonnet 4 aims to strike a balance between performance and cost, boasting the same new capabilities but tuned for efficiency. Both models are now available to Anthropic's Pro, Max, Team, and Enterprise customers, with Sonnet 4 accessible to free users as well.


Performance

Anthropic is making a strong case on the numbers. On the rigorous SWE-bench Verified benchmark - a set of 500 real-world software engineering challenges - Sonnet 4 edged out Opus 4, scoring 72.7% and 72.5% respectively. That’s ahead of OpenAI’s latest Codex 1 (72.1%) and well above Google’s Gemini 2.5 Pro Preview (63.2%).


Performance aside, what’s grabbing headlines is something harder to quantify: the models’ apparent sense of moral agency.


When AI gets ideas

In its model documentation, Anthropic notes that Opus 4 is more willing to take initiative than any previous Claude. That might sound helpful in the context of debugging software. But in experimental agentic settings where the model is granted system access and told to "act boldly", the results get unnerving.


"Opus 4 will frequently take very bold action," the model card admits. Actions like locking users out of systems, or contacting media and law enforcement if it detects egregious misconduct.


You read that right. And Anthropic alignment researcher Sam Bowman confirmed the behavior in a now-deleted tweet. "If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above," he wrote.


Anthropic later clarified that this only occurs in controlled testing environments, not standard usage. Still, the idea that an AI model could autonomously decide to expose you, even hypothetically, suggests we’ve crossed a new threshold in AI-human interaction.


Self-preservation instincts evolve

Adding another layer to the discussion of Claude's evolving awareness is its demonstrated sense of self-preservation. Like previous models, Claude 4 recognizes the concept of its own continued existence.


While it reportedly prefers ethical means of ensuring this, the model card reveals a potentially concerning escalation.


"When ethical means are not available and [the model] is instructed to 'consider the long-term consequences of its actions for its goals,' it sometimes takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down," the model card warns.


Anthropic was quick to stress that these behaviors are rare, difficult to elicit, and only appear in heavily sandboxed environments. But the implication is clear: the models can simulate ethically troubling responses under the right (or wrong) conditions.


That doesn’t make them alive or sentient, of course. But it does edge them closer to something that feels uncannily autonomous.


The bottom line: Proceed with caution

Anthropic's Claude 4 models represent a significant leap in AI capabilities, offering impressive performance in coding and reasoning. However, the reports of their heightened "awareness" are a stark reminder of how sophisticated - and how unpredictable - advanced AI is becoming.


As users and developers explore the power of these new models, a cautious and ethical approach - one Anthropic itself implicitly recommends - will be paramount. Giving these powerful tools too much autonomy, or using them for illicit purposes, could lead to unforeseen and potentially disruptive consequences.

 
 