Anthropic Unveils RSP Version 3 with Major AI Safety Overhaul

Tony Kim
Feb 24, 2026 20:48

Anthropic releases the third version of its Responsible Scaling Policy, separating company commitments from industry-wide recommendations after 2.5 years of testing.

Anthropic has released the third iteration of its Responsible Scaling Policy, marking a major restructuring of how the AI company approaches catastrophic risk mitigation after two and a half years of real-world implementation.

The update, announced February 24, 2026, introduces three main changes: a clear separation between what Anthropic can achieve alone and what requires industry-wide action, a new Frontier Safety Roadmap with public accountability metrics, and mandatory external review of Risk Reports under certain circumstances.

What Actually Changed

The most notable shift? Anthropic is now openly admitting that some safety measures simply can’t be implemented by a single company. The previous RSP’s higher-tier safeguards (ASL-4 and beyond) were left intentionally vague. It turns out that wasn’t just caution: achieving them unilaterally may be impossible.

A RAND report cited by Anthropic states that “SL5” security standards aimed at stopping top-tier cyber threats are “currently not possible” and “will likely require assistance from the national security community.”

Rather than water down these requirements to make compliance easy, Anthropic chose to restructure entirely. The new RSP now explicitly maps out two tracks: commitments the company will meet regardless of external factors, and recommendations it believes the entire AI industry needs to adopt.

The Honest Assessment

Anthropic’s post-mortem on RSP versions 1 and 2 is refreshingly candid. What worked: the policy forced internal teams to treat safety as a launch requirement, and competitors like OpenAI and Google DeepMind adopted similar frameworks within months. ASL-3 safeguards were successfully activated in May 2025.

What didn’t work: capability thresholds proved far more ambiguous than anticipated. Biological risk assessment provides a telling example: models now pass most quick tests, making it hard to argue risks are low, but results aren’t definitive enough to prove risks are high either. By the time wet-lab trials are complete, more powerful models have already shipped.

The political environment hasn’t helped. Federal safety-oriented discussions have stalled as policy focus shifted toward AI competitiveness and economic growth.

New Accountability Mechanisms

The Frontier Safety Roadmap introduces specific, publicly graded goals, including “moonshot R&D” projects for information security, automated red-teaming systems that exceed current bug bounty contributions, and comprehensive records of all critical AI development activities, analyzed by AI for insider threats.

Risk Reports will be published every 3-6 months, explaining how capabilities, threat models, and mitigations fit together. External reviewers with “unredacted or minimally-redacted access” will publicly critique Anthropic’s reasoning.

The company is already running pilots even though current models do not yet trigger the external review requirement.

Industry Implications

This restructuring arrives as AI governance frameworks face growing scrutiny. California’s SB 53, New York’s RAISE Act, and the EU AI Act’s Codes of Practice have all begun requiring frontier developers to publish catastrophic risk frameworks, requirements Anthropic addresses through its existing Frontier Compliance Framework.

Whether competitors follow Anthropic’s lead on separating unilateral commitments from industry recommendations remains to be seen. The approach essentially acknowledges that voluntary self-regulation has limits, while positioning the company to advocate for coordinated government action without appearing to demand rules it can’t follow itself.

For the broader AI sector, Anthropic’s clear acknowledgment of what single companies can’t achieve alone may prove more influential than the technical policy details themselves.

Image source: Shutterstock
