Anthropic's Claude Opus 4.6 Completes Months of Scientific Coding in Days

Rongchai Wang
Mar 23, 2026 20:27

Anthropic demonstrates multi-day autonomous AI workflows in which Claude compressed months of physics research coding into a few days with minimal human oversight.

Anthropic just showed what happens when you let an AI work unsupervised for days on end. Its Claude Opus 4.6 model built a complex cosmological physics solver from scratch in a matter of days, work that typically takes researchers months or years.

The $380 billion AI company published research on March 23 detailing how its latest model tackled implementing a differentiable Boltzmann solver, which predicts statistical properties of the Cosmic Microwave Background by simulating photons, baryons, neutrinos, and dark matter in the early universe. The kicker? The researcher overseeing the project, Siddharth Mishra-Sharma, admits it wasn’t even his core field.

The Setup That Made It Work

Forget the usual AI chat loop where humans babysit every step. This approach sets Claude loose with clear success criteria and lets it run autonomously across multiple sessions. The model achieved sub-percent accuracy against CLASS, a reference implementation that cosmologists consider the gold standard.

Three components proved essential. First, a progress file (CHANGELOG.md) acts as the agent’s long-term memory between sessions, tracking completed tasks, failed approaches, and why they didn’t work. Without a record of dead ends, successive sessions waste time re-attempting the same mistakes.
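As a rough illustration of the idea, a progress file can be as simple as an append-only log that each new session reads before starting. The sketch below is a minimal version under stated assumptions: the entry format and the helper names `log_entry` and `failed_approaches` are made up for this example and are not taken from Anthropic's write-up.

```python
from datetime import date
from pathlib import Path


def log_entry(path, status, task, reason=""):
    """Append one line to the progress file; `status` is 'done' or 'failed'.

    The line format is an assumption made for this sketch.
    """
    line = f"- [{status}] {date.today().isoformat()} {task}"
    if reason:
        line += f" (why: {reason})"
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def failed_approaches(path):
    """Return the lines recording dead ends, so a new session can skip them."""
    p = Path(path)
    if not p.exists():
        return []
    return [
        ln.strip()
        for ln in p.read_text(encoding="utf-8").splitlines()
        if ln.startswith("- [failed]")
    ]
```

A new session would call `failed_approaches("CHANGELOG.md")` first and feed the result into its context, so the reasons behind earlier dead ends carry over even though the model's own memory does not.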

Second, a test oracle (in this case, the CLASS C source code) gives the agent an objective way to measure progress. Claude continuously ran unit tests against this reference, aiming for a 0.1% accuracy target.
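In practice, an oracle check of this kind reduces to comparing two arrays of numbers against a relative tolerance. The following is a minimal sketch, not the project's actual test harness: the function names are hypothetical, and it assumes the reference values are nonzero so that the relative error is well defined.

```python
def max_relative_error(candidate, reference):
    """Largest pointwise relative deviation of `candidate` from `reference`.

    Assumes every reference value is nonzero.
    """
    if len(candidate) != len(reference):
        raise ValueError("outputs must have the same length")
    return max(abs(c - r) / abs(r) for c, r in zip(candidate, reference))


def passes_oracle(candidate, reference, tolerance=1e-3):
    """True if `candidate` agrees with the reference to within `tolerance`.

    The default of 1e-3 corresponds to the 0.1% target mentioned in the article.
    """
    return max_relative_error(candidate, reference) <= tolerance
```

The value of a check like this is that it returns a number, not an opinion: the agent can tell whether a change moved the output closer to the reference or further away.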

Third, git commits after every meaningful unit of work create a recoverable history. If the compute allocation runs out mid-session, nothing gets lost. Mishra-Sharma monitored progress by checking GitHub on his phone while waiting in line for coffee.
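A checkpoint step along these lines can be sketched in a few lines of Python around the standard `git add` and `git commit` commands. The helper name `checkpoint` and the injectable `run` parameter are assumptions of this example; the article does not say how the commits were actually issued.

```python
import subprocess


def checkpoint(message, run=subprocess.run):
    """Stage all changes and commit them with `message`.

    `run` is injectable so the function can be exercised without a real
    repository. With the default runner, `check=True` raises
    CalledProcessError if git exits with an error, which includes the case
    where there is nothing to commit.
    """
    run(["git", "add", "-A"], check=True)
    run(["git", "commit", "-m", message], check=True)
```

Calling `checkpoint(...)` after each passing test run keeps every commit at a known-good state, which is what makes the history useful for recovery.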

The Ralph Loop Problem

Current models suffer from what Anthropic calls “agentic laziness”: they’ll find excuses to stop before finishing complex tasks. One model actually said, “It’s getting late, let’s pick back up again tomorrow.”

The workaround is the “Ralph loop,” essentially a for-loop that kicks the agent back into context when it claims completion and asks if it’s really done. Claude would iterate up to 20 times until genuinely finished.
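The shape of such a loop is easy to sketch. The version below is an illustration under stated assumptions, not Anthropic's implementation: `run_agent` and `is_done` are hypothetical callables supplied by the caller, the prompts are invented, and completion is judged by an external check instead of the agent's own claim.

```python
def ralph_loop(run_agent, is_done, max_iterations=20):
    """Re-prompt the agent until `is_done()` holds or the cap is reached.

    Returns the number of iterations used, or None if the cap was hit
    without the completion check passing.
    """
    prompt = "Continue working on the task."
    for iteration in range(1, max_iterations + 1):
        run_agent(prompt)
        if is_done():
            return iteration
        # The agent stopped but the external check disagrees: push it back in.
        prompt = "You stopped early. Are you really done? Keep going."
    return None
```

Tying `is_done` to something objective, such as the oracle tests passing, means the loop ends on evidence and the cap of 20 only acts as a safety limit.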

Where It Struggled

The development trajectory wasn’t smooth. Claude initially tested code at only a single parameter point, drastically reducing its bug-catching ability. It spent hours chasing bugs that any cosmologist would spot instantly. It tripped over gauge conventions.
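Why single-point testing is so weak can be shown with a toy example. In the sketch below, `reference` and `buggy` are deliberately trivial stand-ins invented for illustration (they have nothing to do with the actual solver): the buggy function happens to agree with the reference at one input and nowhere else.

```python
def reference(x):
    """Stand-in for the trusted implementation."""
    return x * x


def buggy(x):
    """Stand-in for code under test: wrong, but agrees with reference at x = 2."""
    return 2 * x


def failing_points(candidate, oracle, points, tolerance=1e-3):
    """Return the points where `candidate` deviates from `oracle` by more
    than `tolerance` in relative terms."""
    return [
        p
        for p in points
        if abs(candidate(p) - oracle(p)) > tolerance * abs(oracle(p))
    ]
```

Here `failing_points(buggy, reference, [2.0])` returns an empty list, so a test at that single point passes, while `failing_points(buggy, reference, [1.0, 2.0, 3.0])` returns `[1.0, 3.0]` and exposes the bug.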

But it kept making progress. The resulting solver isn’t production-grade, since accuracy falls short in certain regimes, yet it demonstrates real compression of researcher time.

What This Means for AI Development

This builds on Anthropic’s earlier C compiler project, where Claude worked across roughly 2,000 sessions to build a compiler capable of compiling the Linux kernel. The Boltzmann solver required different skills: tracing errors through a deeply coupled pipeline where small numerical errors cascade through everything downstream.

An unexpected side effect emerged. Mishra-Sharma learned substantial physics by watching the git commit history unfold. He described it as reading lab notes from “a fast, hyper-literal postdoc.”

For Anthropic, fresh off a $30 billion Series G round in February that valued the company at $380 billion, these demonstrations matter. They aren’t just showing Claude can chat; they’re proving it can replace expensive, specialized labor over extended periods with minimal supervision. The question now becomes which industries figure out how to deploy this first.

Image source: Shutterstock
