Rongchai Wang
Mar 23, 2026 20:27
Anthropic demonstrates multi-day autonomous AI workflows in which Claude compressed months of physics research coding into a few days with minimal human oversight.
Anthropic just showed what happens when you let an AI work unsupervised for days on end. Its Claude Opus 4.6 model built a complex cosmological physics solver from scratch, work that typically takes researchers months or years, in a matter of days.
The $380 billion AI company published research on March 23 detailing how its latest model implemented a differentiable Boltzmann solver, which predicts statistical properties of the Cosmic Microwave Background by simulating photons, baryons, neutrinos, and dark matter in the early universe. The kicker? The researcher overseeing the project, Siddharth Mishra-Sharma, admits it wasn’t even his core field.
The Setup That Made It Work
Forget the usual AI chat loop where humans babysit every step. This approach sets Claude loose with clear success criteria and lets it run autonomously across multiple sessions. The model achieved sub-percent accuracy against CLASS, a reference implementation that cosmologists consider the gold standard.
Three components proved critical. First, a progress file (CHANGELOG.md) acts as the agent’s long-term memory between sessions, tracking completed tasks, failed approaches, and why they failed. Without a record of dead ends, successive sessions waste time repeating the same mistakes.
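As a rough illustration of the progress-file idea, the sketch below appends one session's notes to a Markdown file. The article only says that the file tracks completed tasks, failed approaches, and the reasons they failed; the function name `log_session`, the section headings, and the layout are assumptions made for this example, not the format Anthropic used.

```python
from datetime import date
from pathlib import Path


def log_session(path, completed, failed):
    """Append one session's notes to a progress file (hypothetical layout).

    `completed` is a list of task descriptions; `failed` maps each
    abandoned approach to the reason it did not work.
    """
    lines = [f"## Session {date.today().isoformat()}", "", "### Completed"]
    lines += [f"- {task}" for task in completed] or ["- (none)"]
    lines += ["", "### Dead ends (do not retry)"]
    lines += [f"- {approach}: {reason}" for approach, reason in failed.items()] or ["- (none)"]
    # Append rather than overwrite so earlier sessions stay on record.
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")
```

Appending instead of rewriting means a later session can read every earlier dead end before choosing what to try next.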
Second, a test oracle, in this case the CLASS C source code, gives the agent an objective way to measure progress. Claude continuously ran unit tests against this reference, aiming for a 0.1% accuracy target.
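A test against an oracle can be as simple as a relative-error check. The sketch below is a minimal version under stated assumptions: both the candidate solver and the reference produce a sequence of numbers at the same sample points, the reference values are nonzero, and 0.1% is applied as a pointwise bound. The function names are hypothetical, and the article does not say how the comparison against CLASS was actually computed.

```python
def max_relative_error(candidate, reference):
    """Largest pointwise relative deviation of candidate from reference.

    Assumes both sequences are non-empty and every reference value is nonzero.
    """
    if len(candidate) != len(reference):
        raise ValueError("outputs must have the same length")
    return max(abs(c - r) / abs(r) for c, r in zip(candidate, reference))


def passes_oracle(candidate, reference, tolerance=1e-3):
    """True when every point agrees with the reference to within `tolerance`.

    The default of 1e-3 corresponds to the 0.1% target.
    """
    return max_relative_error(candidate, reference) <= tolerance
```

Because the check returns a plain pass or fail, an agent can run it after every change without a human judging the output.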
Third, git commits after every meaningful unit of work create a recoverable history. If the compute allocation runs out mid-session, nothing gets lost. Mishra-Sharma monitored progress by checking GitHub on his phone while waiting in line for coffee.
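The commit-after-each-unit habit could be scripted roughly as follows. This is a sketch, not the tooling used in the project: the helper name `checkpoint` and its `run` parameter are assumptions, and pushing to a remote such as GitHub is left out. It relies only on standard git commands (`git add --all`, `git diff --cached --quiet`, `git commit -m`).

```python
import subprocess


def checkpoint(message, repo=".", run=subprocess.run):
    """Stage all changes in `repo` and commit them.

    Returns False when there was nothing to commit, True otherwise.
    `run` is injectable so the function can be exercised without git.
    """
    run(["git", "-C", repo, "add", "--all"], check=True)
    # `git diff --cached --quiet` exits with status 1 when staged changes exist.
    staged = run(["git", "-C", repo, "diff", "--cached", "--quiet"])
    if staged.returncode == 0:
        return False
    run(["git", "-C", repo, "commit", "-m", message], check=True)
    return True
```

Calling `checkpoint("describe the finished unit of work")` after each passing test run keeps the history fine-grained enough to resume from if a session is cut off.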
The Ralph Loop Problem
Current models suffer from what Anthropic calls “agentic laziness”: they find excuses to stop before finishing complex tasks. One model literally said, “It’s getting late, let’s pick back up again tomorrow.”
The workaround is the “Ralph loop,” essentially a for-loop that kicks the agent back into context when it claims completion and asks whether it is really done. Claude would iterate up to 20 times until genuinely finished.
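In code, the loop described here might look something like the sketch below. The callables `run_agent` and `is_done` and the prompt wording are placeholders invented for this example. Using an independent `is_done` check to decide when the work is genuinely finished is also an assumption; the article does not say how completion was verified.

```python
def ralph_loop(run_agent, is_done, max_iterations=20):
    """Re-prompt the agent until an independent check passes or the cap is hit.

    `run_agent(prompt)` runs one agent session (hypothetical interface).
    `is_done()` reports whether the task is actually complete.
    Returns the number of iterations used, or None if the cap was reached.
    """
    prompt = "Continue working on the task."
    for iteration in range(1, max_iterations + 1):
        run_agent(prompt)
        if is_done():
            return iteration
        # The agent stopped early: push it back into the task.
        prompt = "You stopped, but the task is not finished. Are you really done? Keep going."
    return None
```

The cap of 20 mirrors the iteration limit mentioned above and stops the loop from running forever if the check never passes.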
Where It Struggled
The development trajectory wasn’t smooth. Claude initially tested code at only a single parameter point, drastically reducing its ability to catch bugs. It spent hours chasing bugs that any cosmologist would spot instantly. It tripped over gauge conventions.
But it kept making progress. The resulting solver isn’t production-grade, since accuracy falls short in certain regimes, yet it demonstrates a genuine compression of researcher time.
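The single-parameter-point pitfall has a straightforward remedy: test over a grid of inputs rather than one. The sketch below generates such a grid. The function name `parameter_grid`, the even spacing, and the placeholder parameter names in the usage note are assumptions for illustration, not the project's actual test setup.

```python
import itertools


def parameter_grid(ranges, points_per_axis=3):
    """Yield dicts covering an evenly spaced grid over each named range.

    `ranges` maps a parameter name to a (low, high) pair.
    `points_per_axis` must be at least 2.
    """
    if points_per_axis < 2:
        raise ValueError("points_per_axis must be at least 2")
    names = list(ranges)
    axes = []
    for name in names:
        low, high = ranges[name]
        step = (high - low) / (points_per_axis - 1)
        axes.append([low + i * step for i in range(points_per_axis)])
    # Cartesian product: every combination of per-axis values.
    for values in itertools.product(*axes):
        yield dict(zip(names, values))
```

Running the same comparison at every point of, say, `parameter_grid({"a": (0.0, 1.0), "b": (10.0, 20.0)})` exposes bugs that happen to cancel out at one particular setting.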
What This Means for AI Development
This builds on Anthropic’s earlier C compiler project, where Claude worked across roughly 2,000 sessions to build a compiler capable of compiling the Linux kernel. The Boltzmann solver required different skills: tracing errors through a deeply coupled pipeline where small numerical errors cascade through everything downstream.
An unexpected side effect emerged. Mishra-Sharma learned substantial physics by watching the git commit history unfold. He described it as reading lab notes from “a fast, hyper-literal postdoc.”
For Anthropic, fresh off a $30 billion Series G round in February that valued the company at $380 billion, these demonstrations matter. The company isn’t just showing that Claude can chat; it is proving the model can replace expensive, specialized labor over extended periods with minimal supervision. The question now becomes which industries figure out how to deploy this first.