
Anthropic just gave Claude managed agents three upgrades that change how teams should think about long-running AI work: dreaming (research preview), outcomes (public beta), and multiagent orchestration (public beta). The headline result from the launch comes from Harvey, the legal AI company, which reported a roughly sixfold increase in task completion rates after enabling dreaming. Dreaming is also the first memory-curation step from a major frontier lab that runs between sessions rather than inside one.
What’s actually new
The May 7 release adds four capabilities to the Managed Agents API, all gated behind the beta header managed-agents-2026-04-01: dreaming, outcomes, multiagent orchestration, and webhooks. Dreaming is a scheduled process that reviews an agent’s past sessions and persistent memory store, extracts behavioral patterns and learned workarounds, and rewrites the memory store so the next session starts from a smarter baseline. It is not a longer context window, and it is not retrieval. It is a separate offline pass that decides what is worth remembering and how to file it.
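To make the idea of a between-session curation pass concrete, here is a toy sketch in plain Python. It is not Anthropic’s implementation, which the launch material does not describe; the `curate_memory` function, the note format, and the `min_sessions` threshold are all assumptions made for illustration. The sketch keeps notes that recur across sessions and discards one-off facts.

```python
from collections import Counter


def curate_memory(session_notes, min_sessions=2):
    """Toy offline curation pass (hypothetical, for illustration only).

    session_notes: list of sessions, each a list of short note strings.
    Keeps notes seen in at least `min_sessions` distinct sessions and
    returns them most-frequent first.
    """
    seen_in = Counter()
    for notes in session_notes:
        # Count each note once per session so a chatty session cannot dominate.
        for note in set(notes):
            seen_in[note] += 1
    kept = [note for note, count in seen_in.items() if count >= min_sessions]
    # Sort by frequency (descending), then alphabetically for a stable order.
    return sorted(kept, key=lambda n: (-seen_in[n], n))


sessions = [
    ["use table of authorities template", "client prefers PDF"],
    ["use table of authorities template", "retry export tool on timeout"],
    ["retry export tool on timeout", "use table of authorities template"],
]
memory = curate_memory(sessions)
print(memory)
```

The point of the sketch is the shape of the step: it runs over finished sessions, not inside one, and its output replaces what the next session starts with.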
Outcomes lets you score completed agent runs against a rubric you define and feed those scores back into dreaming and downstream training signals. Multiagent orchestration lets a lead agent spawn up to 20 parallel specialist sub-agents, route work to them, and synthesize their results. Webhooks complete the loop by letting external systems push events into a running managed agent without polling. The four features click together; teams that adopt them as a set get the full leverage.
The deployment customers Anthropic disclosed are concrete. Harvey runs long-form legal drafting and document creation across Claude managed agents and reported the 6x completion lift after dreaming was enabled. Spiral, the writing tool from Every, runs a lead agent on Claude Haiku that fields incoming requests, asks follow-up questions when needed, and delegates the actual drafting to Claude Opus sub-agents. The pattern is the new template: a fast cheap router on top, expensive specialists underneath, persistent learning across sessions.
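The router-plus-specialists shape can be sketched locally without any model calls. In the example below, `lead_route` and `specialist_draft` are hypothetical stand-ins for the cheap lead model and the expensive worker model, and splitting a request on semicolons is only a placeholder for real routing logic.

```python
from concurrent.futures import ThreadPoolExecutor


def lead_route(request):
    """Stand-in for the cheap lead model: split a request into subtasks."""
    return [part.strip() for part in request.split(";") if part.strip()]


def specialist_draft(subtask):
    """Stand-in for an expensive specialist model call."""
    return f"[draft] {subtask}"


def run_lead_worker(request, max_workers=3):
    subtasks = lead_route(request)
    # Fan out to specialists in parallel; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        drafts = list(pool.map(specialist_draft, subtasks))
    # The lead synthesizes the specialists' results into one answer.
    return "\n".join(drafts)


result = run_lead_worker("intro paragraph; statement of facts; argument")
print(result)
```

The design choice worth noticing is that the lead only splits and merges, so the expensive calls are confined to the parallel step.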
Why it matters
- Long-running agents finally get better, not just bigger. Until this week, the dominant way to make a Claude agent stronger at a task was to extend its context, add more tools, or tune its prompt. Dreaming adds a fourth lever: the agent literally learns from its own runs without retraining the model.
- Multiagent is a first-class API surface, not a framework choice. CrewAI, LangGraph, AutoGen, and a dozen others have offered orchestration patterns for two years. Anthropic putting orchestration into the managed-agents runtime cuts out a layer of glue code and gives the lab visibility into where parallel agents actually help.
- The Haiku-lead, Opus-worker pattern is now the cost-efficient default. Spiral’s architecture (cheap router + expensive specialists) is the right answer for almost every multiagent system, and Anthropic is explicitly designing for it.
- Outcomes turn agents into measurable systems. Rubric scoring at the platform level lets product teams treat agents like services with SLOs, not like demos.
- Harvey’s 6x is the number every legal, finance, and ops leader will quote in their next budget meeting. Vertical AI teams have been waiting for a number this clean. It is now in the wild.
- Webhooks plus long-running sessions make Claude a real backend. An agent that wakes up on a Stripe event, runs for forty minutes, and emits a finished artifact is a different product category than a chat completion.
How to use it today
You need the beta header on every call and an Anthropic key with managed-agents enabled. The minimum viable upgrade for an existing managed-agents user is three steps: enable memory, enable dreaming, define one outcome rubric. The full multiagent pattern adds one more step.
- Send the beta header on every request. Without it the new endpoints return 404.
- Create a managed agent with memory + dreaming enabled. Dreaming is opt-in and runs nightly by default; you can change cadence.
- Define an outcome rubric for the work the agent does. Use plain-English criteria; the platform will score completed sessions against them.
- For multiagent, define your lead agent and worker agents separately and let the lead delegate. The lead does not need to be the same model as the workers.
Below is the minimum Python that creates a managed agent with dreaming, defines an outcome rubric, and starts a multiagent session with a Haiku lead and three Opus workers.
```python
from anthropic import Anthropic

# Send the beta header on every request made by this client.
client = Anthropic(default_headers={
    "anthropic-beta": "managed-agents-2026-04-01",
})

# Worker agent: Opus with memory, scheduled dreaming, and an outcome rubric.
agent = client.managed_agents.create(
    name="harvey-style-drafter",
    model="claude-opus-4-7",
    memory={"enabled": True, "scope": "agent"},
    dreaming={
        "enabled": True,
        "schedule": "0 3 * * *",  # cron expression: daily at 03:00
        "objective": (
            "Extract reusable patterns about legal drafting style, filetype "
            "quirks, and successful tool-use sequences. Discard one-off facts."
        ),
    },
    outcomes={
        # Rubric weights sum to 1.0.
        "rubric": [
            {"name": "citations_correct", "weight": 0.4},
            {"name": "tone_matches_firm_house_style", "weight": 0.3},
            {"name": "no_hallucinated_clauses", "weight": 0.3},
        ],
    },
)

# Lead agent: Haiku router that may delegate to the worker defined above.
lead = client.managed_agents.create(
    name="lead-router",
    model="claude-haiku-4-5",
    tools=[{"type": "delegate", "targets": [agent.id]}],
)

# Start a session on the lead, allowing up to three parallel workers.
session = client.managed_agents.sessions.create(
    agent_id=lead.id,
    input="Draft a motion to dismiss for the Delaware case bundle in /matter/4421.",
    max_subagents=3,
)

# Print session events as they stream in.
for event in client.managed_agents.sessions.stream(session.id):
    print(event.type, event.data)
```
A few details matter in production. Dreaming runs are billed, so configure a schedule that matches how often your agent is actually used rather than leaving a daily run on an agent that fires twice a week. Outcome scores compound with dreaming: rubric scoring tells dreaming what was good and what was not, which changes what dreaming keeps. Start with three to five rubric criteria; more than seven becomes noisy. For multiagent, max_subagents is the cap on parallel workers per turn, not per session, and the platform caps parallelism at twenty regardless of what you ask for.
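The per-turn cap is easy to reason about with a small sketch. The helper below is hypothetical and only models the arithmetic described above: the effective parallelism is the smaller of max_subagents and the platform limit of twenty. How the platform actually schedules work across turns is an assumption here, not something the launch material specifies.

```python
PLATFORM_CAP = 20  # parallel sub-agent limit stated in the article


def plan_turns(tasks, max_subagents):
    """Group tasks into turns of at most min(max_subagents, PLATFORM_CAP)."""
    if max_subagents < 1:
        raise ValueError("max_subagents must be at least 1")
    per_turn = min(max_subagents, PLATFORM_CAP)
    return [tasks[i:i + per_turn] for i in range(0, len(tasks), per_turn)]


# Asking for 50 parallel workers still yields turns of at most 20.
turns = plan_turns([f"task-{n}" for n in range(45)], max_subagents=50)
print([len(t) for t in turns])
```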
How it compares
The other frontier labs have shipped pieces of this stack, but not the same combination. The table below compares what each major lab offers for long-running, multi-step, learning agents as of May 10, 2026.
| Capability | Anthropic Claude Managed Agents | OpenAI Responses / Agents SDK | Google Gemini Agent Engine |
|---|---|---|---|
| Persistent memory across sessions | Yes (managed) | Yes (Threads + Memory) | Yes (Memory Bank, GA) |
| Offline learning between sessions | Yes (dreaming, research preview) | No native equivalent | No native equivalent |
| Native multiagent orchestration | Yes, up to 20 parallel sub-agents | Partial via Swarm patterns | Yes via ADK with custom routing |
| Platform-level outcome scoring | Yes (rubric-based, public beta) | Evals product, not in agent runtime | Vertex AI Eval, separate surface |
| Webhooks for long-running sessions | Yes (public beta) | Partial (Realtime, not for agents) | Pub/Sub via Agent Engine |
| Recommended lead/worker pattern | Haiku lead, Opus workers | 4o-mini lead, GPT-5 workers | Flash lead, Pro workers |
| Pricing model | Per-token plus managed-agents session fee | Per-token plus tool calls | Per-token plus Agent Engine seat |
The competitive read is clear. OpenAI has the better consumer agent product (ChatGPT agent mode), but its developer surface for long-running multi-session agents is still catching up. Google’s Vertex Agent Engine is mature on enterprise plumbing but does not have an equivalent to dreaming. Anthropic is leading on the operating model for agents that work overnight, learn from yesterday, and arrive smarter today.
What’s next
Three things to watch over the next sixty days. First, dreaming will leave research preview and likely move to public beta in late June or early July; pricing will get firmer at that point and per-agent dreaming budgets will become a real cost line. Second, expect Anthropic to publish more case studies; Harvey is the lead reference, but several legal, finance, and operations companies are running large-scale pilots. The next public numbers will probably come from finance or ops, where rubric scoring is easier to define than in creative work. Third, the Haiku-lead, Opus-worker pattern will likely become an explicit product configuration with templates, since virtually every team is converging on it independently.
The longer view is that dreaming is the first credible answer to a question agents have ducked for two years: how do you get better at the work without retraining the underlying model. If dreaming generalizes well across customer corpora, it is the architectural shift that turns frontier agents from impressive demos into compounding assets.
Frequently Asked Questions
Is dreaming the same thing as fine-tuning?
No. Fine-tuning updates the model weights; dreaming updates the agent’s memory store and the rules it uses to navigate that memory. The underlying Claude model is unchanged. Practically, this means dreaming compounds without retraining costs and without coordination with Anthropic’s model release schedule.
Can I use dreaming without multiagent orchestration?
Yes. Dreaming is a per-agent capability and works on single-agent sessions. Many teams will enable dreaming on their existing managed agents before they add multiagent orchestration. The features stack but do not require each other.
How does dreaming handle private or regulated data?
Dreaming runs inside Anthropic’s managed environment under the same data handling policies as the underlying API, and rewrites the agent’s own memory store. It does not export memory contents outside the customer’s account. For regulated workloads, configure the dreaming objective explicitly to exclude personally identifiable information and review the curated memory store periodically.
What does outcomes scoring actually do during a session?
Outcome scoring runs after a session completes, against the rubric you defined when the agent was created. Scores feed into dreaming, into platform analytics, and into your own retention or routing logic. A low-scoring session is still completed and billed; the rubric does not gate user-visible output.
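For your own routing logic, one plausible way to collapse per-criterion scores into a single number is a weighted average. The sketch below reuses the rubric names and weights from the example agent; the `weighted_score` helper and the assumption that each criterion is scored between 0 and 1 are illustrative, since the article does not say how the platform combines criteria.

```python
RUBRIC = {
    "citations_correct": 0.4,
    "tone_matches_firm_house_style": 0.3,
    "no_hallucinated_clauses": 0.3,
}


def weighted_score(criterion_scores, rubric=RUBRIC):
    """Combine per-criterion scores in [0, 1] into one weighted score."""
    total_weight = sum(rubric.values())
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    missing = set(rubric) - set(criterion_scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    weighted = sum(rubric[name] * criterion_scores[name] for name in rubric)
    # Normalize so the result stays in [0, 1] even if weights do not sum to 1.
    return weighted / total_weight


score = weighted_score({
    "citations_correct": 1.0,
    "tone_matches_firm_house_style": 0.5,
    "no_hallucinated_clauses": 1.0,
})
print(round(score, 2))
```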
How is this priced?
Token usage is billed as normal. Managed-agents sessions carry a per-session fee. Dreaming runs incur token costs based on the volume of memory reviewed and the length of the dreaming objective. Multiagent orchestration is billed per sub-agent token consumption, not per parallel slot. Plan to model dreaming as roughly 5 to 15 percent of your monthly agent token cost at typical usage.
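The 5 to 15 percent planning figure translates into a simple range calculation. The function name and the $2,000 monthly spend below are hypothetical, chosen only to show the arithmetic.

```python
def dreaming_budget_range(monthly_agent_token_cost, low=0.05, high=0.15):
    """Return the (low, high) dreaming cost estimate for a monthly token spend."""
    if monthly_agent_token_cost < 0:
        raise ValueError("cost must be non-negative")
    return (monthly_agent_token_cost * low, monthly_agent_token_cost * high)


# Hypothetical agent that spends $2,000 per month on tokens.
low_est, high_est = dreaming_budget_range(2000.0)
print(f"${low_est:.0f} to ${high_est:.0f} per month")
```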
When will dreaming leave research preview?
Anthropic has not committed to a date, but the typical research-preview-to-public-beta cycle for managed-agents features has been six to twelve weeks. A reasonable assumption is late June or July 2026, with general availability later in the year.