Anthropic Ships 'Dreaming' Agents and Sub-Agent Orchestration

Anthropic just gave Claude managed agents three upgrades that change how teams should think about long-running AI work: dreaming (research preview), outcomes (public beta), and multiagent orchestration (public beta). The headline result from the launch comes from Harvey, the legal AI company, which reported a roughly sixfold increase in task completion rates after enabling dreaming. Dreaming is also the first time a major frontier lab has shipped a memory-curation step that runs between sessions rather than inside one.

What’s actually new

The May 7 release adds four capabilities to the Managed Agents API, gated behind the beta header managed-agents-2026-04-01. Dreaming is a scheduled process that periodically reviews an agent’s past sessions and persistent memory store, extracts behavioral patterns and learned workarounds, and rewrites the memory store so the next session starts from a smarter baseline. It is not a longer context window. It is not retrieval. It is a separate offline pass that decides what is worth remembering and how to file it.
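
To make the idea of an offline curation pass concrete, here is a toy sketch in plain Python. It is not Anthropic's implementation, and nothing in it calls the Managed Agents API: the function name `curate_memory`, the `min_sessions` threshold, and the sample notes are all hypothetical. It only illustrates the shape of the step described above, where notes that recur across past sessions are promoted into memory and one-off facts are dropped.

```python
from collections import Counter

def curate_memory(memory, sessions, min_sessions=2):
    """Toy offline pass: promote notes that recur across sessions, drop one-offs."""
    # Count how many distinct sessions each note appears in.
    seen_in = Counter()
    for notes in sessions:
        for note in set(notes):
            seen_in[note] += 1

    # Existing memory is kept; recurring notes are appended in sorted order.
    promoted = sorted(n for n, count in seen_in.items() if count >= min_sessions)
    curated = list(memory)
    for note in promoted:
        if note not in curated:
            curated.append(note)
    return curated

# Hypothetical sample data for illustration only.
memory = ["cite cases in the firm's preferred citation style"]
sessions = [
    ["retry PDF export after converting to .docx", "client prefers short headings"],
    ["retry PDF export after converting to .docx", "filing deadline is Friday"],
]
new_memory = curate_memory(memory, sessions)
print(new_memory)
```

The recurring workaround is kept for the next session, while the single-session facts never reach memory. A real dreaming pass presumably uses a model to judge relevance rather than a simple count.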

Outcomes lets you score completed agent runs against a rubric you define and feed those scores back into dreaming and downstream training signals. Multiagent orchestration lets a lead agent spawn up to 20 parallel specialist sub-agents, route work to them, and synthesize their results. Webhooks, the fourth capability, complete the loop by letting external systems push events into a running managed agent without polling. The four features are designed to work together: webhooks start the work, sub-agents do it, outcomes grade it, and dreaming carries the lessons into the next session.

Anthropic disclosed two concrete deployments. Harvey runs long-form legal drafting and document creation on Claude managed agents and reported the roughly 6x completion lift after enabling dreaming. Spiral, the writing tool from Every, runs a lead agent on Claude Haiku that fields incoming requests, asks follow-up questions when needed, and delegates the actual drafting to Claude Opus sub-agents. The pattern is the new template: a fast, cheap router on top, expensive specialists underneath, and persistent learning across sessions.
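
The router-plus-specialists shape can be sketched locally with `asyncio`, without any API calls. This is a hedged illustration under stated assumptions: `run_worker` is a hypothetical stand-in for a call to an expensive model, the semicolon split stands in for the lead's planning step, and `MAX_PARALLEL` mirrors the 20-sub-agent cap mentioned in the article. The platform handles this fan-out for you; the sketch only shows the control flow.

```python
import asyncio

MAX_PARALLEL = 20  # cap on parallel sub-agents, per the article

async def run_worker(task):
    # Hypothetical stand-in for a call to an expensive specialist model.
    await asyncio.sleep(0)
    return f"draft for: {task}"

async def lead(request, max_subagents=3):
    # The lead splits the request, then delegates with bounded parallelism.
    tasks = [part.strip() for part in request.split(";") if part.strip()]
    limit = asyncio.Semaphore(min(max_subagents, MAX_PARALLEL))

    async def delegate(task):
        async with limit:
            return await run_worker(task)

    # gather() returns results in the same order as the tasks were given.
    results = await asyncio.gather(*(delegate(t) for t in tasks))
    # Synthesis step: here, just join the worker outputs.
    return "\n".join(results)

summary = asyncio.run(lead("intro; argument; conclusion"))
print(summary)
```

The semaphore is what keeps the lead from launching more workers at once than the cap allows, even when the request splits into many parts.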

Why it matters

Long-running agents finally get better, not just bigger. Until this release, the main ways to make a Claude agent stronger at a task were to extend its context, add more tools, or tune its prompt. Dreaming adds a fourth lever: the agent learns from its own runs without retraining the model.
Multiagent is a first-class API surface, not a framework choice. CrewAI, LangGraph, AutoGen, and a dozen others have offered orchestration patterns for two years. Anthropic putting orchestration into the managed-agents runtime cuts out a layer of glue code and gives the lab visibility into where parallel agents actually help.
The Haiku-lead, Opus-worker pattern is now the cost-efficient default. Spiral’s architecture (cheap router + expensive specialists) is the right answer for almost every multiagent system, and Anthropic is explicitly designing for it.
Outcomes turn agents into measurable systems. Rubric scoring at the platform level lets product teams treat agents like services with SLOs, not like demos.
Harvey’s 6x is the number every legal, finance, and ops leader will quote in their next budget meeting. Vertical AI teams have been waiting for a number this clean. It is now in the wild.
Webhooks plus long-running sessions make Claude a real backend. An agent that wakes up on a Stripe event, runs for forty minutes, and emits a finished artifact is a different product category than a chat completion.

How to use it today

You need the beta header on every call and an Anthropic key with managed-agents enabled. The minimum viable upgrade for an existing managed-agents user is three steps: send the beta header, enable memory and dreaming, and define one outcome rubric. The full multiagent pattern adds one more step.

1. Send the beta header on every request. Without it the new endpoints return 404.
2. Create a managed agent with memory + dreaming enabled. Dreaming is opt-in and runs nightly by default; you can change cadence.
3. Define an outcome rubric for the work the agent does. Use plain-English criteria; the platform will score completed sessions against them.
4. For multiagent, define your lead agent and worker agents separately and let the lead delegate. The lead does not need to be the same model as the workers.

Below is the minimum Python that creates a managed agent with dreaming, defines an outcome rubric, and starts a multiagent session in which a Haiku lead can delegate to up to three parallel Opus workers.

from anthropic import Anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
# default_headers attaches the beta header to every request.
client = Anthropic(default_headers={
    "anthropic-beta": "managed-agents-2026-04-01",
})

# Worker agent: Opus with memory, scheduled dreaming, and an outcome rubric.
agent = client.managed_agents.create(
    name="harvey-style-drafter",
    model="claude-opus-4-7",
    memory={"enabled": True, "scope": "agent"},
    dreaming={
        "enabled": True,
        "schedule": "0 3 * * *",  # cron syntax: 03:00 every day
        "objective": (
            "Extract reusable patterns about legal drafting style, filetype "
            "quirks, and successful tool-use sequences. Discard one-off facts."
        ),
    },
    outcomes={
        # The three weights sum to 1.0.
        "rubric": [
            {"name": "citations_correct", "weight": 0.4},
            {"name": "tone_matches_firm_house_style", "weight": 0.3},
            {"name": "no_hallucinated_clauses", "weight": 0.3},
        ],
    },
)

# Lead agent: cheaper Haiku router that may delegate to the worker above.
lead = client.managed_agents.create(
    name="lead-router",
    model="claude-haiku-4-5",
    tools=[{"type": "delegate", "targets": [agent.id]}],
)

# Start a session on the lead; max_subagents caps parallel workers per turn.
session = client.managed_agents.sessions.create(
    agent_id=lead.id,
    input="Draft a motion to dismiss for the Delaware case bundle in /matter/4421.",
    max_subagents=3,
)

# Print session events as they stream in.
for event in client.managed_agents.sessions.stream(session.id):
    print(event.type, event.data)

A few details matter in production. Dreaming runs are billed; configure a schedule that matches how often your agent is actually used, not a default daily run for an agent that fires twice a week. Outcomes scores compound with dreaming: rubric scoring tells dreaming what was good and what was not, which changes what dreaming keeps. Start with three to five rubric criteria; more than seven becomes noisy. For multiagent, max_subagents is the cap on parallel workers per turn, not per session, and the platform throttles at twenty regardless of what you ask for.
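
These production rules of thumb are easy to encode as pre-flight checks before you create an agent. The sketch below is a local helper, not part of any SDK: `effective_subagents`, `rubric_warnings`, and the two schedule constants are hypothetical names, the twice-weekly cron string assumes the platform accepts standard five-field cron syntax with day-of-week lists, and the weights-sum-to-one check is an assumption based on the example rubric rather than a documented requirement.

```python
PLATFORM_MAX_SUBAGENTS = 20  # throttle stated in the article

# Standard five-field cron strings: minute hour day-of-month month day-of-week.
NIGHTLY = "0 3 * * *"         # 03:00 every day, as in the example above
TWICE_WEEKLY = "0 3 * * 2,5"  # 03:00 on Tuesdays and Fridays (assumed to be accepted)

def effective_subagents(requested):
    """Parallel workers per turn after the platform throttle is applied."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return min(requested, PLATFORM_MAX_SUBAGENTS)

def rubric_warnings(rubric):
    """Return human-readable warnings for a list of {'name', 'weight'} dicts."""
    warnings = []
    if len(rubric) > 7:
        warnings.append("more than seven criteria: scores may be noisy")
    total = sum(c["weight"] for c in rubric)
    # Assumption: weights are meant to sum to 1.0, as in the example rubric.
    if abs(total - 1.0) > 1e-9:
        warnings.append(f"weights sum to {total:.2f}, not 1.00")
    return warnings

example_rubric = [
    {"name": "citations_correct", "weight": 0.4},
    {"name": "tone_matches_firm_house_style", "weight": 0.3},
    {"name": "no_hallucinated_clauses", "weight": 0.3},
]
print(effective_subagents(50), rubric_warnings(example_rubric))
```

For an agent that fires twice a week, a schedule like `TWICE_WEEKLY` avoids paying for dreaming runs that have no new sessions to review.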

How it compares

The other frontier labs have shipped pieces of this stack, but not the same combination. The table below compares what each major lab offers for long-running, multi-step, learning agents as of May 10, 2026.

Capability	Anthropic Claude Managed Agents	OpenAI Responses / Agents SDK	Google Gemini Agent Engine
Persistent memory across sessions	Yes (managed)	Yes (Threads + Memory)	Yes (Memory Bank, GA)
Offline learning between sessions	Yes (dreaming, research preview)	No native equivalent	No native equivalent
Native multiagent orchestration	Yes, up to 20 parallel sub-agents	Partial via Swarm patterns	Yes via ADK with custom routing
Platform-level outcome scoring	Yes (rubric-based, public beta)	Evals product, not in agent runtime	Vertex AI Eval, separate surface
Webhooks for long-running sessions	Yes (public beta)	Partial (Realtime, not for agents)	Pub/Sub via Agent Engine
Recommended lead/worker pattern	Haiku lead, Opus workers	4o-mini lead, GPT-5 workers	Flash lead, Pro workers
Pricing model	Per-token plus managed-agents session fee	Per-token plus tool calls	Per-token plus Agent Engine seat

The competitive read is clear. OpenAI has the better consumer agent product (ChatGPT agent mode), but its developer surface for long-running multi-session agents is still catching up. Google’s Vertex Agent Engine is mature on enterprise plumbing but does not have an equivalent to dreaming. Anthropic is leading on the operating model for agents that work overnight, learn from yesterday, and arrive smarter today.

What’s next

Three things to watch over the next sixty days. First, dreaming will likely move from research preview to public beta in late June or early July; pricing should get firmer at that point, and per-agent dreaming budgets will become a real cost line. Second, expect Anthropic to publish more case studies; Harvey is the lead reference, but several legal, finance, and operations companies are running large-scale pilots. The next public numbers will probably come from finance or ops, where rubric scoring is easier to define than in creative work. Third, the Haiku-lead, Opus-worker pattern will likely become an explicit product configuration with templates, since virtually every team is converging on it independently.

The longer view is that dreaming is the first credible answer to a question agents have ducked for two years: how do you get better at the work without retraining the underlying model. If dreaming generalizes well across customer corpora, it is the architectural shift that turns frontier agents from impressive demos into compounding assets.

Frequently Asked Questions

Is dreaming the same thing as fine-tuning?

No. Fine-tuning updates the model weights; dreaming updates the agent’s memory store and the rules it uses to navigate that memory. The underlying Claude model is unchanged. Practically, this means dreaming compounds without retraining costs and without coordination with Anthropic’s model release schedule.

Can I use dreaming without multiagent orchestration?

Yes. Dreaming is a per-agent capability and works on single-agent sessions. Many teams will enable dreaming on their existing managed agents before they add multiagent orchestration. The features stack but do not require each other.

How does dreaming handle private or regulated data?

Dreaming runs inside Anthropic’s managed environment under the same data handling policies as the underlying API, and rewrites the agent’s own memory store. It does not export memory contents outside the customer’s account. For regulated workloads, configure the dreaming objective explicitly to exclude personally identifiable information and review the curated memory store periodically.
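
The periodic review can be partly automated. The following is a minimal sketch, assuming you can export memory entries as plain strings; how you fetch them is not shown and is not covered by the article. The function name `flag_pii`, the two regex patterns, and the sample entries are illustrative only, and patterns this simple will miss many real cases, so treat it as a first filter before human review rather than a compliance control.

```python
import re

# Deliberately simple patterns; real PII detection needs a dedicated tool.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_pii(entries):
    """Return (index, kind) pairs for memory entries that look like PII."""
    hits = []
    for i, text in enumerate(entries):
        for kind, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                hits.append((i, kind))
    return hits

# Hypothetical memory entries for illustration only.
entries = [
    "Firm style: keep headings short",
    "Contact paralegal at jane.doe@example.com for exhibits",
    "Client SSN 123-45-6789 appears in intake form",
]
flagged = flag_pii(entries)
print(flagged)
```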

What does outcomes scoring actually do during a session?

Outcomes scores run after a session completes, against the rubric you defined when the agent was created. Scores feed into dreaming, into platform analytics, and into your own retention or routing logic. A low-score session is still completed and billed; the rubric does not gate user-visible output.
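
Here is one way the "your own routing logic" part might look, as a sketch. The platform computes the real scores; this example assumes you receive a per-criterion mark between 0 and 1 for each rubric entry, which is an assumption about the output format. `weighted_score`, `next_action`, the `review_below` threshold, and the sample marks are hypothetical.

```python
def weighted_score(rubric, marks):
    """Combine per-criterion marks in [0, 1] using the rubric weights."""
    total_weight = sum(c["weight"] for c in rubric)
    return sum(c["weight"] * marks[c["name"]] for c in rubric) / total_weight

def next_action(score, review_below=0.7):
    # The session already completed and was billed; this only picks a follow-up.
    return "queue_for_human_review" if score < review_below else "accept"

rubric = [
    {"name": "citations_correct", "weight": 0.4},
    {"name": "tone_matches_firm_house_style", "weight": 0.3},
    {"name": "no_hallucinated_clauses", "weight": 0.3},
]
# Hypothetical marks for one completed session.
marks = {
    "citations_correct": 1.0,
    "tone_matches_firm_house_style": 0.5,
    "no_hallucinated_clauses": 0.0,
}
score = weighted_score(rubric, marks)
print(round(score, 2), next_action(score))
```

Because the rubric does not gate output, a check like `next_action` is where you would decide whether a low-scoring draft goes to a human before it reaches the end user.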

How is this priced?

Token usage is billed as normal. Managed-agents sessions carry a per-session fee. Dreaming runs incur token costs based on the volume of memory reviewed and the length of the dreaming objective. Multiagent orchestration is billed per sub-agent token consumption, not per parallel slot. Plan to model dreaming as roughly 5 to 15 percent of your monthly agent token cost at typical usage.
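
The 5 to 15 percent rule of thumb translates into a simple budget range. This is plain arithmetic on the article's estimate, not a pricing calculator: `dreaming_cost_range` is a hypothetical helper and the 2,000 monthly figure is a made-up input.

```python
def dreaming_cost_range(monthly_agent_token_cost, low=0.05, high=0.15):
    """Return (low, high) dreaming cost estimates from monthly agent token cost."""
    return (monthly_agent_token_cost * low, monthly_agent_token_cost * high)

# Hypothetical example: an agent fleet that spends 2,000 per month on tokens.
low_estimate, high_estimate = dreaming_cost_range(2000.0)
print(f"budget between {low_estimate:.2f} and {high_estimate:.2f} for dreaming")
```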

When will dreaming leave research preview?

Anthropic has not committed to a date, but the typical research-preview-to-public-beta cycle for managed-agents features has been six to twelve weeks. A reasonable assumption is late June or July 2026, with general availability later in the year.
