Anthropic Dreaming Lets Claude Agents Self-Improve Overnight

Anthropic just announced Dreaming for Claude Managed Agents at the Code with Claude developer conference, a scheduled background process that reviews each agent’s past sessions, extracts patterns, and writes structured playbooks the agent’s future sessions reference automatically. The Anthropic Dreaming feature is the first production-grade self-improvement capability shipped for managed agents at scale — early adopter Harvey reported a 6x improvement in completion rates after agents accumulated learnings between sessions. Anthropic also moved Outcomes (structured success measurement) and multi-agent orchestration from research preview into public beta. Together the three features mark a substantive maturation of Anthropic’s agent platform: agents that remember, agents that measure their own performance, and agents that coordinate with other agents under bounded autonomy.

What’s actually new

Dreaming runs as a scheduled background process — not a real-time inference path — that reviews an agent’s recent session history, identifies patterns the agent encountered, and produces two artifacts: plain-text learning notes (informal observations the agent can reference) and structured playbooks (machine-readable patterns for specific situations). Future sessions retrieve the relevant learnings as context, which lets the agent skip mistakes it has already made and apply techniques that have already worked. The pattern is analogous to how human professionals consolidate experience into expertise — except it happens automatically, between sessions, without manual prompt engineering or fine-tuning.
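The consolidation loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Anthropic's implementation: the `SessionEvent` record, the `consolidate` and `context_for` functions, and the `min_support` threshold are all hypothetical names invented here. The idea shown is that repeated successes become playbook entries, repeated failures become plain-text notes, and a later session looks up the relevant entry as context.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SessionEvent:
    """One observation from a past session (hypothetical record shape)."""
    situation: str   # e.g. "deposition transcript"
    action: str      # what the agent tried
    succeeded: bool


def consolidate(events, min_support=2):
    """Turn repeated observations into plain-text notes and a playbook.

    An action that worked at least `min_support` times for a situation
    becomes that situation's playbook entry; an action that failed that
    often becomes a cautionary note.
    """
    wins = Counter((e.situation, e.action) for e in events if e.succeeded)
    losses = Counter((e.situation, e.action) for e in events if not e.succeeded)

    playbook = {}
    for (situation, action), count in wins.most_common():
        if count >= min_support and situation not in playbook:
            playbook[situation] = action

    notes = [
        f"Avoid '{action}' for {situation}: failed {count} times."
        for (situation, action), count in sorted(losses.items())
        if count >= min_support
    ]
    return notes, playbook


def context_for(situation, playbook):
    """Return the learning to prepend to a future session, or '' if none."""
    action = playbook.get(situation)
    return f"Known approach for {situation}: {action}" if action else ""
```

In this sketch the batch step (`consolidate`) and the session-time step (`context_for`) are deliberately separate, mirroring the article's point that Dreaming runs between sessions rather than on the real-time inference path.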

The key design choice is observability. Both the plain-text notes and structured playbooks are human-readable, version-controlled, and auditable. Operators can review what their agent has learned, edit incorrect or undesirable patterns, and roll back learnings if needed. The transparency addresses one of the chronic concerns about self-improving AI — opaque internal state that drifts without operator awareness. Dreaming makes the drift visible and controllable.
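To make the review, edit, and rollback workflow concrete, here is a minimal in-memory sketch of a versioned learnings store. The `LearningStore` class and its methods are hypothetical and exist only for illustration; the article does not describe how Anthropic actually stores or versions learnings.

```python
import copy


class LearningStore:
    """Append-only history of playbook snapshots (hypothetical, in-memory)."""

    def __init__(self):
        self._versions = [{}]  # version 0 is the empty playbook

    @property
    def current(self):
        """Return a copy of the newest snapshot so callers cannot mutate history."""
        return copy.deepcopy(self._versions[-1])

    def commit(self, playbook):
        """Record a new snapshot and return its version number."""
        self._versions.append(copy.deepcopy(playbook))
        return len(self._versions) - 1

    def edit(self, situation, action):
        """Operator override: set one entry, or remove it when action is None."""
        playbook = self.current
        if action is None:
            playbook.pop(situation, None)
        else:
            playbook[situation] = action
        return self.commit(playbook)

    def rollback(self, version):
        """Restore an earlier snapshot by committing it as the newest version."""
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown version {version}")
        return self.commit(self._versions[version])
```

Rolling back by committing the old snapshot as a new version, rather than deleting later ones, keeps the full audit trail intact, which is the property the observability argument depends on.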

Outcomes is the second major announcement. Each managed agent now has a structured outcome model — what counts as success, what counts as failure, what intermediate signals matter. The agent itself reports against this model after each session. Outcomes data feeds into Dreaming so the agent learns to optimize for actual success rather than surface metrics like task completion or response length.
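A structured outcome model of this kind might look roughly like the following. The `OutcomeModel` fields and the `score_session` function are assumptions made for illustration, not the actual Outcomes schema. The sketch shows why finishing a task is not the same as succeeding: a session can produce its deliverable and still be scored as a failure or as incomplete.

```python
from dataclasses import dataclass, field


@dataclass
class OutcomeModel:
    """What counts as success, failure, and useful progress (hypothetical)."""
    success_criteria: list
    failure_modes: list
    intermediate_signals: list = field(default_factory=list)


def score_session(model, observed):
    """Classify a session from the set of signals observed in it.

    Any failure mode marks the session failed; otherwise success requires
    every success criterion to be present.
    """
    hit_failures = [f for f in model.failure_modes if f in observed]
    if hit_failures:
        return {"status": "failure", "reasons": hit_failures}
    missing = [c for c in model.success_criteria if c not in observed]
    if missing:
        return {"status": "incomplete", "reasons": missing}
    signals = [s for s in model.intermediate_signals if s in observed]
    return {"status": "success", "reasons": signals}
```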

Multi-agent orchestration, the third announcement, moves into public beta. Claude managed agents can now invoke other Claude managed agents through the standard tool-use interface, with the broader Anthropic platform handling identity, authorization, conversation context, and observability across the agent network. Combined with the Agent2Agent protocol’s recent Linux Foundation governance milestone, multi-agent workflows that span vendors are increasingly tractable.

The Harvey adoption story validates the capability. Harvey is the legal AI startup whose agents handle research, contract analysis, and similar workflows for major law firms. The 6x completion-rate improvement after Dreaming adoption represents the kind of step-change improvement that justifies Anthropic’s investment in the feature. The improvement comes from agents remembering filetype workarounds (a deposition transcript needs different handling than a contract), tool-specific patterns (their internal eDiscovery system has quirks worth remembering), and matter-specific context that previously required re-explanation in every session.

Why it matters

Self-improvement just became operational. Production AI agents that improve over time without retraining or manual intervention represent a capability shift. The 6x completion-rate improvement Harvey reported is the kind of multiplier that changes deployment economics.
Observable self-improvement addresses the trust problem. Agents that learn opaquely produce trust erosion as operators wonder what the agent has internalized. Plain-text notes and structured playbooks that operators can read, edit, and version make self-improvement transparent.
Outcomes as a first-class concept changes agent evaluation. Agents optimizing for “task completion” produce surface metrics; agents optimizing for measurable outcomes produce real impact. The Outcomes feature gives operators the framework to define what success means for each agent and measure it consistently.
Multi-agent orchestration is now in public beta on Anthropic’s platform. Combined with A2A’s Linux Foundation governance, the multi-agent infrastructure question is largely resolved for 2026 deployment: the operational patterns are documented and the infrastructure is maturing.
The agent platform competition narrows. Anthropic’s Claude Managed Agents now has Dreaming, Outcomes, and multi-agent orchestration as differentiating features. OpenAI’s Agent SDK, Microsoft Copilot Studio, and Google’s Gemini Enterprise will need competitive responses through 2026 to maintain parity.
Domain-specific agents become more economically viable. The Dreaming-induced improvement is largest for agents handling complex, idiosyncratic workflows where session-to-session learning compounds. Legal, healthcare, financial, scientific research, and similar verticals are the natural early adopters.

How to use Anthropic Dreaming today

Dreaming is available as a research preview feature on Claude Console for organizations with Claude Managed Agents access. Outcomes and multi-agent orchestration are in public beta. Three steps put a development team on the new features.

Enable Dreaming on an existing managed agent. In the Claude Console, navigate to your agent’s configuration and toggle the Dreaming feature. Configure the dreaming schedule (nightly usually suits high-volume agents; weekly works for lower-volume ones). Set retention policies for the learning notes and playbooks.
Define Outcomes for each agent. Articulate what success means for the agent in structured form — completion criteria, intermediate signals, failure modes. The Outcomes definition becomes part of the agent’s configuration and feeds into Dreaming for goal-aligned learning.
Compose multi-agent workflows. For workflows requiring multiple specialized agents, define each agent with its own scope, capabilities, and tools. Use the platform’s orchestration to compose them — the planner agent invokes specialist agents through tool calls, with results flowing back through the conversation context.

API integration follows the existing Claude Managed Agents pattern with new configuration fields:

MASK12
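As a rough illustration of what such a configuration could contain, the sketch below assembles the settings named in the steps above (schedule, retention, and an Outcomes definition) into a plain dictionary. Every key and the `build_agent_config` helper are hypothetical; they are not taken from Anthropic's API, and the real field names should be checked against the official documentation.

```python
import json

VALID_SCHEDULES = {"nightly", "weekly"}


def build_agent_config(name, schedule, retention_days, outcomes):
    """Assemble a configuration dict; every key here is a hypothetical name."""
    if schedule not in VALID_SCHEDULES:
        raise ValueError(f"schedule must be one of {sorted(VALID_SCHEDULES)}")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "name": name,
        "dreaming": {
            "enabled": True,
            "schedule": schedule,              # how often the review runs
            "retention_days": retention_days,  # how long learnings are kept
        },
        "outcomes": outcomes,                  # feeds goal-aligned learning
    }


config = build_agent_config(
    name="contract-review",
    schedule="nightly",
    retention_days=90,
    outcomes={
        "completion_criteria": ["all clauses classified"],
        "intermediate_signals": ["document parsed"],
        "failure_modes": ["unsupported file type"],
    },
)
print(json.dumps(config, indent=2))
```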

For multi-agent orchestration, the pattern composes agents through tool calls:

MASK13
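The planner-and-specialists pattern can be sketched without any SDK at all. In the example below the specialist agents are stand-in Python functions and the tool registry is an ordinary dictionary; the names `SPECIALISTS`, `planner`, `research_agent`, and `contract_agent` are invented for illustration, and a real deployment would route these calls through the platform's tool-use interface instead.

```python
def research_agent(task):
    """Stand-in specialist; a real one would run its own agent session."""
    return f"research notes on {task}"


def contract_agent(task):
    """Stand-in specialist for contract analysis."""
    return f"clause analysis of {task}"


# Tool registry: tool name -> callable specialist (names are hypothetical).
SPECIALISTS = {
    "research": research_agent,
    "contract_analysis": contract_agent,
}


def planner(plan):
    """Run a plan of (tool_name, task) steps and collect results in order.

    Unknown tools are recorded as errors instead of raising, so one bad
    step does not discard the results gathered so far.
    """
    transcript = []
    for tool_name, task in plan:
        specialist = SPECIALISTS.get(tool_name)
        if specialist is None:
            transcript.append({"tool": tool_name, "error": "unknown tool"})
            continue
        transcript.append({"tool": tool_name, "result": specialist(task)})
    return transcript
```

The returned transcript plays the role of the conversation context: each specialist's result flows back to the planner in the order the steps were issued.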

How it compares

The agent platform landscape in mid-2026 has tightened around a few major players with substantively different capability profiles. The table below summarizes the leaders against the dimensions that drive enterprise agent platform decisions.

Platform	Self-improvement	Outcomes / measurement	Multi-agent	Best fit
Anthropic Claude Managed Agents	Dreaming (preview)	Outcomes (beta)	Multi-agent orchestration (beta)	Enterprise high-stakes workflows
OpenAI Agent SDK + Swarm	Limited; manual prompt updates	Custom evaluation harnesses	Swarm orchestration	OpenAI-stack deployments
Microsoft Copilot Studio + Wave 3	Limited; admin-managed	Built-in usage analytics	Multi-agent through M365 + Copilot	Microsoft-shop enterprises
Google Gemini Enterprise Agent Platform	Limited; vertex-managed	Vertex AI evaluation	A2A + native orchestration	Google Cloud / Workspace deployments
LangGraph (open source)	Custom; build it yourself	Custom evaluation	First-class state graphs	Custom multi-agent systems
CrewAI / AutoGen	Custom	Custom	Multi-agent collaboration	Research-heavy multi-agent work

Two takeaways. First, Anthropic has established the most differentiated agent platform on the dimensions that matter for production workflows: self-improvement, outcomes measurement, and orchestration as first-class concepts. Other platforms can replicate these capabilities, but Anthropic shipped first, with Dreaming in preview and the other two features in beta. Second, the platform versus framework distinction matters. Platform vendors (Anthropic, OpenAI, Microsoft, Google) provide managed services with bundled capability; frameworks (LangGraph, CrewAI, AutoGen) provide building blocks for custom development. The right choice depends on the organization’s engineering capacity and the customization required.

What’s next

Three things to watch over the next two quarters. First, OpenAI and Microsoft’s competitive responses. Self-improvement and outcomes-based measurement directly affect completion rates and evaluation quality, so expect both vendors to ship comparable features through Q3 2026. Second, the operational learnings from broader Dreaming adoption. The Harvey result is impressive but represents a single deployment. Broader adoption will show where Dreaming helps most, where it produces unexpected behavior, and how operators should manage the learning loop. Third, the spread of Outcomes-style evaluation across the broader AI evaluation ecosystem. RAGAS, Vellum, Braintrust, LangSmith, and other AI evaluation platforms will likely integrate Outcomes-style measurement, making goal-aligned evaluation more standardized industry-wide.

The longer-term implication is that agent platforms are evolving from “stateless function callers” into “managed services with memory, measurement, and learning.” The maturation makes agents economically viable for higher-stakes workflows that previously required custom-built solutions. The 2027-2028 enterprise agent adoption curve depends substantially on how well platform vendors execute on this maturation, with Anthropic now setting the bar.

Frequently Asked Questions

Is Dreaming generally available right now?

Dreaming is in research preview on Claude Console for organizations with Managed Agents access. Outcomes and multi-agent orchestration are in public beta with broader availability. General availability for all three features is expected through Q3 2026 based on Anthropic’s typical release cadence.

How is Dreaming different from fine-tuning the model?

Fine-tuning modifies the underlying model weights. Dreaming produces structured learnings (notes and playbooks) that are retrieved as context at session time. The pattern is faster (no training run required), more transparent (learnings are human-readable), and more controllable (operators can edit or remove specific learnings). For most production agent use cases, Dreaming-style learning is more practical than fine-tuning.

What happens to dreaming-derived learnings if I switch agents or models?

The learnings are tied to the specific agent configuration. Switching to a different agent configuration starts fresh; major model updates may require re-validation of accumulated learnings. Anthropic’s documentation includes guidance on managing learnings across agent versions and model updates.

Can multiple agents share dreaming learnings?

By default, each agent has its own learnings. Anthropic supports learning federation patterns where related agents (different specializations of the same domain) can share specific structured playbooks while maintaining individual context. The configuration controls which learnings are private and which are federated.
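One way to picture the private-versus-federated split is a merge rule like the one below. The data shape and the `effective_playbook` function are assumptions for illustration only; the article does not specify how federation is configured or how conflicts are resolved, so the "own entries win" rule here is a design choice of this sketch.

```python
def effective_playbook(agent_name, learnings_by_agent):
    """Merge an agent's own entries with federated entries from its peers.

    `learnings_by_agent` maps agent name -> {situation: (action, scope)},
    where scope is "private" or "federated". Peers contribute only their
    federated entries, and the agent's own entries win on conflict.
    """
    merged = {}
    for peer, entries in learnings_by_agent.items():
        if peer == agent_name:
            continue
        for situation, (action, scope) in entries.items():
            if scope == "federated":
                merged.setdefault(situation, action)
    own = learnings_by_agent.get(agent_name, {})
    for situation, (action, _scope) in own.items():
        merged[situation] = action
    return merged
```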

How does pricing work for Dreaming?

Dreaming runs background analysis sessions that consume model inference; pricing is based on the model usage in the analysis. Typical Dreaming costs run 5-15% of the agent’s session inference cost depending on volume and review depth. The cost is justified by the productivity improvements; net economics are favorable in most production use cases.
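A quick back-of-the-envelope check makes the 5-15% range easier to reason about. The dollar figures and completion counts below are made-up illustrative inputs, not reported data, and the helper names are hypothetical. The sketch computes the overhead range and shows that the cost per completed task stays flat when completions rise by the same percentage as the overhead.

```python
def dreaming_overhead(session_cost, low_pct=5, high_pct=15):
    """Return the (low, high) Dreaming cost for a given session inference cost."""
    return session_cost * low_pct / 100, session_cost * high_pct / 100


def cost_per_completion(session_cost, overhead, completions):
    """Total inference spend divided by the number of completed tasks."""
    return (session_cost + overhead) / completions


# Illustrative inputs: 10,000 in monthly session inference, 200 completions.
low, high = dreaming_overhead(10_000)
baseline = cost_per_completion(10_000, 0, 200)
with_dreaming = cost_per_completion(10_000, high, 230)  # 15% more completions
print(low, high, baseline, with_dreaming)
```

In other words, at the top of the quoted range Dreaming pays for itself once it lifts completions by about 15%; any larger improvement lowers the cost per completed task.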

What about privacy when Dreaming reviews session content?

Dreaming operates on session content within the agent’s tenant. The same data-handling controls that apply to session content apply to dreaming-derived learnings. For sensitive use cases, configure retention policies, operator review requirements, and learning-scope restrictions to match your privacy requirements.