Mistral Medium 3.5 Lands: 128B Dense Model With Work Mode Agents

Q: Why Mistral Medium 3.5 matters for AI buyers in 2026

Unified-model economics. Running one 128B model for chat, reasoning, and code is materially cheaper than running three specialized models. For enterprises with mixed workloads, the consolidation reduces both inference cost and operational complexity. Open weights for the largest Mistral flagship to date. Customers who need to self-host for compliance, data-sovereignty, or air-gapped reasons now have a frontier-class option that doesn't require API calls to a US-hosted provider. European enterpri

Q: How to use Mistral Medium 3.5 today

Try it in Le Chat. Sign in to chat.mistral.ai. The default model is now Mistral Medium 3.5. Toggle Work Mode in the prompt area to access the agentic mode. Hit the API. The Mistral API exposes Medium 3.5 under the model ID mistral-medium-3.5-latest: curl https://api.mistral.ai/v1/chat/completions \ -H "Authorization: Bearer $MISTRAL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "mistral-medium-3.5-latest", "messages": [ {"role": "user", "content": "Write a 200-word summary of s

Q: How Mistral Medium 3.5 compares to the 2026 frontier

The competitive picture in May 2026: ModelParamsContextSWE-BenchOpen Weights?Hosted Price (input) Mistral Medium 3.5128B dense256K77.6%Yes (modified MIT)~$0.40/M tokens Claude Opus 4.7Not disclosed200K (1M variant)~82%No$15/M tokens Claude Sonnet 4.6Not disclosed200K~74%No$3/M tokens GPT-5.5 InstantNot disclosed~256KNot disclosedNo~$5/M tokens Gemini 2.5 ProNot disclosed1M~70%No~$3/M tokens Llama 4 Maverick~400B MoE10M~62%YesSelf-host only The Mistral Medium 3.5 sweet spot is clear: open weights

Mistral AI shipped Medium 3.5 in early May 2026 — a 128-billion-parameter dense model with a 256K context window that handles chat, reasoning, coding, and vision in a single set of weights. Alongside the model launch came Le Chat Work Mode, an agentic mode that runs multi-step tasks against Mistral Medium 3.5 with parallel tool calling. Remote agents in Vibe, Mistral’s developer coding agent, now run on the same underlying model. The release is the European AI champion’s clearest play yet to compete with Anthropic, OpenAI, and Google on the agentic surface most enterprises actually deploy.

Want the complete, hands-on version of this guide?Browse the Eguides →

What’s actually new about Mistral Medium 3.5

The headline number is the unified architecture. Mistral Medium 3.5 is one 128B dense model that handles what previous Mistral generations split across separate Chat, Code, and Reasoning variants. The reasoning effort is configurable per request — the same model answers a one-line chat in milliseconds and works through a multi-step agentic run when the request requires it. The 256K-token context window is competitive with Claude Opus 4.7’s standard window and OpenAI’s GPT-5.5 family. The benchmark results put Mistral Medium 3.5 at 77.6% on SWE-Bench Verified, in striking distance of frontier coding scores, and 91.4 on τ³-Telecom for agentic capability.

Mistral released the model with open weights under a modified MIT license, making Medium 3.5 the largest open-weight flagship from a major AI lab in 2026. Hugging Face hosts the weights at mistralai/Mistral-Medium-3.5-128B. Self-hosting is realistic on a single 8xH100 or 8xH200 node with appropriate quantization, and the open license permits commercial use with attribution.

Le Chat Work Mode is the consumer-facing piece of the release. Where the standard Le Chat answers questions, Work Mode treats the user’s request as a goal — it plans the multi-step approach, calls tools in parallel (web search, calculator, document reader, browser automation), and reports back when the work is complete. The pattern matches what Anthropic does with Claude’s agentic features and what OpenAI ships in ChatGPT’s Agent Mode, with Mistral’s distinctive emphasis on European data residency and self-hosting options.

The Vibe integration is the developer story. Vibe is Mistral’s coding agent platform — the rough equivalent of Cursor, Replit Agent, or Devin. Remote agents in Vibe now run as long-lived cloud sessions that can work on a codebase asynchronously, returning results when the job completes. The pattern lets engineers kick off a long-running refactor or feature build and get back to it later rather than babysitting a synchronous coding session.

Why Mistral Medium 3.5 matters for AI buyers in 2026

Unified-model economics. Running one 128B model for chat, reasoning, and code is materially cheaper than running three specialized models. For enterprises with mixed workloads, the consolidation reduces both inference cost and operational complexity.
Open weights for the largest Mistral flagship to date. Customers who need to self-host for compliance, data-sovereignty, or air-gapped reasons now have a frontier-class option that doesn’t require API calls to a US-hosted provider. European enterprises subject to GDPR-adjacent residency requirements care about this.
The 256K context window. Sufficient for most long-document workloads — full codebases, large legal contracts, multi-document RAG with chunky retrieval — without paying the cost-and-latency premium of 1M-context variants.
Real agentic capability rather than chat-with-tools. The 91.4 τ³-Telecom score and the 77.6 SWE-Bench Verified score put Medium 3.5 in the conversation for production agentic deployments, not just demos.
European competitor momentum. Mistral remains the only frontier-class AI lab headquartered in Europe. The Medium 3.5 release strengthens the case that AI buyers don’t have to depend exclusively on US-headquartered providers.
Configurable reasoning effort. Per-request reasoning configuration is a 2026 trend (OpenAI’s GPT-5.5 Instant versus GPT-5.5 Reasoning, Anthropic’s extended thinking) and Mistral now matches the pattern. The same model handles a quick prompt and a complex agentic run depending on request configuration.

How to use Mistral Medium 3.5 today

Try it in Le Chat. Sign in to chat.mistral.ai. The default model is now Mistral Medium 3.5. Toggle Work Mode in the prompt area to access the agentic mode.

Hit the API. The Mistral API exposes Medium 3.5 under the model ID mistral-medium-3.5-latest:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-medium-3.5-latest",
    "messages": [
      {"role": "user", "content": "Write a 200-word summary of supervised learning"}
    ]
  }'

Use the Python SDK. The official Mistral Python client supports Medium 3.5 directly:

from mistralai import Mistral

client = Mistral(api_key="...")
resp = client.chat.complete(
    model="mistral-medium-3.5-latest",
    messages=[{"role": "user", "content": "Explain backpropagation in 3 sentences."}],
    reasoning_effort="low",  # or "medium" or "high"
)
print(resp.choices[0].message.content)

Configure reasoning effort per request. For routine queries, leave reasoning_effort at low for fast responses. For complex multi-step tasks, raise to high — the model spends more time reasoning before producing the final answer:

resp = client.chat.complete(
    model="mistral-medium-3.5-latest",
    messages=[{
        "role": "user",
        "content": "Plan a 90-day migration from MongoDB to Postgres for a 2TB dataset."
    }],
    reasoning_effort="high",
)

Self-host via Hugging Face. Download the weights and run on your own GPUs:

# Login to Hugging Face
huggingface-cli login

# Download the model (requires significant disk)
huggingface-cli download mistralai/Mistral-Medium-3.5-128B \
  --local-dir ./mistral-medium-3-5

# Serve via vLLM
vllm serve ./mistral-medium-3-5 \
  --tensor-parallel-size 8 \
  --max-model-len 262144

Try Vibe remote agents for coding. Sign in to Vibe at vibe.mistral.ai, point it at your repo, and submit a long-running task. The remote agent runs the work in the cloud and notifies when complete.

Use tool calling. Medium 3.5 supports the standard OpenAI-compatible tool-calling format Mistral has used since Mistral Large 2:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

resp = client.chat.complete(
    model="mistral-medium-3.5-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

How Mistral Medium 3.5 compares to the 2026 frontier

The competitive picture in May 2026:

Model	Params	Context	SWE-Bench	Open Weights?	Hosted Price (input)
Mistral Medium 3.5	128B dense	256K	77.6%	Yes (modified MIT)	~$0.40/M tokens
Claude Opus 4.7	Not disclosed	200K (1M variant)	~82%	No	$15/M tokens
Claude Sonnet 4.6	Not disclosed	200K	~74%	No	$3/M tokens
GPT-5.5 Instant	Not disclosed	~256K	Not disclosed	No	~$5/M tokens
Gemini 2.5 Pro	Not disclosed	1M	~70%	No	~$3/M tokens
Llama 4 Maverick	~400B MoE	10M	~62%	Yes	Self-host only

The Mistral Medium 3.5 sweet spot is clear: open weights, frontier-adjacent coding capability, competitive context window, and pricing materially lower than the closed-frontier providers. The trade-off is benchmark gap — Claude Opus and GPT-5.5 still edge ahead on the hardest tasks. For teams whose workload sits in the band where Medium 3.5 is competitive, the cost difference is substantial.

The agentic comparison is more nuanced. Le Chat Work Mode and Vibe remote agents are functional but younger than Claude’s agentic features and OpenAI’s ChatGPT Agent Mode. The capability is real; the polish and ecosystem integration are still catching up. Mistral’s bet is that the open-weight foundation plus the agentic UI delivers enough value to win European enterprises and self-hosting customers regardless of the polish gap.

What’s next for Mistral

Three things to watch over the next 90 days. First, the agent ecosystem around Medium 3.5. Mistral has shipped the model and the Le Chat/Vibe surfaces; the question is whether third-party tooling (CrewAI, LangGraph, Autogen integrations, MCP servers tuned for Mistral) builds out fast enough to make Medium 3.5 a credible choice for production agentic deployments outside Mistral’s own products.

Second, the European enterprise adoption story. Mistral’s pitch to European customers — frontier capability without US-headquartered data exposure — gets stronger with every quarter that Anthropic and OpenAI deepen their US-cloud entanglement. The Medium 3.5 release should accelerate adoption among European banks, telcos, public-sector buyers, and regulated industries that have been waiting for open-weight options at this capability level.

Third, the next Mistral model. Mistral’s release cadence in 2024-2025 was roughly six months between major model generations. If that holds, Mistral Large 3 (the speculated successor to the Large family) lands sometime in Q4 2026, likely with materially higher parameter count and benchmark performance. The competitive question is whether Mistral can keep pace with the once-quarterly major-release cadence Anthropic and OpenAI have settled into.

For buyers evaluating the AI stack in 2026, the practical move is to add Mistral Medium 3.5 to the production-model rotation rather than treating it as an experimental option. Run the same evaluation prompts against Medium 3.5, Claude Sonnet, and GPT-5.5; measure quality, latency, and cost; deploy the model that wins for your specific workload. The multi-model pattern produces better outcomes than betting on a single provider in any case, and Medium 3.5 is now competitive enough to earn a seat at the table.

Frequently Asked Questions

Is Mistral Medium 3.5 actually open-source?

The weights are released under a modified MIT license that permits commercial use with attribution. The training data and training code are not released, so the model isn’t open-source in the strict OSI sense. The license is permissive enough for most commercial deployments, including self-hosted production use.

Can I fine-tune Mistral Medium 3.5?

Yes, via the standard Hugging Face workflow with LoRA, QLoRA, or full fine-tuning. Full fine-tuning of a 128B dense model requires substantial GPU resources; most teams use LoRA-based approaches that fine-tune small adapters on top of the frozen base weights. Mistral also offers managed fine-tuning via their API.

What’s the difference between Work Mode and the regular Le Chat?

Regular Le Chat answers conversational queries. Work Mode treats the request as a multi-step goal, plans the approach, calls tools (search, calculator, document reader) in parallel where possible, and produces a structured final result. The mode is appropriate for research tasks, multi-source analysis, and any work that requires combining information from multiple places.

How does Mistral Medium 3.5 handle vision?

The model accepts images as inputs and reasons about them in the same conversation. The vision capability is part of the unified model — no separate vision endpoint required. Image inputs work through the same chat completions API with image content blocks in the message array.

Should European enterprises switch from Claude or GPT to Mistral?

Switch the workloads where Medium 3.5 is competitive on quality and the cost or data-residency advantages matter. Keep Claude or GPT for the workloads where benchmark gaps matter for output quality. The multi-model pattern — Mistral for many tasks, Claude or GPT for the hardest tasks — typically produces better total outcomes than full-fleet migration in either direction.

When will the next Mistral model arrive?

Mistral hasn’t published a public roadmap. Based on historical cadence, a Mistral Large 3 release in Q3 or Q4 2026 is plausible. Expect benchmark improvements, longer context windows, and continued momentum on agentic capability across each release.

Go deeper than this article

This article covers the essentials. Our Technical & Coding eguide collection gives you the full step-by-step playbooks — prompts, workflows, and copy-paste recipes built for exactly this work.

Browse Technical & Coding Eguides →